Connection to LE acme-v02 fails but acme-staging-v02 works

Same problem here and please don't ask me to check with the ISP as I am the ISP :smile:

Wanted to configure two DNS servers to offer DoH and DoT, which require certificates.
Found that one server can communicate with Let's Encrypt while the second can't.
The IP addresses, on the same /29 prefix, have always been used for those DNS servers and still are...

Here's the output of curl and traceroute from the problematic server:

curl -v https://acme-v02.api.letsencrypt.org

*   Trying 172.65.32.248:443...
*   Trying 2606:4700:60:0:f53d:5624:85c7:3a2c:443...
* Immediate connect fail for 2606:4700:60:0:f53d:5624:85c7:3a2c: Network is unreachable
^C

curl -v https://172.65.32.248

*   Trying 172.65.32.248:443...
^C

traceroute -T -p 443 acme-v02.api.letsencrypt.org

traceroute to acme-v02.api.letsencrypt.org (172.65.32.248), 30 hops max, 60 byte packets
 1  dsl-213-204-78-76.terra.net.lb (213.204.78.76)  0.918 ms  0.979 ms  1.070 ms
 2  172.19.1.1 (172.19.1.1)  2.637 ms  2.779 ms 172.19.2.2 (172.19.2.2)  2.660 ms
 3  * * *
 4  * * *
 5  185.91.99.26 (185.91.99.26)  43.477 ms  43.389 ms  43.627 ms
 6  * * *
[hops 7 through 28 omitted]
29  * * *
30  * * *

Now for the staging tests, all from the same server:
curl -v https://acme-staging-v02.api.letsencrypt.org

*   Trying 172.65.46.172:443...
* Connected to acme-staging-v02.api.letsencrypt.org (172.65.46.172) port 443 (#0)
(the rest of the output is omitted)

traceroute -T -p 443 acme-staging-v02.api.letsencrypt.org

traceroute to acme-staging-v02.api.letsencrypt.org (172.65.46.172), 30 hops max, 60 byte packets
 1  dsl-213-204-78-76.terra.net.lb (213.204.78.76)  0.797 ms  0.929 ms  2.279 ms
 2  172.19.1.1 (172.19.1.1)  2.733 ms 172.19.2.2 (172.19.2.2)  2.979 ms 172.19.1.1 (172.19.1.1)  2.977 ms
 3  * * *
 4  * * *
 5  185.91.99.26 (185.91.99.26)  12.029 ms  12.184 ms  12.174 ms
 6  172.65.46.172 (172.65.46.172)  3.187 ms  3.238 ms  3.369 ms
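Since plain `curl` against the failing host just hangs, a scripted side-by-side TCP probe of both endpoints from the affected server can make the comparison repeatable. A minimal sketch in Python (the hostnames are from this thread; the 5-second timeout is an arbitrary choice):

```python
import socket
import time

def tcp_connect_ms(host, port=443, timeout=5.0):
    """Try a TCP handshake to host:port; return latency in ms, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None

# Usage (network-dependent, so shown commented out):
# for h in ("acme-v02.api.letsencrypt.org", "acme-staging-v02.api.letsencrypt.org"):
#     ms = tcp_connect_ms(h)
#     print(h, f"{ms:.1f} ms" if ms is not None else "UNREACHABLE")
```

Running this from both the good and the bad server periodically would also show whether the problem is intermittent or constant.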

Welcome @edgard

I moved your post to its own thread; we prefer handling each person's problem independently. And even though your symptom is the same as in that other thread, the underlying cause is likely different. Perhaps more importantly, the debugging tools and steps available to you as the ISP will vary.

That said, these kinds of problems are difficult to debug. And, sometimes they resolve on their own in a day or two as the underlying routing problem gets "magically" fixed. Perhaps a backbone carrier identifies and resolves a routing problem or equipment failure.

It may seem that the LE API server is the common factor, but realize that roughly 5 million certs are issued per day, so a serious failure near it would result in numerous outages. We are not seeing that.

The LE API servers sit "behind" the Cloudflare network. It is possible the connection problem is between your network and Cloudflare's. Perhaps even in Sodotel? (They appear to own the IP of the last good hop.)

Have you been able to use the LE API prior to this?

What do these show?

curl -4 https://www.cloudflare.com/cdn-cgi/trace
curl -6 https://www.cloudflare.com/cdn-cgi/trace
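The trace endpoint returns one `key=value` pair per line (the `colo` field names the Cloudflare PoP that answered, `ip` the address you arrived from). A small parser makes the two outputs easy to compare programmatically; a sketch, with the sample body below made up for illustration:

```python
def parse_trace(body):
    """Parse Cloudflare /cdn-cgi/trace output (one key=value per line) into a dict."""
    fields = {}
    for line in body.strip().splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip any line without an '='
            fields[key] = value
    return fields

# Made-up sample response (203.0.113.7 is a documentation address):
sample = "h=www.cloudflare.com\nip=203.0.113.7\ncolo=BEY\nvisit_scheme=https"
# parse_trace(sample)["colo"] identifies the PoP that served the request
```

Comparing the `colo` value from the working and the failing server would show whether they are even reaching the same Cloudflare location.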

--- Update ---

We have multiple data centers in the country, connected together via private inter-city links; each peers with the internet and with a local Internet exchange (IX) for local traffic.

Cloudflare's presence here is hosted at the IX data center.

During the weekend, the problem escalated to a major level: everything hosted by Cloudflare became unreachable from that data center.

It turned out that, for some reason, they did not like traffic from the affected data center reaching their network directly over the exchange. The moment we forced traffic destined for their network to go through the internet instead, everything worked like a charm.

A ticket is still open with them...

Thank you, Mike.

