"Timeout during connect" on renewal

My domain is: mirror.anarc.at

I ran this command: certbot renew

It produced this output:

 - The following errors were reported by the server:

   Domain: mirror.anarc.at
   Type:   connection
   Detail: Fetching
   http://mirror.anarc.at/.well-known/acme-challenge/5_CpOCx38guwL_I9Gd1x5VaNopxi_rCUwobYJehbsFg:
   Timeout during connect (likely firewall problem)

I have also tried:

root@marcos:/etc# certbot certonly -d mirror.anarc.at --webroot  -w /var/www/mirror/
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Cert is due for renewal, auto-renewing...
Renewing an existing certificate for mirror.anarc.at
Performing the following challenges:
http-01 challenge for mirror.anarc.at
Using the webroot path /var/www/mirror for all unmatched domains.
Waiting for verification...
Challenge failed for domain mirror.anarc.at
http-01 challenge for mirror.anarc.at
Cleaning up challenges
Some challenges have failed.

IMPORTANT NOTES:
 - The following errors were reported by the server:

   Domain: mirror.anarc.at
   Type:   connection
   Detail: Fetching
   http://mirror.anarc.at/.well-known/acme-challenge/B1k-K3ozfrJ5hQx7d5lLhLzUD3C8w1jNxrnRGZIxSLY:
   Timeout during connect (likely firewall problem)

   To fix these errors, please make sure that your domain name was
   entered correctly and the DNS A/AAAA record(s) for that domain
   contain(s) the right IP address. Additionally, please check that
   your computer has a publicly routable IP address and that no
   firewalls are preventing the server from communicating with the
   client. If you're using the webroot plugin, you should also verify
   that you are serving files from the webroot path you provided.

And I can reproduce the issue on Let's Debug (although debugging shows the webserver is reachable, see Let's Debug), but nowhere else: I have tried from people.debian.org and other machines and they can all reach my server correctly. I am not firewalling port 80 or 443, although I am redirecting the former to the latter, which never caused problems in the past.

My web server is (include version): 2.4.52-1~deb11u2

The operating system my web server runs on is (include version): Debian 11 "bullseye"

My hosting provider, if applicable, is: N/A

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): 1.12.0


I should also point out that I've been tracking my /etc directory in git for a while, with daily autocommit. Around November, I started noticing an accumulation of CSRs in /etc/letsencrypt/csr, where I now have a whopping 1638 entries. Typically, new CSRs only get added there when certificates are renewed, but because renewals started breaking at some point, I'm now adding dozens of CSRs a day. It seems that, around March 3rd, the Let's Encrypt servers started having trouble reaching my server, and I cannot clearly explain why this is happening.

I've been using Let's Encrypt since at least 2017, and this is the first time I've needed to ask for help with a problem like this. Typically, the configuration issues are on my end: some expired DNS entry or misconfigured virtual host. But this is different: nothing changed on my end, and things seem to be working from other vantage points on the network. I also tried to check the challenge by using --debug-challenges and curling the URL to see if the challenge is really available, and it is (with curl -L).
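
For anyone wanting to repeat that check from several machines, here is a minimal Python sketch of roughly what curl -L does against the challenge URL: follow redirects (such as the port 80 to 443 redirect) and give up if the connection takes too long. The function name fetch_challenge and the 10-second default are assumptions made for illustration; nothing here is provided by Certbot or Let's Encrypt.

```python
import urllib.request


def fetch_challenge(url, timeout=10.0):
    """Fetch url, following redirects, and return (status, final_url, body).

    A connect timeout surfaces as urllib.error.URLError or TimeoutError,
    and an HTTP error status (404, 500, ...) as urllib.error.HTTPError.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "challenge-check-sketch"}
    )
    # urlopen follows 301/302 redirects by default, much like `curl -L`.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, response.geturl(), body
```

Calling fetch_challenge("http://mirror.anarc.at/.well-known/acme-challenge/<token>") while Certbot is paused by --debug-challenges should return status 200 and the token contents if the path is reachable from wherever the script runs. It says nothing about reachability from the Let's Encrypt validation servers themselves.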

Is there something obvious I'm missing here? I looked at other issues like HTTP challenge fails: Timeout during connect, DNS problem - #5 by JuergenAuer and I don't seem to find anything that matches my experience there.

Thanks for any input.

Hi @anarcat,

Is it possible that your hosting provider or ISP is blocking certain address ranges?

It does seem that most parts of the Internet can reach your server, but the Let's Encrypt validation consistently can't, which might suggest a hidden firewall rule or something.

Hi @schoen!

It's possible, I guess I'll escalate with them now, but this is highly unusual: typically they don't block anything like this. I'll ask.

Could someone at LE get me a traceroute to see where the traffic gets dropped? Is there a looking glass or something?

Also: I guess that a DNS-01 challenge could work around such a problem, no?

Getting a traceroute from Let's Encrypt is rare; you would usually have to attract the attention of a busy LE staffer. :slight_smile: The LE staffer might first ask you to check with your ISP, as you're trying.

A looking glass could be very useful in some cases, but it doesn't exist now, and I don't know if there would be concerns about attack surface or resources in hosting it on the same infrastructure as the actual validation.

Yes, a DNS-01 challenge could work around it: it's pretty rare for DNS servers to refuse queries from anywhere.

Oh dear, my ISP seems to have fallen into the trap of "our call is important to us" and redirected me to a "community support forum", where I posted this link:

https://community.teksavvy.com/discussion/1498/timeouts-connecting-inbound-from-lets-encrypt/p1?new=1

What a nightmare... I suspect this will get about zero traction and will start looking at alternatives (ISP, and DNS-01 of course).

I actually got on the phone (remember those?) with them and they claim not to be blocking any port or any traffic on their side, so this would be some bizarre routing issue, which brings me back to needing some sort of traceroute from Let's Encrypt to diagnose this further. Sigh.

When I looked yesterday, Let's Encrypt was able to connect to other hosts in your /24 (e.g. to 206.248.172.208).

Based on that, it does seem most likely to me that you have some firewall rule on your machine which is causing this:

  • Although your ISP could be blocking port 80 for your specific IP, I don't know why they would single you out. Unless there is some plan-based differentiation?
  • Asking for a traceroute from Let's Encrypt probably won't be too helpful because AFAIU routing tables do not have anything smaller than /24.
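
To illustrate the /24 point, here is a small sketch using Python's ipaddress module to check whether two addresses fall inside the same /24. The first address is the reachable neighbour mentioned in this thread; the second is a hypothetical placeholder, since the server's real address isn't given here.

```python
import ipaddress


def same_prefix(addr_a, addr_b, prefix_len=24):
    """Return True if both addresses fall inside the same prefix."""
    # strict=False accepts a host address and yields its enclosing network.
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


reachable_neighbour = "206.248.172.208"  # host Let's Encrypt could reach (from this thread)
my_server = "206.248.172.10"             # HYPOTHETICAL placeholder, not the real address

print(same_prefix(reachable_neighbour, my_server))  # True
```

If the two addresses share a /24, they would normally be covered by the same route announcement, so a path-level difference between them is unlikely and the cause is more likely something specific to the one host.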

Is this Debian server behind a NAT, or plugged directly into your internet connection? Have you tried rebooting your router/modem, if any?

It's behind NAT with port forwarding for that host, and I must admit I haven't tried rebooting the router; that's the next step.

well this is embarrassing, rebooting the router actually fixed this issue.

my hunch was this was IPv6 related, and I think it was spot on in the end. this IPv6 setup has a tendency of falling apart upstream after a while. reconnecting the IPv6 interface (or rebooting) typically fixes it, so my guess is that Let's Encrypt was trying to connect over IPv6 and failing to fall back.

is that a known issue? nothing confirmed on my end of course, just a hunch. in fact, i still can't quite reliably renew the domains... some did go through, but some still get timeouts...

sorry for the noise, and thanks for the stellar support.

Hooray for shoddy CPE!

I don't think IPv6 was involved here. If we look at the authorization details (https://acme-staging-v02.api.letsencrypt.org/get/authz-v3/1982674078), we can see that Let's Encrypt only resolved an IPv4 address for your domain, and failed to connect on that address.
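
A quick way to see whether a name publishes IPv6 addresses at all is to group what it resolves to by address family. This is only a sketch using the local system resolver; the helper name resolve_by_family is made up for illustration, and the result may differ from what the Let's Encrypt resolvers saw, so the authorization record remains the better evidence.

```python
import socket


def resolve_by_family(hostname):
    """Group the addresses a name resolves to into IPv4 and IPv6 lists."""
    results = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, None):
        if family == socket.AF_INET:
            label = "IPv4"
        elif family == socket.AF_INET6:
            label = "IPv6"
        else:
            continue
        address = sockaddr[0]
        # getaddrinfo returns one entry per socket type, so de-duplicate.
        if address not in results[label]:
            results[label].append(address)
    return results
```

An empty IPv6 list from resolve_by_family("mirror.anarc.at") would be consistent with the authorization details above, where only an IPv4 address was resolved.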

We've also previously seen "helpful" router features which throttle traffic as a sort of anti-DoS feature. Let's Encrypt challenge requests tend to arrive all at once, which can trigger a blocking response.

The thing is: that's not supposed to be some shoddy random SOHO router. This is the Turris Omnia, with a real Linux (OpenWrt) under it. It should be able to handle, you know, port forwarding. :stuck_out_tongue:

If IPv6 wasn't involved, that is really, truly bizarre. I really don't understand what the heck happened here...

If this was a situation where the site was just down, I'd get it, but the traffic was blanking out only for Let's Encrypt! I didn't find another site that had those timeout issues. And even now that most (but not all!) of my certs have renewed, there's still one more left that's stumbling over itself:

   Domain: analytics.anarc.at
   Type:   connection
   Detail: During secondary validation: Fetching
   http://analytics.anarc.at/.well-known/acme-challenge/VXVDzedJ_MTPxKKaA8K-c7u8coBdlIOkItkROTqLeXE:
   Timeout during connect (likely firewall problem)

Notice how it's failing in the secondary validation now. Also note that all those DNS records are basically CNAMEs to the same host right now, so if they were able to route for some, it should route for all, no?

Throttling like that could actually be something the Turris does. And lo and behold, there's this "Dynamic firewall" thing that I seem to have naively enabled in there. Disabling it seems to make that final cert just go through fine.

Sigh, thanks, and sorry again for the noise, I feel quite silly.

Interesting! One of the Let's Encrypt validation servers is listed on today's Turris greylist as an "http" attacker.
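
For anyone who wants to check an address against a downloaded copy of that greylist, here is a rough Python sketch. It assumes the file is CSV with an Address,Tags header and optional # comment lines; that layout is an assumption and should be checked against the actual file. The sample rows use documentation addresses from 192.0.2.0/24, not real greylist entries.

```python
import csv
import io

SAMPLE_GREYLIST = """\
# HYPOTHETICAL sample data, not real greylist entries
Address,Tags
192.0.2.10,http
192.0.2.20,"ssh,telnet"
"""


def greylist_tags(csv_text, address):
    """Return the list of tags recorded for address, or None if it is not listed."""
    # Drop blank lines and '#' comments before handing the rest to the CSV parser.
    lines = [
        line for line in io.StringIO(csv_text)
        if line.strip() and not line.lstrip().startswith("#")
    ]
    for row in csv.DictReader(lines):
        if row["Address"].strip() == address:
            return [tag.strip() for tag in row["Tags"].split(",")]
    return None


print(greylist_tags(SAMPLE_GREYLIST, "192.0.2.10"))  # ['http']
print(greylist_tags(SAMPLE_GREYLIST, "192.0.2.99"))  # None
```

The match is a plain string comparison, so it would miss addresses written in a different textual form (for example, differently abbreviated IPv6 addresses).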

oh wow, that sounds like something we (who?) should fix.

I reported the problem upstream here:
