Timeouts using certbot despite 200 OK responses

Good afternoon. I'm seeing the "timeout during connect" error, even when the nginx logs show the requests coming through with 200 response code.

192.168.7.131 - - [28/Sep/2021:16:02:59 -0500] "GET /.well-known/acme-challenge/5IIJtrS_wXk3gvExvbR1figrzLczY_6p0vx4I0fyRGc HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
192.168.7.131 - - [28/Sep/2021:16:02:59 -0500] "GET /.well-known/acme-challenge/a_PS8zY7YXTG2yYo7qLL1Ep-_HJRuyi-_mncx75c8qI HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
192.168.7.131 - - [28/Sep/2021:16:02:59 -0500] "GET /.well-known/acme-challenge/51QyxMK0pKrQwTnmQJ1KTLohuTLGq6A6pfTe5nhdPVU HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
192.168.7.131 - - [28/Sep/2021:16:02:59 -0500] "GET /.well-known/acme-challenge/NbrDuL4sZnNzBDk5ts4bYhGjKLh_5uBSeQL-lwpy-Ek HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"
192.168.7.131 - - [28/Sep/2021:16:03:00 -0500] "GET /.well-known/acme-challenge/YhurVpbbyBm83ZuWzyaAn3gBPsxs1hNjgAPx7rK2fB0 HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)" "-"

I've validated that when I start up the nginx container, I can access these URLs externally using online services. I struggled with these errors intermittently last week into the weekend. It finally worked, and then today I tried to add another domain into the mix and it's happening again. Flipped over to the staging server because I re-ran things enough to get throttled. It's happening on both, staging and prod. Everything seems fine with external access (firewalls, routing, etc.) - just not sure what's going on exactly.

Note - sometimes I've observed nginx logs not showing successful requests for all 5 requests, sometimes it's only a few of the full set, the others don't come through at all. Sometimes the error from acme is also not "timeout" it's "connection reset by peer" ... it's not super consistent, tho (some of these were observed on the prod side, not the staging side, too).

My domain is: cusack-ruth.name auth.cusack-ruth.name weather.cusack-ruth.name automate.cusack-ruth.name frigate.cusack-ruth.name

I ran this command (via certbot Docker): Arguments: ['--webroot', '-w', '/var/www/certbot', '--staging', '--email', 'bdruth@gmail.com', '-d', 'cusack-ruth.name', '-d', 'auth.cusack-ruth.name', '-d', 'weather.cusack-ruth.name', '-d', 'automate.cusack-ruth.name', '-d', 'frigate.cusack-ruth.name', '--rsa-key-size', '4096', '--agree-tos', '--force-renewal']

It produced this output: During secondary validation: Fetching Timeout during connect (likely firewall problem)

where is each of the 5 verification URLs

My web server is (include version): nginx (via docker, latest)

The operating system my web server runs on is (include version): raspbian (buster)

My hosting provider, if applicable, is: n/a

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): latest via docker (1.19.0)

You should get 4 different HTTP requests per challenge URL, as described in ACME v1/v2: Validating challenges from multiple network vantage points.

In your case, you should see a total of (5 requests * 4 vantage points =) 20 requests.

If the error leads with "During secondary validation", then the likely cause is that the cloud-based vantage points were unable to connect to your server. These (currently) come from AWS. Worth double checking that you're not blocking any AWS ranges.

If the problem is intermittent, I can point to some previous instances where unhelpful "anti-DoS" home router features have caused these kinds of symptoms. The requests tend to arrive all at once, which may trigger firewalls that do throttling.

4 Likes

Hmm, I'll check and see if there's something along those lines going on. Nothing comes to mind, but at least that's something tangible to investigate.

Bam, that did it. I had some filters on (I thought, only) outbound connections to some AWS ASNs to prefer one WAN connection over another, and this was causing the issue. Goodness gracious. Thank you!

3 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.