Can't renew with certbot or Crypt::LE, Timeout during secondary validation (403 resolved)

Asked and answered.
OR
Answered then asked?

How can you be sure the perspective changes were not an issue?

3 Likes

Do you see any challenge requests in your Apache access log?

You should be seeing 4 or 5 with successful challenge (as of today). Yours failed so we won't see that many but would be interesting to know the requester IP addresses of the ones you do see.

You could use Let's Debug to generate the test from staging if that's easier.

4 Likes

Hmm. Unanticipated issue: because I used DNS validation to create a fresh certificate for our primary domain (www.galionlibrary.org) before it expired, further attempts at testing are now short-circuited with "Too early for renewal, certificate expires in 89 days."

Which makes sense, but at some point I would like to try for getting all the domains supported, preferably while not facing an imminent deadline. Surely there's a way around this?

Update: the "too early" error happens even if the domain covered by the experiment is not included in the new request. Hmm. I might be partly answering my own question here: if I configure Apache to only send the cert when the request is for the correct domain... hang on, let me see if I can do that...

Update: Yeah, separating that domain to its own VirtualHost lets me make further attempts. Not the configuration I want long term, but good enough for debugging purposes.

1 Like

I suspect the perspective changes probably are an issue, in one way or another.

I just haven't worked out the details of exactly how, yet.

(Also, the "before" in the context of my statement, meant recently, after the perspective changes had already happened, but during the window of time when I had allowed that new firewall to sit between our web server and the public internet. So immediately before, not long before.)

1 Like

grep well-known /path/to/access.log says this:
[snip various indexers hitting it between tests... here is the relevant section from the most recent attempt... summary at end]

52.38.42.244 - - [24/Apr/2024:13:59:02 -0400] "GET /.well-known/acme-challenge/9qt3V9L0-q1q0L6fdEmihjTia_Bhx8FvC7ZwJLyD_4I HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
13.48.85.142 - - [24/Apr/2024:13:59:02 -0400] "GET /.well-known/acme-challenge/9qt3V9L0-q1q0L6fdEmihjTia_Bhx8FvC7ZwJLyD_4I HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
54.169.71.235 - - [24/Apr/2024:13:59:03 -0400] "GET /.well-known/acme-challenge/9qt3V9L0-q1q0L6fdEmihjTia_Bhx8FvC7ZwJLyD_4I HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
3.145.56.236 - - [24/Apr/2024:13:59:12 -0400] "GET /.well-known/acme-challenge/9qt3V9L0-q1q0L6fdEmihjTia_Bhx8FvC7ZwJLyD_4I HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
13.48.85.142 - - [24/Apr/2024:13:59:13 -0400] "GET /.well-known/acme-challenge/PNY3s-t_xhIDGPtybVfPlBRS-7WxwEhQV9B8kfdKjco HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
54.169.71.235 - - [24/Apr/2024:13:59:13 -0400] "GET /.well-known/acme-challenge/PNY3s-t_xhIDGPtybVfPlBRS-7WxwEhQV9B8kfdKjco HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
3.145.56.236 - - [24/Apr/2024:13:59:23 -0400] "GET /.well-known/acme-challenge/PNY3s-t_xhIDGPtybVfPlBRS-7WxwEhQV9B8kfdKjco HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
13.48.85.142 - - [24/Apr/2024:13:59:24 -0400] "GET /.well-known/acme-challenge/kYvlCqseJtc1Nx3MYgU-J8zbZS2hn0yErATb8bOHPJk HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
54.169.71.235 - - [24/Apr/2024:13:59:24 -0400] "GET /.well-known/acme-challenge/kYvlCqseJtc1Nx3MYgU-J8zbZS2hn0yErATb8bOHPJk HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
3.145.56.236 - - [24/Apr/2024:13:59:33 -0400] "GET /.well-known/acme-challenge/kYvlCqseJtc1Nx3MYgU-J8zbZS2hn0yErATb8bOHPJk HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
52.38.42.244 - - [24/Apr/2024:13:59:34 -0400] "GET /.well-known/acme-challenge/kYvlCqseJtc1Nx3MYgU-J8zbZS2hn0yErATb8bOHPJk HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
13.48.85.142 - - [24/Apr/2024:13:59:35 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
54.169.71.235 - - [24/Apr/2024:13:59:35 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
3.145.56.236 - - [24/Apr/2024:13:59:44 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
52.38.42.244 - - [24/Apr/2024:13:59:44 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
13.48.85.142 - - [24/Apr/2024:13:59:45 -0400] "GET /.well-known/acme-challenge/g1XVVe202cNR5vtoTKLRm0uRZJlfnGwj-Q-MM-gqNys HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
54.169.71.235 - - [24/Apr/2024:13:59:46 -0400] "GET /.well-known/acme-challenge/g1XVVe202cNR5vtoTKLRm0uRZJlfnGwj-Q-MM-gqNys HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
3.145.56.236 - - [24/Apr/2024:13:59:55 -0400] "GET /.well-known/acme-challenge/g1XVVe202cNR5vtoTKLRm0uRZJlfnGwj-Q-MM-gqNys HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

So in summary, I got four hits for challenge 9qt..., three hits for PNY..., four for kYvl..., four for y13y..., and three for g1XV...
Three IP addresses hit all five challenges, and one more IP address only hit three of them.

That's interesting. I focused on just a recent one. There were a lot of junk requests you can identify from the odd user-agent string (Apple Newsbot and the like).

But, the one below was legit and we see 4 successful requests. The only one missing is from the Primary Let's Encrypt data center. The Primary must succeed along with at least 3 Secondaries. Again, this is today's behavior subject to change any time.

I'm a little vague on the implications but I believe the primary uses Cloudflare Magic Transit and the center itself is in the USA. The secondary sites are all AWS but the primary is Flexential if I recall correctly. I doubt this helps I'm just giving an outline so you can focus your efforts.

I don't recall any recent changes to the primary center that could contribute to this.

This was helpful but do you have access to your firewall or edge network devices. Would be good to know if the primary request is reaching your network. I suspect it is as otherwise a general problem in the primary center would be sending up major alerts to staff.

But, it might be a routing issue between the primary and you. Hoping not.

13.48.85.142 - - [24/Apr/2024:13:59:35 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
54.169.71.235 - - [24/Apr/2024:13:59:35 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
3.145.56.236 - - [24/Apr/2024:13:59:44 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
52.38.42.244 - - [24/Apr/2024:13:59:44 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 308 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

PS: You should be able to reproduce this primary center timeout using https://letsdebug.net. Enter any domain name and then look at Verbose output. You should see an expected 404 response to its first test check. Then you'll see the same timeout error you get from your ACME Client.

4 Likes

Yeah, LetsDebug does indeed show the failure. It's a little lean on details, but it got that part right.

I don't have access to the Juniper router or anything upstream of it, no. That's all administered by our connectivity provider (the Ohio Public Library Information Network) and is presumably using the same setup for all public libraries in the state. I do know that the Juniper router is sitting directly on a dedicated fiber ethernet line that goes to a state government facility in Columbus. They have at least two different upstream providers, one of which is oarnet, and off the top of my head I don't recall the other. Those providers are as close to backbone as it gets in the state of Ohio.

Are there logs in the Juniper router?

And are you sure there is no IP based firewall in your server? Or even some other selective firewall?

If Juniper sees requests from more IP sources than your server then there must be something hiding in your server. This still "feels" like the most likely reason.

If Juniper doesn't see it that's looking like a comms routing issue

3 Likes

No idea. It's owned and administered by the OPLIN. I have physical access to it, but I have no login information, and abusing my physical access to gain additional access would probably be illegal.

Also, I have no particular reason to suspect that its configuration has changed since it was installed, (which was years ago, well before COVID), other than the generic "a thing is not working, I want to blame this object because I don't control it". I don't know who I would even contact to request further info.

root@cogitation:~/letsencrypt# iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT

Wed_2024-Apr-24_14:50
root@cogitation:~/letsencrypt#

200 is good
308 seems a bit large

This is more like what is expected:

It would be nice to see what those 308 bytes contain.

3 Likes

Hmm. The request as shown two fields earlier looks like eighty-some characters to me. In the log, it's ASCII (confirmed via hexl-find-file in Emacs: one byte per character), but I don't know http well enough to know whether it's possible that another character set was involved in the request. Though the usual suspects (UTF-8, Latin-1) would still be one-byte-per-character here. I don't know enough about Apache's log format to be certain the logged request is the entire request; it seems weird to me that it wouldn't be, but yeah, it does say 308 there, so... I think I need to do some research now.

Update: I think I may have misunderstood the meaning of the size field. Is it the size of the request or of the response send back?

You could maybe also run your ACME client in a more debugging friendly mode [if one exists for it].
Where you could pause the script at the point right after when it places the token file in the expected challenge location.
Then you could try that same URL from any system on the Internet and see those 308 bytes first hand.

3 Likes

The challenge files in /var/www/html/.well-known/acme-challenge/ ?
I have it leaving those in place. The ability to do stuff like that for debugging purposes is one of the main reasons I switch to Crypt::LE (because I speak Perl way way better than Python). ls -l says those files are 87 bytes each. If I wget them to another system, they're still 87 bytes each.

1 Like

jonadab@cinnamon:~$ wget -S http://cgi.galion.lib.oh.us/.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A
--2024-04-24 15:16:19-- http://cgi.galion.lib.oh.us/.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A
Resolving cgi.galion.lib.oh.us (cgi.galion.lib.oh.us)... 10.0.55.10
Connecting to cgi.galion.lib.oh.us (cgi.galion.lib.oh.us)|10.0.55.10|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 24 Apr 2024 19:16:19 GMT
Server: Apache/2.4.59 (Debian)
Last-Modified: Wed, 24 Apr 2024 17:58:02 GMT
ETag: "57-616db69675ce0"
Accept-Ranges: bytes
Content-Length: 87
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Length: 87
Saving to: ‘y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A.4’

y13yBugLjpX9LrHjJA7AwI3RHx 100%[=======================================>] 87 --.-KB/s in 0s

2024-04-24 15:16:19 (10.5 MB/s) - ‘y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A.4’ saved [87/87]

jonadab@cinnamon:~$
j

1 Like

Did you try from the Internet?
What does the web server log show for your wgets? [200 87?]

3 Likes

This is from my house, on consumer cablemodem internet:
jonadab@hippo:~$ wget -S http://cgi.galion.lib.oh.us/.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A
--2024-04-24 15:18:22-- http://cgi.galion.lib.oh.us/.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A
Resolving cgi.galion.lib.oh.us (cgi.galion.lib.oh.us)... 66.213.116.5
Connecting to cgi.galion.lib.oh.us (cgi.galion.lib.oh.us)|66.213.116.5|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 24 Apr 2024 19:18:22 GMT
Server: Apache/2.4.59 (Debian)
Last-Modified: Wed, 24 Apr 2024 17:58:02 GMT
ETag: "57-616db69675ce0"
Accept-Ranges: bytes
Content-Length: 87
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Length: 87
Saving to: ‘y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A.1’

y13yBugLjpX9LrHjJA7AwI3RHxPgi 100%[=================================================>] 87 --.-KB/s in 0s

2024-04-24 15:18:22 (25.1 MB/s) - ‘y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A.1’ saved [87/87]

1 Like

I see similar replies.
Are you sure that is where the ACME challenge files are actually being written?

3 Likes

10.0.0.15 - - [24/Apr/2024:15:16:19 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 345 "-" "Wget/1.21"
174.105.114.155 - - [24/Apr/2024:15:18:07 -0400] "GET /.well-known/acme-challenge/y13yBugLjpX9LrHjJA7AwI3RHxPgiUOhMeNpgAxUw0A HTTP/1.1" 200 345 "-" "Wget/1.18 (linux-gnu)"

1 Like

Relatively sure. I have --path /var/www/html/.well-known/acme-challenge in the le.pl command line, and when I run it, the files do appear in that location, with a last-modification time that closely matches the timestamps in the le output stating that the files are written.

Also, if the files were being written incorrectly, or in the wrong place, I would expect the validation to fail from every perspective, not only from certain ones.

2 Likes

It does seem curious to me that some of the challenges worked from four perspectives, and others from only three. I may run a test again to see if that's consistent or variable.

Update: This time four of the five tests got hits from four IP addresses, and only one of the five got hits from only three IP addresses. As before, it was the same four IP addresses each time. In fact, it appears to have been the same four IP addresses as the previous test. Is four perspectives enough to pass? If so, I might be able to get this to magically work by trying enough times. Though I would still be deeply curious about why that would be necessary.

Update: No, four perspectives is not enough to pass. This time I got four perspectives for each of the tests, and the whole thing still failed. In fact, it says all verifications failed. This suggests to me that there may be at least one perspective that is consistently failing, and I haven't seen its IP address in these most recent tests.