Letsencrypt fails to get the response (solved, it was nginx not restarting properly)

Okay.
So can we blame letsencrypt infrastructure for that?

I do not know; it could be a global, Internet-wide, semi-intermittent problem.

We're trying to get a cert for megumin.ninamori.org, right?

The challenge URI contents clearly noted a "403" HTTP status returned when requesting the challenge token, so nginx must have responded with a 403 Forbidden HTTP status. This must show in nginx logs.

Debugging this should be done from nginx side (to start with). I'm not familiar with acme-nginx tho, but access must be logged somewhere, right?

"error": {
"type": "urn:ietf:params:acme:error:unauthorized",
"detail": "163.172.189.79: Invalid response from http://megumin.ninamori.org/.well-known/acme-challenge/RUza89uyOihewga4LMt3XvfuJuPDrtBbzX07ehXGklI: 404",
"status": 403
},

No, the acme flow got the 403 but the "detail" result shows the 404.

But, I agree this should be debugged from the nginx side. I can't help right now but just wanted to clarify this much
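To make the distinction concrete, here is a minimal sketch that pulls both numbers out of the error object quoted above. It assumes the "detail" string ends with ": <code>" as it does in this error; the function name split_statuses is made up for illustration and is not part of any ACME client.

```python
import json
import re

# The error object quoted above, wrapped in braces so it parses as JSON.
challenge_error = json.loads("""
{
  "error": {
    "type": "urn:ietf:params:acme:error:unauthorized",
    "detail": "163.172.189.79: Invalid response from http://megumin.ninamori.org/.well-known/acme-challenge/RUza89uyOihewga4LMt3XvfuJuPDrtBbzX07ehXGklI: 404",
    "status": 403
  }
}
""")["error"]


def split_statuses(error):
    """Return (acme_status, http_status_seen_by_validator).

    Assumption: the detail text ends with ": <3-digit code>", as in the
    error above. If it does not, the second value is None.
    """
    match = re.search(r":\s*(\d{3})\s*$", error["detail"])
    seen = int(match.group(1)) if match else None
    return error["status"], seen


acme_status, seen_status = split_statuses(challenge_error)
print(f"ACME error status: {acme_status}, status the web server returned: {seen_status}")
```

The "status" field belongs to the ACME error itself, while the number at the end of "detail" is what the validation server got back from nginx, so the second one is the value to look for in the nginx access log.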

Ah, that's a different challenge. I responded to the challenge URI at the top of OP. That one shows a 403 response (and a 403 status, but that's a separate thing indeed).

Nope, I had turned those logs off for this.
But I can turn them back on and try again, just give me a couple of minutes.

403 response.

400 response.

404 response.

So already 3 different responses... Weird!

Staging, didn't work:

17.58.91.220 - - [21/Oct/2022:16:51:35 +0000] "GET /.well-known/acme-challenge/RUza89uyOihewga4LMt3XvfuJuPDrtBbzX07ehXGklI: HTTP/1.1" 404 146 "-" "AppleNewsBot"
17.58.88.151 - - [21/Oct/2022:16:52:13 +0000] "GET /.well-known/acme-challenge/RUza89uyOihewga4LMt3XvfuJuPDrtBbzX07ehXGklI: HTTP/1.1" 404 146 "-" "AppleNewsBot"
17.58.170.200 - - [21/Oct/2022:16:52:34 +0000] "GET /.well-known/acme-challenge/RUza89uyOihewga4LMt3XvfuJuPDrtBbzX07ehXGklI: HTTP/1.1" 404 146 "-" "AppleNewsBot"
[REDACTED, that was me] - - [21/Oct/2022:16:53:10 +0000] "GET /.well-known/acme-challenge/TCLILynArB58fe2Im4JGxIy96o9sF9DzB6QgnNjntF4 HTTP/1.1" 200 87 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"

Non-staging:

https://acme-v02.api.letsencrypt.org/acme/chall-v3/167118450786/0MqT5g

98.246.255.230 - - [21/Oct/2022:16:56:31 +0000] "GET /.well-known/acme-challenge/RUza89uyOihewga4LMt3XvfuJuPDrtBbzX07ehXGklI HTTP/1.1" 404 118 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0"
54.93.68.173 - - [21/Oct/2022:16:56:34 +0000] "GET /.well-known/acme-challenge/bXNSy_hi4T5YnjlGdYMKWYhT7CMrQy1FvsJaOfhyIPQ HTTP/1.1" 404 118 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
52.14.117.238 - - [21/Oct/2022:16:56:34 +0000] "GET /.well-known/acme-challenge/bXNSy_hi4T5YnjlGdYMKWYhT7CMrQy1FvsJaOfhyIPQ HTTP/1.1" 404 118 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
35.93.112.129 - - [21/Oct/2022:16:56:34 +0000] "GET /.well-known/acme-challenge/bXNSy_hi4T5YnjlGdYMKWYhT7CMrQy1FvsJaOfhyIPQ HTTP/1.1" 404 118 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
23.178.112.103 - - [21/Oct/2022:16:56:34 +0000] "GET /.well-known/acme-challenge/bXNSy_hi4T5YnjlGdYMKWYhT7CMrQy1FvsJaOfhyIPQ HTTP/1.1" 404 118 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

hmm

That's not Let's Encrypt :wink:

"GET /.well-known/acme-challenge/bXNSy_hi4T5YnjlGdYMKWYhT7CMrQy1FvsJaOfhyIPQ HTTP/1.1" 404 118 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

The next thing is to find out why there's a 404 response. As said, unfortunately I have no clue what acme-nginx is exactly; I'm not familiar with that client.
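For anyone sifting through a busier access log, a rough sketch like the following can separate the real validation requests from bots and browsers. It assumes nginx's default "combined" log format, as in the lines posted above, and matches on the user agent string those lines show; the function name validation_requests is just a placeholder.

```python
import re

# nginx "combined" log format, matching the access log lines in this thread.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def validation_requests(lines):
    """Yield (ip, path, status) for requests sent by the Let's Encrypt validator.

    Assumption: the validator is identified by its user agent string only.
    """
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and "Let's Encrypt validation server" in m.group("agent"):
            yield m.group("ip"), m.group("path"), int(m.group("status"))


# Two of the lines posted above: one browser request, one validator request.
sample = '''\
98.246.255.230 - - [21/Oct/2022:16:56:31 +0000] "GET /.well-known/acme-challenge/RUza89uyOihewga4LMt3XvfuJuPDrtBbzX07ehXGklI HTTP/1.1" 404 118 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0"
54.93.68.173 - - [21/Oct/2022:16:56:34 +0000] "GET /.well-known/acme-challenge/bXNSy_hi4T5YnjlGdYMKWYhT7CMrQy1FvsJaOfhyIPQ HTTP/1.1" 404 118 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
'''.splitlines()

for ip, path, status in validation_requests(sample):
    print(ip, status, path)
```

Only the second line is reported, which makes it easier to see which token the validator actually asked for and what nginx answered.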

K, long story short:
I confirm a bug in either nginx or acme-nginx.

Story: after nginx -s reload has reported it's done, it's not actually done.
But if I add a delay until it actually is done, then it works (yay, the megumin domain is verified now).

Maybe there's something wrong with nginx falsely reporting that it has restarted.
Maybe acme-nginx should check that it has actually restarted instead of just inviting letsencrypt in.
But it clearly is a problem on my server's side.

I'll patch acme-nginx for now, expect a PR around this weekend.
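A rough sketch of the "check that it has actually restarted" idea, as opposed to a fixed delay: poll the challenge URL yourself until the expected content comes back, and only then tell the ACME server to validate. This is not the actual acme-nginx patch; the names fetch_body and wait_until_served, the timeout values, and the idea of comparing the body to the expected key authorization are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_body(url):
    """Return the response body as text, or None if the request fails.

    Non-2xx responses (such as the 404s seen in this thread) raise
    HTTPError, which is a URLError subclass, so they also give None.
    """
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError):
        return None


def wait_until_served(url, expected, timeout=30.0, interval=0.5, fetch=fetch_body):
    """Poll url until its body matches expected.

    Returns True once the body matches, False if the timeout passes first.
    The fetch parameter exists so the check can be exercised without a
    network connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(url)
        if body is not None and body.strip() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The intended use would be to call wait_until_served(challenge_url, key_authorization) right after nginx -s reload returns, and to ask the ACME server to validate only if it returns True. Compared with a fixed sleep, this waits no longer than needed and fails loudly when the new configuration never takes effect.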

Yup, that's not.
I've already debugged it and found the problem.
Now to fix it.

nginx sometimes has difficulty starting in time for the challenge; that's something we've also noticed with the nginx plugin of Certbot. It even has an option (--nginx-sleep-seconds) to increase the waiting time before Certbot triggers the challenge validation.

I hope this isn't taken as shaming, just as a lesson learned.
We'll all carry on, and I hope it stays pleasant for everyone. :slightly_smiling_face:

That looks like a TYPO.

Yup, it is
Shame on me T_T

No shame, we stand corrected :wink:
