Renewal failed, timeout on challenge files

My domain is rudhar.com. My web server is nginx version 1.22.1, built with OpenSSL 1.1.1q 5 Jul 2022 (running with OpenSSL 1.1.1t 7 Feb 2023), under Alpine Linux 3.17.3.

My website is hosted on a VPS, provided by Virtua.cloud in Lille, France. I can log in to a root shell on my machine, and I'm not using a control panel. I'm using certbot version 1.32.0.

On February 2, 2023, via root's crontab I ran the command "certbot renew", and the Let's Encrypt certificate was successfully renewed. Cron tried again on April 17, 2023, and that attempt failed. After a "sudo su", I recently ran "certbot -v renew" myself several times and examined the logs in /var/log/letsencrypt (letsencrypt.log, letsencrypt.log.1, etc.). Every renewal attempt failed, and always in the same way.

The logged reason for the failures was:
"type": "urn:ietf:params:acme:error:connection",
"detail": "185.154.155.218: Fetching https://rudhar.com/.well-known/acme-challenge/[long and complicated file name]: Timeout during connect (likely firewall problem)", "status": 400

It is true that I have a firewall, nftables, which blocks the IPv4 addresses of some notorious attackers. To make sure my blocked ranges weren't too wide, I disabled the firewall (sudo service nftables stop) and retested. The renewal problem remained.

Then (again as root, of course) I manually ran the command "certbot certonly --manual", and in a different ssh login shell, I created the directory /var/www/html/.well-known/acme-challenge, and created and filled the challenge files there myself by hand. I could retrieve those files without any problems (using the exact URL from the log, to avoid any typos), from the web server using wget, and also from my laptop in the Netherlands (so not in Lille, France), using wget, curl, and Firefox. But certbot couldn't, still that timeout every time. Why?
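
For reference, the manual step above can be sketched in Python. This assumes the webroot /var/www/html from this post and that, per RFC 8555 (section 8.3), the file is named after the token and contains the key authorization (the token, a dot, then the account key thumbprint), which is the string certbot --manual prints for you. The helper name write_challenge is hypothetical.

```python
from pathlib import Path


def write_challenge(webroot: str, token: str, key_authorization: str) -> Path:
    """Create <webroot>/.well-known/acme-challenge/<token> for an HTTP-01 check.

    Hypothetical helper: key_authorization is assumed to be the exact string
    certbot --manual displays (RFC 8555: token + "." + account key thumbprint).
    """
    # HTTP-01 tokens are base64url, so they never contain path separators.
    if "/" in token or token in ("", ".", ".."):
        raise ValueError(f"suspicious token: {token!r}")
    if not key_authorization.startswith(token + "."):
        raise ValueError("key authorization must start with the token and a dot")

    challenge_dir = Path(webroot) / ".well-known" / "acme-challenge"
    challenge_dir.mkdir(parents=True, exist_ok=True)
    path = challenge_dir / token
    # Write exactly the key authorization, with no trailing newline.
    path.write_text(key_authorization, encoding="ascii")
    return path
```

Calling write_challenge("/var/www/html", token, data) with the values certbot shows would produce the same file that was created by hand here.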

I checked that my website is accessible from all over the world, using https://www.uptimia.com and https://semonto.com/tools/website-reachability-check . Result: the site works fine, from all over the world, including from California, where (using nslookup and whois) I found Letsencrypt.org is hosted in Google’s Cloud.

So that leaves me puzzled: if I can access those challenge files, why can't Letsencrypt? And now how can I renew my certificate? The current one will expire on May 3, so time is tight.

Welcome to the community @rudhar

It is because the IPv6 address in your DNS AAAA record is not correct. Your error shows the IPv4 address, but Let's Encrypt checks from multiple vantage points around the globe, and sometimes the IPv4 address shows up in the error, depending on which location fails and how. (And no, no requests come from Google Cloud.)

You should correct your IPv6 config or remove the AAAA record. Here is the DNS lookup:

nslookup rudhar.com
A    Address: 185.154.155.218
AAAA Address: 2a07:8dc1:20:0:8b:cbff:fed1:94b3
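
For anyone wanting to script the same check, here is a rough Python equivalent of the nslookup above that splits the answers by record type. It goes through the machine's own resolver (with whatever caching that implies), not Let's Encrypt's, so treat it as an approximation; resolve_by_family is a made-up name.

```python
import socket


def resolve_by_family(host: str, port: int = 80) -> dict:
    """Return the IPv4 (A) and IPv6 (AAAA) addresses the local resolver gives."""
    result = {"A": [], "AAAA": []}
    for label, family in (("A", socket.AF_INET), ("AAAA", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # no records of this family, or the lookup failed
        for _family, _type, _proto, _canon, sockaddr in infos:
            address = sockaddr[0]
            if address not in result[label]:
                result[label].append(address)
    return result
```

For example, resolve_by_family("rudhar.com") should list 185.154.155.218 under "A" and, while the AAAA record exists, the IPv6 address under "AAAA".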

And here are the results from the Let's Debug test site, which show this clearly.


But then what's wrong with the IPv6 address? It is exactly what my hosting provider specified: 2a07:8dc1:20:0:008b:cbff:fed1:94b3
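
Both spellings are in fact the same address: leading zeros in an IPv6 group are optional, so 008b and 8b are identical. A quick check with Python's standard ipaddress module:

```python
import ipaddress

provider = ipaddress.ip_address("2a07:8dc1:20:0:008b:cbff:fed1:94b3")
in_dns = ipaddress.ip_address("2a07:8dc1:20:0:8b:cbff:fed1:94b3")

# Leading zeros within a group are optional, so both strings parse to one address.
print(provider == in_dns)   # True
print(provider.compressed)  # 2a07:8dc1:20:0:8b:cbff:fed1:94b3 (the form nslookup shows)
print(provider.exploded)    # 2a07:8dc1:0020:0000:008b:cbff:fed1:94b3
```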

I also tried Let's Debug, and it reports problems for both IPv6 and IPv4. But I can access the site from home using (only) IPv4, and https://domsignal.com says IPv6 works fine too.

However, "Is your site IPv6 ready?" says "Could not connect to rudhar.com on port 80 over IPv6". But perhaps that is because I let nginx issue a 301 for any http:// request, to redirect it to https://? Is a 301 the same as "could not connect"?

IPv6 is not responding on Port 80; you have an IPv6 issue.

>curl -6 -Ii http://rudhar.com/.well-known/acme-challenge/sometestfile
curl: (28) Failed to connect to rudhar.com port 80 after 75072 ms: Couldn't connect to server

Yet IPv4 does respond on Port 80.

>curl -4 -Ii http://rudhar.com/.well-known/acme-challenge/sometestfile
HTTP/1.1 301 Moved Permanently
Server: nginx/1.22.1
Date: Thu, 27 Apr 2023 15:09:04 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
Location: https://rudhar.com/.well-known/acme-challenge/sometestfile

>curl -4 -Ii https://rudhar.com/.well-known/acme-challenge/sometestfile
HTTP/1.1 404 Not Found
Server: nginx/1.22.1
Date: Thu, 27 Apr 2023 15:09:23 GMT
Content-Type: text/html
Content-Length: 153
Connection: keep-alive

No. A 301 means the connection succeeded; the server just gives a different location to use.
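
That distinction is the whole point: a 301 proves the TCP connection worked, while the IPv6 curl above never got that far. Here is a minimal Python sketch of the same per-address test, assuming you pass a literal A or AAAA address plus the Host name yourself. probe_http is a hypothetical helper, not a certbot or Let's Encrypt API; the 10-second default mirrors the timeout visible in the Let's Debug trace.

```python
import socket
from typing import Optional


def probe_http(address: str, host: str, port: int = 80,
               path: str = "/.well-known/acme-challenge/test",
               timeout: float = 10.0) -> Optional[str]:
    """Connect to one literal IP address and return the HTTP status line.

    Returns None if connecting (or reading the reply) fails, e.g. a refused
    connection or a timeout. Any status line at all, including a
    301 redirect, means the TCP connection itself worked.
    """
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((address, port))
            request = (f"HEAD {path} HTTP/1.1\r\n"
                       f"Host: {host}\r\n"
                       "Connection: close\r\n\r\n")
            sock.sendall(request.encode("ascii"))
            reply = sock.recv(4096)
    except OSError:  # covers ConnectionRefusedError and socket timeouts
        return None
    if not reply:
        return None
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

Running probe_http("185.154.155.218", "rudhar.com") and probe_http("2a07:8dc1:20:0:8b:cbff:fed1:94b3", "rudhar.com") side by side would separate an IPv4 redirect from an IPv6 connect failure.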

What do these return when run from your server?

curl -4 https://ifconfig.io
curl -6 https://ifconfig.io

As I read it, Let's Debug only shows an issue for IPv6:

AAAANotWorking
ERROR
rudhar.com has an AAAA (IPv6) record (2a07:8dc1:20:0:8b:cbff:fed1:94b3) but a test request to this address over port 80 did not succeed. Your web server must have at least one working IPv4 or IPv6 address. You should either ensure that validation requests to this domain succeed over IPv6, or remove its AAAA record.
A timeout was experienced while communicating with rudhar.com/2a07:8dc1:20:0:8b:cbff:fed1:94b3: Get "http://rudhar.com/.well-known/acme-challenge/letsdebug-test": context deadline exceeded

Trace:
@0ms: Making a request to http://rudhar.com/.well-known/acme-challenge/letsdebug-test (using initial IP 2a07:8dc1:20:0:8b:cbff:fed1:94b3)
@0ms: Dialing 2a07:8dc1:20:0:8b:cbff:fed1:94b3
@10000ms: Experienced error: context deadline exceeded

Yes, there actually is an IPv6 issue. I rented a temporary US server from the same provider, Virtua.cloud, and it too times out when trying to reach the European server over IPv6. The same happens on other ports, such as 443, 25 and 22.

"curl -4 https://ifconfig.io" works and returns my IPv4 address. However, "curl -6 https://ifconfig.io" says:
"curl: (7) Failed to connect to ifconfig.io port 443 after 6245 ms: Couldn't connect to server"

So both incoming AND outgoing IPv6 traffic fail on the European server. That's food for thought; now I can investigate further. That was useful input, guys, thanks for that.

I will try removing the IPv6 address (the AAAA record) from the DNS and see whether Let's Encrypt can then renew the certificate (perhaps after allowing time for DNS caches to expire). That would at least remove the hard May 3 deadline.

Thanks again.


Best Practice - Keep Port 80 Open

What IP addresses does Let’s Encrypt use to validate my web server?
Let’s Encrypt does not publish a list of IP addresses we use to validate,
and these IP addresses may change at any time.

Let's Encrypt uses multi-perspective validation; see "Multi-Perspective Validation Improves Domain Validation Security" on the Let's Encrypt site.


You (usually) don't have to wait long for DNS cache expiry. The Let's Encrypt servers only look at the authoritative DNS servers, which usually don't take long to sync. Usually this takes no more than a minute or so; with some very slow DNS providers it may take an hour, but that is rare (and the only one taking that long was based in .ru).


I suggest using the online tool https://unboundtest.com/ to check your DNS.
From the site "The Unbound instance is configured very similarly to Let's Encrypt's production servers, and is started fresh for each query so there are no caching effects."


WOW! certbot renew says:
Congratulations, all renewals succeeded:
/etc/letsencrypt/live/rudhar.com/fullchain.pem (success)

The DNS time to live (TTL) had been 86400 seconds, and I changed it to 600 just a minute before. Apparently the server Let's Encrypt used this time didn't have the IPv6 address in its DNS cache, so it got only the IPv4 address, because I had removed the AAAA records. And that made the renewal succeed.
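
As a simplified model of resolver caching (not a description of Let's Encrypt's own resolvers), a resolver that cached the AAAA record before the change may keep answering with it for the full TTL that came with it; lowering the TTL only shortens the lifetime of copies fetched afterwards. The arithmetic, with a made-up helper name and a hypothetical cache time:

```python
from datetime import datetime, timedelta


def stale_until(cached_at: datetime, ttl_seconds: int) -> datetime:
    """Latest moment a resolver may still answer from a cached record."""
    return cached_at + timedelta(seconds=ttl_seconds)


cached = datetime(2023, 4, 27, 12, 0)   # hypothetical moment the AAAA was cached
print(stale_until(cached, 86400))       # 2023-04-28 12:00:00, a full day later
print(stale_until(cached, 600))         # 2023-04-27 12:10:00, ten minutes later
```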

Now I need to find out why IPv6 doesn't work, but that's a different problem. The cause was not Let's Encrypt, that much is now 100% clear!

What remains is that certbot's logging is somewhat confusing. For example, this is what I got:

{
  "identifier": {
    "type": "dns",
    "value": "www.rudhar.com"
  },
  "status": "invalid",
  "expires": "2023-05-03T08:56:09Z",
  "challenges": [
    {
      "type": "http-01",
      "status": "invalid",
      "error": {
        "type": "urn:ietf:params:acme:error:connection",
        "detail": "185.154.155.218: Fetching https://rudhar.com/.well-known/acme-challenge/w11GJ_qm9ZP-Yl1WSzz1ctqvEVhvBgfyEu3jfj-CZlo: Timeout during connect (likely firewall problem)",
        "status": 400
      },
      "url": "https://acme-v02.api.letsencrypt.org/acme/chall-v3/222635785017/2nqLSA",
      "token": "w11GJ_qm9ZP-Yl1WSzz1ctqvEVhvBgfyEu3jfj-CZlo",
      "validationRecord": [
        {
          "url": "http://www.rudhar.com/.well-known/acme-challenge/w11GJ_qm9ZP-Yl1WSzz1ctqvEVhvBgfyEu3jfj-CZlo",
          "hostname": "www.rudhar.com",
          "port": "80",
          "addressesResolved": [
            "185.154.155.218",
            "2a07:8dc1:20:0:8b:cbff:fed1:94b3"
          ],
          "addressUsed": "2a07:8dc1:20:0:8b:cbff:fed1:94b3"
        },
        {
          "url": "http://www.rudhar.com/.well-known/acme-challenge/w11GJ_qm9ZP-Yl1WSzz1ctqvEVhvBgfyEu3jfj-CZlo",
          "hostname": "www.rudhar.com",
          "port": "80",
          "addressesResolved": [
            "185.154.155.218",
            "2a07:8dc1:20:0:8b:cbff:fed1:94b3"
          ],
          "addressUsed": "185.154.155.218"
        },
        {
          "url": "https://rudhar.com/.well-known/acme-challenge/w11GJ_qm9ZP-Yl1WSzz1ctqvEVhvBgfyEu3jfj-CZlo",
          "hostname": "rudhar.com",
          "port": "443",
          "addressesResolved": [
            "185.154.155.218",
            "2a07:8dc1:20:0:8b:cbff:fed1:94b3"
          ],
          "addressUsed": "2a07:8dc1:20:0:8b:cbff:fed1:94b3"
        }
      ],
      "validated": "2023-04-26T08:56:09Z"
    }
  ]
}
2023-04-26 10:56:10,823:DEBUG:acme.client:Storing nonce: C878f9eXSfE89mwTdnriYVatilbvWaIvziSR5JTJRZTGPDs
2023-04-26 10:56:10,824:INFO:certbot._internal.auth_handler:Challenge failed for domain rudhar.com
2023-04-26 10:56:10,824:INFO:certbot._internal.auth_handler:Challenge failed for domain www.rudhar.com
2023-04-26 10:56:10,824:INFO:certbot._internal.auth_handler:http-01 challenge for rudhar.com
2023-04-26 10:56:10,824:INFO:certbot._internal.auth_handler:http-01 challenge for www.rudhar.com
2023-04-26 10:56:10,825:DEBUG:certbot._internal.display.obj:Notifying user:
Certbot failed to authenticate some domains (authenticator: webroot). The Certificate Authority reported these problems:
Domain: rudhar.com
Type: connection
Detail: 185.154.155.218: Fetching https://rudhar.com/.well-known/acme-challenge/ALWBG2GExxbjFz2Obp9nCXWTm6Jz4Q5P3rO-a5J1MhI: Timeout during connect (likely firewall problem)

Domain: www.rudhar.com
Type: connection
Detail: 185.154.155.218: Fetching https://rudhar.com/.well-known/acme-challenge/w11GJ_qm9ZP-Yl1WSzz1ctqvEVhvBgfyEu3jfj-CZlo: Timeout during connect (likely firewall problem)

It's not easy to conclude from that that the ACTUAL problem is with IPv6, that IPv4 is OK, and that only the IPv4 address should remain in the DNS until the IPv6 issue is resolved. Or is it?


No, it is not always easy. That's why Let's Debug is helpful in these cases. Look at the "Verbose Information" link at the bottom of that result page for clearer info.

And I am not sure why it is reported that way. Normally you see the IPv6 address when IPv6 has failed, but in the last month or so we've seen more cases where the IPv4 address shows.

Let's Encrypt will retry with IPv4 (if available) when IPv6 fails in specific situations, so it may not report the original error and show the later IP instead. It may also be a result of using multiple data centers for these checks; perhaps some get different results than others, so it is hard to know which IP to show. I am mostly guessing based on my experience, as I have not studied the Boulder code.
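
Going by the log excerpt above, the validationRecord list is more informative than the "detail" string: each entry records the URL fetched and the addressUsed, so an IPv6 attempt followed by an IPv4 retry is visible there. A minimal Python sketch that pulls those pairs out of an authorization object; addresses_tried is a hypothetical helper and the sample is a trimmed copy of the JSON above with the token shortened to TOKEN.

```python
import json


def addresses_tried(authorization: dict) -> list:
    """List (url, addressUsed) pairs from an ACME authorization object.

    Each validationRecord entry describes one fetch (redirects add further
    entries); addressUsed shows which IP the validator connected to, which
    can differ from the IP quoted in the error "detail" string.
    """
    tried = []
    for challenge in authorization.get("challenges", []):
        for record in challenge.get("validationRecord", []):
            tried.append((record.get("url"), record.get("addressUsed")))
    return tried


sample = json.loads("""
{"identifier": {"type": "dns", "value": "www.rudhar.com"},
 "challenges": [{"type": "http-01", "status": "invalid",
   "validationRecord": [
     {"url": "http://www.rudhar.com/.well-known/acme-challenge/TOKEN",
      "addressUsed": "2a07:8dc1:20:0:8b:cbff:fed1:94b3"},
     {"url": "http://www.rudhar.com/.well-known/acme-challenge/TOKEN",
      "addressUsed": "185.154.155.218"}]}]}
""")

for url, address in addresses_tried(sample):
    print(address, url)
```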


Solved that too!

My VPS provider explained that it was a problem with the Neighbor Discovery Protocol (NDP): neighbor cache entries timed out and were eventually removed, because IPv6 was rarely used. Their solution: a periodic ping to the IPv6 gateway, like so:

ping6 -c1 -W5 -I eth0 fe80::1
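
If you do adopt the provider's workaround, the ping only has to run more often than the cache entries expire. A minimal Python sketch that repeats the exact ping6 command above in a loop; the 60-second interval is an assumption, not something the provider specified, and cron would do the same job. Both function names are hypothetical.

```python
import subprocess
import time


def keepalive_command(interface: str = "eth0", gateway: str = "fe80::1",
                      wait_seconds: int = 5) -> list:
    """Build the ping6 invocation from the post: one probe to the gateway."""
    return ["ping6", "-c1", f"-W{wait_seconds}", "-I", interface, gateway]


def keepalive_loop(interval_seconds: float = 60.0) -> None:
    """Ping the IPv6 gateway forever, per the provider's NDP explanation."""
    while True:
        # check=False: a single lost ping should not stop the loop.
        subprocess.run(keepalive_command(), check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        time.sleep(interval_seconds)
```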


You should not need to do that. While troubleshooting it is out of scope for discussion here, you might want to check whether you are blocking some of the ICMPv6 traffic (such as the Neighbor Discovery messages) required for proper operation of IPv6.


How is fe80::1 your IPv6 gateway?
Do you have a real [routable] IPv6 address?


IPv6 is different.
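
In IPv6 the default gateway is very often given as a link-local address (fe80::/10). The host reaches that next hop on the local segment via Neighbor Discovery, while its own traffic still leaves from its global address, so a link-local gateway does not by itself require NAT. Python's ipaddress module shows the difference between the two addresses in this thread:

```python
import ipaddress

gateway = ipaddress.ip_address("fe80::1")
server = ipaddress.ip_address("2a07:8dc1:20:0:8b:cbff:fed1:94b3")

print(gateway.is_link_local, gateway.is_global)    # True False
print(server.is_link_local, server.is_global)      # False True
print(server in ipaddress.ip_network("2000::/3"))  # True: global unicast range
```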


If that is the gateway [which I didn't dispute], then there must be some IPv6 NAT going on, as that IP can't be routed over the Internet.
OR, they don't really have an Internet-reachable IPv6 address.
Hence my second question.

OR
They use some voodoo networking...
Where IP in network A routes through an IP in network B.
[where A and B don't intersect, nor overlap]


Right now they must 🙂

curl -I4  rudhar.com
HTTP/1.1 301 Moved Permanently

curl -I6  rudhar.com
HTTP/1.1 301 Moved Permanently

Doesn't mean IPv6 NAT isn't involved.
[I put all my IPv6 systems behind NATting firewalls]
