Let's Encrypt http-01 ACME challenge fails with timeout

Hello!

My domain is:

relay-02.torproject.net

I ran this command:

certbot -v

It produced this output:

Performing the following challenges:
http-01 challenge for relay-02.torproject.net
Waiting for verification...
Challenge failed for domain relay-02.torproject.net
http-01 challenge for relay-02.torproject.net

Certbot failed to authenticate some domains (authenticator: apache). The Certificate Authority reported these problems:
Domain: relay-02.torproject.net
Type: connection
Detail: 185.129.61.129: Fetching http://relay-02.torproject.net/.well-known/acme-challenge/ozs8uoIq7NgCenCZyMrfbnyM0ce8Jye0pd3KVcKUOT8: Network unreachable

My web server is (include version):

Apache 2.4.41

The operating system my web server runs on is (include version):

Ubuntu 20.04

My hosting provider, if applicable, is:

I can login to a root shell on my machine (yes or no, or I don't know):

Yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):

No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):

certbot 2.11.0

We have had successful certificate renewals in the past (crt.sh | relay-02.torproject.net); however, starting with the April 2024 renewal, this has failed. This server does not have any firewall, and it is reachable from different vantage points on the Internet, e.g.:

Nmap scan report for relay-02.torproject.net (185.129.61.129)
Host is up (0.014s latency).

PORT STATE SERVICE
80/tcp open http
443/tcp open https
9091/tcp open xmltec-xmlmail

It just seems that the Let's Encrypt check suddenly has problems reaching our server.

Note: there is a Tor exit node running on that same IP address. One thing we were wondering is whether that IP landed on some block list between letsencrypt.org and 185.129.61.129. However, we have no problem doing something like curl https://acme-v02.api.letsencrypt.org/directory from that box. There is no timeout or anything in that direction.

Wow, the actual Tor project. Welcome to the Let's Encrypt community! And thank you for all you do.

I don't recall seeing a "Network unreachable" error from the Let's Encrypt validation servers before. It's probably happened, but that's got to be a rare one. (I've seen it plenty from systems that were misconfigured outbound to not be able to reach the Let's Encrypt servers, but you seem to have covered that.) So this might be tricky, but we'll see what we can do to help. I've got some questions for you, though I'm not sure what the solution would be even once we have the answers.

Your post title mentions "timeout"; is there a "timeout" in some other error message, or is "Network unreachable" (without anything saying "Secondary validation") the only error you get?

Do you get the same error when trying in the staging environment (by adding --dry-run to the certbot command)?

You say that it stopped in April, but looking at the crt.sh history you linked to I think you had some problems even before that. Certbot by default (I'm assuming you haven't changed this) will start trying to renew 30 days before expiration, running twice a day to check whether renewal needs to happen. So if your last renewal was Jan. 10, then certbot will have been trying since Mar. 10. But your certificate before that was issued Aug. 23, so certbot would have been trying to renew it starting on Oct. 22, but renewal didn't actually happen until that Jan. 10 certificate. And similarly there was a gap with no certificate between Aug. 2 and Aug. 23. Have you been needing to do manual interventions to get certificates these other times, or do you do something other than the default automatic certbot schedule?
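The renewal-window arithmetic above can be sketched as a small shell function. It assumes Let's Encrypt's 90-day certificate lifetime, Certbot's default of starting renewal 30 days before expiry, and GNU coreutils `date`; the function name `renewal_start` is hypothetical.

```shell
# Sketch: when would Certbot (default settings) start trying to renew?
# Assumptions: 90-day certificate lifetime, renewal window opens 30 days
# before expiry, and GNU coreutils `date` (BSD/macOS `date -d` differs).
renewal_start() {
  local issued="$1"  # issuance date as YYYY-MM-DD
  # expiry = issued + 90 days, so the window opens at issued + 60 days
  date -u -d "${issued} + 60 days" +%Y-%m-%d
}

renewal_start 2024-01-10   # 2024-03-10
renewal_start 2023-08-23   # 2023-10-22
```

Both results match the dates worked out by hand above, so a gap between the window opening and the next certificate on crt.sh is a hint that automatic renewal was failing during that period.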

There was a small change in how Let's Encrypt's validators were set up in July 2023. I don't really think it's related, but I'm going to tag @jcjones just in case he wants to look into it.

Have you tried any other Certificate Authorities? I think you could use certbot renew --dry-run --server https://api.test4.buypass.no/acme/directory to try BuyPass Go's staging environment. (Or do you need to set up a new cert in certbot to switch CAs instead of just doing a renewal?) There are several free CAs that support ACME, though Buypass Go is the only one I know of that both has a staging server and doesn't require any separate account to be set up. And I don't know how amenable other CAs are to issuing for the Tor project.

I suggest you talk to your ISP, as there seems to be a BGP/routing problem for 185.129.61.129/AS210731. My own AS in Germany (AS3320) immediately replies with "network unreachable" for any IP in 185.129.61.0/24, indicating that my ISP doesn't have a route for that.

From the viewpoint of ASN24940 everything appears fine:

traceroute 185.129.61.129
traceroute to 185.129.61.129 (185.129.61.129), 30 hops max, 60 byte packets
 1  172.31.1.1 (172.31.1.1)  3.968 ms  3.943 ms  3.939 ms
 2  24185.your-cloud.host (65.108.118.176)  0.610 ms  0.595 ms  0.582 ms
 3  * * *
 4  spine2.hel1.cloud1.hetzner.com (88.198.254.117)  1.251 ms spine1.hel1.cloud1.hetzner.com (88.198.254.113)  1.256 ms spine2.hel1.cloud1.hetzner.com (88.198.254.117)  1.226 ms
 5  * * *
 6  core31.hel1.hetzner.com (213.239.228.1)  0.656 ms core32.hel1.hetzner.com (213.239.228.5)  0.325 ms core31.hel1.hetzner.com (213.239.228.9)  0.320 ms
 7  core52.sto.hetzner.com (213.239.254.66)  6.913 ms core53.sto.hetzner.com (213.239.254.62)  6.809 ms core52.sto.hetzner.com (213.239.254.66)  6.610 ms
 8  core40.sto.hetzner.com (213.239.252.70)  6.938 ms core40.sto.hetzner.com (213.239.252.78)  6.936 ms core40.sto.hetzner.com (213.239.252.70)  6.902 ms
 9  as2603-20g-ix1.sthix.net (192.121.80.45)  6.868 ms  6.878 ms  6.851 ms
10  se-tug.nordu.net (109.105.101.52)  7.214 ms se-sthb.nordu.net (109.105.101.62)  7.506 ms se-tug.nordu.net (109.105.101.28)  7.453 ms
11  peer-as2603.khk7nqp8.dk.ip.tdc.net (128.76.59.125)  15.689 ms dk-bal2.nordu.net (109.105.97.10)  17.419 ms peer-as2603.khk7nqp8.dk.ip.tdc.net (128.76.59.125)  15.995 ms
12  peer-as2603.khk7nqp8.dk.ip.tdc.net (128.76.59.125)  16.631 ms  16.740 ms  16.722 ms
13  130.225.242.198 (130.225.242.198)  22.782 ms ore.core.fsknet.dk (109.105.102.161)  17.435 ms 130.225.242.198 (130.225.242.198)  23.454 ms
14  relay-02.torproject.net (185.129.61.129)  23.242 ms 130.225.242.198 (130.225.242.198)  24.166 ms  29.881 ms

I haven't looked at the BGP data much (https://bgp.tools), but my first guess would be that one of your upstream ISPs is having issues?
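The difference between an immediate "network unreachable" (no route) and a slow timeout can also be seen from curl's exit status when probing the server from different vantage points: 6 means the name did not resolve, 7 means curl failed to connect (which includes an immediate no-route error), and 28 means the operation timed out. A minimal sketch; the function name and the example curl invocation in the comment are illustrative only.

```shell
# Sketch: map a curl exit status to a rough reachability diagnosis.
# Example probe (illustrative):
#   curl -sS -o /dev/null --connect-timeout 10 http://185.129.61.129/
#   classify_curl_exit $?
classify_curl_exit() {
  case "$1" in
    0)  echo "ok: connected and got a response" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "connect: failed to connect (refused, or no route / network unreachable)" ;;
    28) echo "timeout: operation timed out (packets likely dropped somewhere)" ;;
    *)  echo "other: curl exit status $1" ;;
  esac
}
```

An instant exit 7 from one network and a clean response from another points at routing rather than at the web server itself.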

Peter Cooper Jr. via Let's Encrypt Community Support:

> Wow, the actual Tor project. Welcome to the Let's Encrypt community! And thank you for all you do.

Thanks! Same goes for all the folks involved in the Let's Encrypt efforts. <3

> I don't recall seeing a "Network unreachable" error from the Let's Encrypt validation servers before. It's probably happened, but that's got to be a rare one. (I've seen it plenty from systems that were misconfigured outbound to not be able to reach the Let's Encrypt servers, but you seem to have covered that.) So this might be tricky, but we'll see what we can do to help. I've got some questions for you, though I'm not sure what the solution would be even once we have the answers.

> Your post title mentions "timeout"; is there a "timeout" in some other error message, or is "Network unreachable" (without anything saying "Secondary validation") the only error you get?

Yes. There is nothing about "Secondary validation". I had read Seth's post (Unexpected renewal failures since April 2024? Please read this!) and was looking for anything related to secondary validation, but found none. I think I am just not getting that far in the validation dance.

> Do you get the same error when trying in the staging environment (by adding --dry-run to the certbot command)?

Yes.

> You say that it stopped in April, but looking at the crt.sh history you linked to I think you had some problems even before that. Certbot by default (I'm assuming you haven't changed this) will start trying to renew 30 days before expiration, running twice a day to check whether renewal needs to happen. So if your last renewal was Jan. 10, then certbot will have been trying since Mar. 10. But your certificate before that was issued Aug. 23, so certbot would have been trying to renew it starting on Oct. 22, but renewal didn't actually happen until that Jan. 10 certificate. And similarly there was a gap with no certificate between Aug. 2 and Aug. 23. Have you been needing to do manual interventions to get certificates these other times, or do you do something other than the default automatic certbot schedule?

Yeah, correct, we needed to do things manually. So far, we have never looked more closely at that or debugged why the automatic renewal did not work... (Doing the manual renewal every three months was okay-ish.)

> There was a small change in how Let's Encrypt's validators were set up in July 2023. I don't really think it's related, but I'm going to tag @jcjones just in case he wants to look into it.

> Have you tried any other Certificate Authorities? I think you could use certbot renew --dry-run --server https://api.test4.buypass.no/acme/directory to try BuyPass Go's staging environment. (Or do you need to set up a new cert in certbot to switch CAs instead of just doing a renewal?) There are several free CAs that support ACME, though Buypass Go is the only one I know of that both has a staging server and doesn't require any separate account to be set up. And I don't know how amenable other CAs are to issuing for the Tor project.

No, renewal is fine and is what we actually want. I did not try any other CA because I was not sure which one to pick. So I just ran the dry-run command you gave above:

Failed to renew certificate relay-02.torproject.net with error: HTTPSConnectionPool(host='api.test4.buypass.no', port=443): Max retries exceeded with url: /acme/directory (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f4df4ada310>, 'Connection to api.test4.buypass.no timed out. (connect timeout=45)'))

So it seems the problem is more widespread than just Let's Encrypt. That's annoying, but a "good" result. :) Thanks for your help.

Max via Let's Encrypt Community Support:

> I suggest you talk to your ISP, as there seems to be a BGP/routing problem for 185.129.61.129/AS210731. My own AS in Germany (AS3320) immediately replies with "network unreachable" for any IP in 185.129.61.0/24, indicating that my ISP doesn't have a route for that.

> From the viewpoint of ASN24940 everything appears fine:

> I haven't looked at the BGP data much (https://bgp.tools), but my first guess would be that one of your upstream ISPs is having issues?

Okay, interesting. I'll ask around, thanks!

Actually, I just realized that this error stems from my server connecting to Buypass, which is the direction that worked in the Let's Encrypt case. So, torsocks to the rescue: torsocks certbot renew --dry-run --server https://api.test4.buypass.no/acme/directory gives me:

Failed to renew certificate relay-02.torproject.net with error: Unable to register an account with ACME server. Error returned by the ACME server: Bad Request :: Email is a required contact

Yeah, Buypass does require that an email address be provided. I don't think it necessarily needs to be "valid" (by which I mean that they don't require an email to be answered in order for the account to be made), but they don't allow anonymous registrations. I don't know how certbot deals with switching CAs, but you might be able to just add a --email parameter with your email.

You can have a multitude of different ACME server URLs. For every ACME server, a separate account directory is created.

Yup.

And I've never received any email from Buypass, so I don't think it really matters what you enter there. They probably use it for the same stuff as Let's Encrypt, for incidents et cetera.
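The per-server account directories mentioned above live under Certbot's config directory. To my understanding, Certbot derives the subdirectory from the ACME server URL's host and path (e.g. accounts/acme-v02.api.letsencrypt.org/directory for Let's Encrypt), so a Buypass account ends up next to, not instead of, the Let's Encrypt one. The helper below is hypothetical and only reproduces that assumed mapping; on a real system, `ls /etc/letsencrypt/accounts/` is the authoritative check.

```shell
# Sketch (hypothetical helper): map an ACME directory URL to the account
# directory Certbot is assumed to use; default config dir /etc/letsencrypt.
acme_account_dir() {
  local url="$1" config_dir="${2:-/etc/letsencrypt}"
  local hostpath="${url#*://}"   # strip the scheme, e.g. "https://"
  printf '%s/accounts/%s\n' "$config_dir" "$hostpath"
}

acme_account_dir https://api.test4.buypass.no/acme/directory
# /etc/letsencrypt/accounts/api.test4.buypass.no/acme/directory
```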

Peter Cooper Jr. via Let's Encrypt Community Support:

> Yeah, Buypass does require that an email address be provided. I don't think it necessarily needs to be "valid" (by which I mean that they don't require an email to be answered in order for the account to be made), but they don't allow anonymous registrations. I don't know how certbot deals with switching CAs, but you might be able to just add a --email parameter with your email.

Unfortunately, that does not seem to work as expected: torsocks certbot renew --dry-run --email gk-test@torproject.org --server https://api.test4.buypass.no/acme/directory still gives me the same error:

Failed to renew certificate relay-02.torproject.net with error: Unable to register an account with ACME server. Error returned by the ACME server: Bad Request :: Email is a required contact

I wonder whether we could make progress here by switching to the dns-01 challenge. I have not checked whether that would actually work in our setup, but maybe that could be something to explore, too?

Weird, I just registered an account with Buypass without any issue:

certbot register --work-dir . --logs-dir . --config-dir . --server https://api.test4.buypass.no/acme/directory --email gk-test@torproject.org

(Note that I won't use this account at all and will immediately remove the temporary directory it was created in, so your email address should not receive any emails from this test. I purposely included it to check whether Buypass had perhaps manually disabled e.g. the torproject.net domain, although I would have found that very weird if they did.)

Can you check the appropriate letsencrypt.log and see if the correct payload was sent to the Buypass server? In my case it was:

2024-06-30 12:37:51,190:DEBUG:acme.client:JWS payload:
b'{\n  "contact": [\n    "mailto:gk-test@torproject.org"\n  ],\n  "termsOfServiceAgreed": true\n}'
2024-06-30 12:37:51,194:DEBUG:acme.client:Sending POST request to https://api.test4.buypass.no/acme/new-acct:

Yours should be very similar if not identical.
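A quick way to do that comparison is to pull the contact out of the line that follows each "JWS payload:" entry. This sketch assumes the log layout shown above (payload printed on the next line) and the default log path /var/log/letsencrypt/letsencrypt.log; the function name is made up.

```shell
# Sketch: list the mailto: contacts Certbot sent in JWS payloads.
# Assumes the payload is printed on the line right after "JWS payload:".
jws_contacts() {
  local log="${1:-/var/log/letsencrypt/letsencrypt.log}"
  grep -A1 'JWS payload:' "$log" | grep -o 'mailto:[^"\\]*'
}
```

An empty result for the new-account request would mean the email address never made it into the payload.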

If yours is missing the mailto: field, maybe you have a cli.ini set which empties the email address? Hm, apparently I can't override the command-line option with cli.ini here, so that's probably not it.

Yeah, I was wondering how well certbot would handle the case of trying to switch CAs for an existing cert.

How about trying to register the account separately, as @Osiris managed to:

certbot register --server https://api.test4.buypass.no/acme/directory --email gk-test@torproject.org

And then trying to dry-run pointing to that server

certbot renew --dry-run --server https://api.test4.buypass.no/acme/directory

If you want to try getting a cert from their production endpoint, do the same thing with --server https://api.buypass.com/acme/directory instead and without the --dry-run.

I just got a BuyPass production cert, replacing a previous Let's Encrypt cert.

This command prompted for several answers to set up a new BuyPass ACME account and then proceeded with the cert request. The cert is good for 180 days.

sudo certbot certonly --nginx -d example.com --server https://api.buypass.com/acme/directory

I also used certbot renew to replace an existing Certbot LE cert profile with BuyPass. But maybe it worked better because I had already made the request above and had a BP account set up. Running a register command first probably would have worked too.

Anyway, this worked once I had an existing BP account:

sudo certbot renew --cert-name example2.com --server https://api.buypass.com/acme/directory

One quirk was the CAA record. I forgot that I had one, so my first try with BuyPass rightly failed to issue due to the CAA restriction. So I added BuyPass to my CAA RR, but BP did not see the change immediately, even though I had checked that Route53 had properly synced its name servers.

It looks like BuyPass cached the CAA lookup results from the first failed try. Not exactly sure for how long.

Changing CA to fix domain validation is a temporary workaround. If BuyPass are eventually compelled to use multi-perspective validation then the same problem will crop up again.

Check out DNS validation: you just need to be able to add or update a TXT record in your domain's DNS. If there isn't an established plugin for your DNS host(s), then you can script it. See also acme.sh, which has a lot of DNS providers written in bash.
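As one way to "script it": Certbot's manual mode can call your own script via --manual-auth-hook, passing the domain in CERTBOT_DOMAIN and the TXT value in CERTBOT_VALIDATION. The sketch below assumes the zone's DNS server accepts RFC 2136 dynamic updates through nsupdate; the nameserver, TTL and function name are placeholders, not anything taken from this thread.

```shell
# Sketch of a dns-01 --manual-auth-hook body (hypothetical).
# Certbot sets CERTBOT_DOMAIN and CERTBOT_VALIDATION for the hook.
# Prints nsupdate commands; pipe them to: nsupdate -k <keyfile>
dns01_nsupdate_cmds() {
  local server="${NSUPDATE_SERVER:-ns1.example.net}"  # placeholder nameserver
  local ttl=60
  printf 'server %s\n' "$server"
  printf 'update add _acme-challenge.%s. %s IN TXT "%s"\n' \
    "$CERTBOT_DOMAIN" "$ttl" "$CERTBOT_VALIDATION"
  printf 'send\n'
}
```

A hook script would call `dns01_nsupdate_cmds | nsupdate -k /path/to/key` and then wait for the record to propagate; it would be wired up with something like `certbot certonly --manual --preferred-challenges dns --manual-auth-hook /path/to/hook.sh -d relay-02.torproject.net`, plus a matching --manual-cleanup-hook that deletes the record.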

How do you know that?

The error message doesn't mention "Secondary validation", so it looks like the primary (US) validation center is failing. There was even the recent change to not try the secondaries at all until the primary has succeeded.

I agree it is best to resolve the network routing problem. Until that is resolved, it could affect anyone, including other CAs, even now.
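The reasoning above amounts to a simple check on the error detail Certbot prints: if it mentions secondary validation, the primary check passed and a remote perspective failed; otherwise the primary itself failed. This is only a heuristic based on the message wording discussed in this thread, and the function name is hypothetical.

```shell
# Heuristic sketch: which validation stage does a Certbot error detail
# point to? Matches the phrase case-insensitively.
validation_stage() {
  if printf '%s' "$1" | grep -qi 'secondary validation'; then
    echo secondary
  else
    echo primary
  fi
}
```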

@MikeMcQ you're right, I'm skim-reading.

Peter Cooper Jr. via Let's Encrypt Community Support:

> Yeah, I was wondering how well certbot would handle the case of trying to switch CAs for an existing cert.

> How about trying to register the account separately, as @Osiris managed to:

>  certbot register --server https://api.test4.buypass.no/acme/directory --email gk-test@torproject.org

> And then trying to dry-run pointing to that server

>  certbot renew --dry-run --server https://api.test4.buypass.no/acme/directory

That got me further along, although I think I have now run into the CAA issue Mike described. I can't fix that one myself but will ask the right folks next week.

I guess I have some possible options now (with the DNS challenge as well), apart from getting the core issue fixed (which I can't do myself either).

Thanks to everyone who helped, much appreciated!

Very definitely a CAA restriction. Currently, only LE can issue, and only from a specific account:

dig +noall +answer CAA relay-02.torproject.net
relay-02.torproject.net. 286    IN      CAA     128 issue "letsencrypt.org;accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/794033397"

If you are like me, you may have to wait a bit before BuyPass sees any change to the CAA.

I'm not sure of the implication of flag=128 when multiple CAA records are present, which you will have when adding BuyPass (or anyone else). If a CAA expert could clarify, that would be great :)
Certificate Authority Authorization (CAA) - Let's Encrypt
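To check which CAs a CAA record set permits, look at the issuer domain at the start of each issue value. Under RFC 8659 each issue record authorizes the CA it names, so adding a record such as 0 issue "buypass.com" would permit Buypass alongside the account-bound Let's Encrypt entry. The 128 flag marks a record as critical, which only requires the CA to understand the tag, and every CA understands issue. The sketch reads dig +noall +answer output on stdin; the function name is hypothetical, "buypass.com" as Buypass's CAA identifier is an assumption to verify in their documentation, and the sketch ignores both the accounturi parameter (which further restricts LE to one account) and the lookup climbing to parent domains.

```shell
# Sketch: does a CAA RRset (dig +noall +answer format on stdin) let a CA
# issue non-wildcard certs? Exit 0 = allowed. Usage: caa_allows buypass.com
caa_allows() {
  awk -v ca="$1" '
    $4 == "CAA" && $6 == "issue" {
      n++
      v = $7
      for (i = 8; i <= NF; i++) v = v " " $i   # re-join values split on spaces
      gsub(/"/, "", v)        # drop the quotes
      sub(/;.*/, "", v)       # drop parameters such as accounturi=...
      gsub(/[ \t]/, "", v)    # trim whitespace around the issuer domain
      if (tolower(v) == tolower(ca)) ok = 1
    }
    # No issue records at all means CAA does not restrict issuance.
    END { if (n == 0 || ok) exit 0; exit 1 }
  '
}
```

For example: `dig +noall +answer CAA relay-02.torproject.net | caa_allows buypass.com && echo allowed`.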
