Certificate renewal failure

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. crt.sh | example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is:
eddiem.com - I host others but this is the most urgent.

I ran this command: Webmin auto-renew or SSH certbot.

It produced this output: Detail: During secondary validation: 103.1.186.221: Fetching
http://eddiem.com/.well-known/acme-challenge/Ywf2vqqW_j6_PJnvjFmVnZaNMqaOmxfGK1LBYDqBRHk:
Timeout during connect (likely firewall problem)

My web server is (include version):
Apache version 2.4.41

The operating system my web server runs on is (include version):
Ubuntu Linux 20.04.6
My hosting provider, if applicable, is:
Binary Lane cloud - https://www.binarylane.com.au/
I can login to a root shell on my machine (yes or no, or I don't know):
Yes
I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
Webmin version 2.202
Virtualmin version 7.10.0
The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):
certbot 2.11.0

I share a cloud server with a friend.
Webmin's Let's Encrypt auto-renew worked perfectly until around May 2024, then we both had problems. We independently tried to solve them and both failed. However, my friend's two domains' certificates auto-renewed in June (my auto-renew was off).
I have confirmed certbot is creating and removing the challenge file.
I have confirmed the .well-known/acme-challenge files are accessible over HTTP.
The .htaccess is still there and is viewable in my browser.
I have checked that eddiem.com is accessible from many countries including Sweden and Singapore. I see no geoblocking anywhere.
It is not super-urgent.
Any clues?
Ciao Eddie,


Welcome to the community @eddiem

Well, a "Secondary Validation" failure often points to problems reaching your server from the two non-USA countries you checked. But let's assume you don't have any geo-blocking.

If single HTTP(S) test requests succeed but an actual cert request fails due to a timeout, a few things come to mind.

  1. You have a firewall blocking specific IP addresses, either by IP range or implicitly by checking signatures of some kind and rejecting those.
  2. You have some kind of DDoS-protection firewall that blocks repeated identical requests from various locations all arriving at the same time.
  3. A temporary comms problem is affecting routing between LE and your server.

Do any of these seem possible?

Can you check your Apache access log and show the requests that do get through? When you see a Secondary failure, at least the Primary got through, and possibly one or more of the Secondaries. Knowing the requesting IPs of the ones that got through might help identify the problem.


Also remember to check your firewall at both the machine level and the cloud host control panel. Also tools like Fail2Ban might be working to block things.


Thanx for the replies.
I did try turning the server firewall off at one point with no change.
I don't think I've done that at the host level.
I'm only running UFW at the server. I did have fail2ban once and I'll check if it is still there.
The logs will have to wait a little while.


Also - I only get one attempt before "too many failed authorizations"

I managed to get this from the apache log on my second attempt today.

23.178.112.108 - - [27/Aug/2024:16:25:52 +1000] "GET /.well-known/acme-challenge/EDgPDLg-8LKCw2K4amk_WR7fNuVhXKg6dlTh_kmdHAo HTTP/1.1" 200 294 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
52.88.107.43 - - [27/Aug/2024:16:25:52 +1000] "GET /.well-known/acme-challenge/EDgPDLg-8LKCw2K4amk_WR7fNuVhXKg6dlTh_kmdHAo HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
16.171.240.113 - - [27/Aug/2024:16:25:52 +1000] "GET /.well-known/acme-challenge/EDgPDLg-8LKCw2K4amk_WR7fNuVhXKg6dlTh_kmdHAo HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

Are those "292"s significant?

To clarify. I got "too many failed authorizations" on the above attempt not the timeout error.

This error means you hit a rate limit because your validation has failed too many times. It will go away on its own, as described in Rate Limits - Let's Encrypt.

Those validation attempts look OK to me (292 is the response size in bytes, I think), but I think you should expect more than 3. Maybe not.


Huh. Two of the three IPs are in the US and the third is in Sweden. You would normally see at least 4 and more likely 5 successful requests. You are missing requests from the US and Singapore. (Again, note these IP locations can change at any time.)

The 292 is just the length of the reply, and sometimes this varies if your server sends out response headers too. That's not very important. The '200' preceding it is the HTTP status code, which is the "OK" reply. Mind you, your server can send an OK with faulty data, but that causes a different failure from the Let's Encrypt server.

You should not have gotten a "too many failed" requests error from just one request. Could you have some system that is repeatedly retrying a request without your specific instruction? This "too many" is not for an individual challenge from a specific server but rather a failure of the overall cert request itself.


I also found a bunch of these in the Apache error log overnight.
It probably has nothing to do with certbot but there are lots of errors to do with ".well-known/acme-challenge"

[Tue Aug 27 23:12:34.359619 2024] [autoindex:error] [pid 1415907] [client 4.196.120.128:32739] AH01276: Cannot serve directory /home/eddiemco/public_html/.well-known/: No matching DirectoryIndex (index.html,index.htm,index.php,index.php4,index.php5) found, and server-generated directory index forbidden by Options directive

In a different virtual server log I found.

[Mon Aug 26 18:07:07.944539 2024] [proxy_fcgi:error] [pid 915878] [client 20.117.200.45:50017] AH01071: Got error 'PHP message: No route found - full:/.well-known/pki-validation/cloud.php query:'
[Mon Aug 26 18:07:10.899181 2024] [proxy_fcgi:error] [pid 915879] [client 20.117.200.45:51724] AH01071: Got error 'PHP message: No route found - full:/.well-known/acme-challenge/cloud.php query:'
And many more.

I also noticed the eddiem.com logs are owned by root:root while other logs are owned by the account and www-data, e.g. nerdiped:www-data.

I tried again using the Webmin form.
This time I requested all the variants, not only eddiem.com.

Apache responded to 11 challenge GETs from 7 different IPs.
The requests were for 4 different files.

"Requesting a certificate for eddiem.com, www.eddiem.com, autoconfig.eddiem.com, autodiscover.eddiem.com, eddiem.info, www.eddiem.info from Let's Encrypt .."

and it is complaining about missing A/AAAA records.

23.178.112.219 - - [28/Aug/2024:12:10:20 +1000] "GET /.well-known/acme-challenge/g_9Gf5BoOnSGCMzjENrvREjmSS3uA1QhykxWzAzt01Q HTTP/1.1" 200 294 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
23.178.112.211 - - [28/Aug/2024:12:10:20 +1000] "GET /.well-known/acme-challenge/rRGakX7IRQdeN_rXZtv_LZ-R1756GkUvKEfVGveLJyI HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
34.208.247.115 - - [28/Aug/2024:12:10:20 +1000] "GET /.well-known/acme-challenge/g_9Gf5BoOnSGCMzjENrvREjmSS3uA1QhykxWzAzt01Q HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
34.208.247.115 - - [28/Aug/2024:12:10:21 +1000] "GET /.well-known/acme-challenge/rRGakX7IRQdeN_rXZtv_LZ-R1756GkUvKEfVGveLJyI HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
13.60.243.25 - - [28/Aug/2024:12:10:21 +1000] "GET /.well-known/acme-challenge/rRGakX7IRQdeN_rXZtv_LZ-R1756GkUvKEfVGveLJyI HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
23.178.112.219 - - [28/Aug/2024:12:12:24 +1000] "GET /.well-known/acme-challenge/FJvp_B9usfVmrLNc8YJ8Av5t-3I8tHyN6I1fHA3lCjI HTTP/1.1" 200 294 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
23.178.112.213 - - [28/Aug/2024:12:12:25 +1000] "GET /.well-known/acme-challenge/i5Gw4A0HXSHZN4t8Pyj_5ZeJ-9QeX2GdhlXvEXJAanA HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
34.208.247.115 - - [28/Aug/2024:12:12:25 +1000] "GET /.well-known/acme-challenge/FJvp_B9usfVmrLNc8YJ8Av5t-3I8tHyN6I1fHA3lCjI HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
13.61.18.54 - - [28/Aug/2024:12:12:25 +1000] "GET /.well-known/acme-challenge/FJvp_B9usfVmrLNc8YJ8Av5t-3I8tHyN6I1fHA3lCjI HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
34.208.247.115 - - [28/Aug/2024:12:12:25 +1000] "GET /.well-known/acme-challenge/i5Gw4A0HXSHZN4t8Pyj_5ZeJ-9QeX2GdhlXvEXJAanA HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
13.61.12.179 - - [28/Aug/2024:12:12:26 +1000] "GET /.well-known/acme-challenge/i5Gw4A0HXSHZN4t8Pyj_5ZeJ-9QeX2GdhlXvEXJAanA HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

As for the missing A/AAAA records, the only two domain names that have an A record are your two apex domains, eddiem.com and eddiem.info. All of the subdomains are missing their A record. Check your DNS control panel.

As for the access log, it is pretty much like your earlier one. The HTTP challenge for each domain name will use the same URL but come from different requester IPs. There are at most 3 successful requests in those logs for each domain name (check the full URL), where there must be at least 4 and probably will be 5.

In this latest log it looks like you ran two tries within a few seconds of each other. If you group them by their URL the pattern is more easily seen.
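
For what it's worth, here is a rough Python sketch of that grouping. It assumes the Apache combined log format seen in the lines you posted. The function names and the `needed=4` default are my own (the 4 comes from the "at least 4" figure mentioned above, which Let's Encrypt can change), so treat it as an illustration rather than an official tool.

```python
import re
from collections import defaultdict

# Pulls the client IP, challenge token and HTTP status out of one
# Apache combined-format access log line.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) .*?"GET /\.well-known/acme-challenge/'
    r'(?P<token>[A-Za-z0-9_-]+) HTTP/[\d.]+" (?P<status>\d{3}) '
)


def group_by_token(lines):
    """Map each challenge token to the set of client IPs that got a 200."""
    groups = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("status") == "200":
            groups[match.group("token")].add(match.group("ip"))
    return dict(groups)


def tokens_short_of(groups, needed=4):
    """Return tokens seen from fewer than `needed` distinct IPs (assumed threshold)."""
    return sorted(t for t, ips in groups.items() if len(ips) < needed)


# First five lines of the log posted above, rebuilt from (IP, token) pairs.
# Timestamp and user agent are simplified to keep the sample short.
G = "g_9Gf5BoOnSGCMzjENrvREjmSS3uA1QhykxWzAzt01Q"
R = "rRGakX7IRQdeN_rXZtv_LZ-R1756GkUvKEfVGveLJyI"
PAIRS = [
    ("23.178.112.219", G),
    ("23.178.112.211", R),
    ("34.208.247.115", G),
    ("34.208.247.115", R),
    ("13.60.243.25", R),
]
TEMPLATE = (
    '{ip} - - [28/Aug/2024:12:10:20 +1000] '
    '"GET /.well-known/acme-challenge/{token} HTTP/1.1" 200 292 "-" "-"'
)
SAMPLE_LINES = [TEMPLATE.format(ip=ip, token=token) for ip, token in PAIRS]

if __name__ == "__main__":
    for token, ips in group_by_token(SAMPLE_LINES).items():
        print(token, len(ips), sorted(ips))
```

On those five lines it finds two distinct IPs for the first token and three for the second, so both come up short of four. To run it on a real log you would pass the open file object to `group_by_token` instead of `SAMPLE_LINES`.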

This still looks like the kind of problem I described in post #2. I think you need to research more about what kind of security devices are involved in your hosting or network config. You may need to ask your Binary Lane Cloud support team.


I am in a different state (geographically) and on a different computer, I have been doing other things.
My cloud server is unmanaged but the host does provide support up to the firewall.
I contacted them twice and got these replies.
reply one..
Our infrastructure does not impose any traffic blocking, filtering, or shaping beyond the rules you define within the 'External Firewall' for your VPS. By default, unless port-blocking is enabled, all traffic - both inbound and outbound - is permitted. We do not interfere with or restrict these connections in any way.

Since you've already checked both the OS and VPS firewalls, if possible, could you please provide traceroute reports from your server towards the Let's Encrypt endpoint? This will help us diagnose any potential routing issues. You can use dig to identify the IP address(es) for the Let's Encrypt endpoint for your traceroute tests.

reply two.

No worries - thanks for the forum link; it was an interesting read.

The only suggestion I would have at this point is to ensure that the relevant A/AAAA records have been created for your domain. It could be routing issues, but that's unlikely given the consistency of the problem across different validation attempts.

If you haven’t already, I’d recommend using MXToolbox to check the propagation of the relative subdomain records.

Feel free to let us know if you're needing anything else from us.
end-quote

It would be good if Let's Encrypt had some way (a bot) to traceroute from their end. We don't know what is missing from our end.
I did look through the host firewall IPs and saw nothing from Singapore. There is no way I can see to temporarily disable it. I could possibly change all the rules to "accept" for testing and back again.

For which names?


Certbot lists these for eddiem.com

eddiem.com
admin.eddiem.com
autoconfig.eddiem.com
autodiscover.eddiem.com
webmail.eddiem.com
www.eddiem.com

But I'm only trying to get a cert for eddiem.com ATM.
Certbot lists 90 subdomains in total. I host 20 virtual web-servers.

I kind of live in two places and this is not the best place for doing this stuff.
I might give eddiem.com a rest and look at some of the other domains.
Also I'm surprised by the lack of IP block lists.
I'm not a guru but not a beginner either.
I would have expected UFW or fail2ban to be doing something but I see zilch.

When you want a certificate to include a list of names you need to prove each name is under your control before the CA will issue a certificate. There are two main ways to prove you control a domain: HTTP domain validation (answering an http request at that domain) and DNS validation (answering a TXT record query at that domain).

Note that (for example) autoconfig.eddiem.com and eddiem.com are considered to be two different names, and you have to prove control of each one.

You are currently using HTTP domain validation. When Let's Encrypt tests your domain using an HTTP request to http://autoconfig.eddiem.com, the DNS lookup for that name returns no result. You haven't set up anything in DNS to say that autoconfig.eddiem.com should resolve to the server that certbot is running on, so HTTP domain validation can never work for that name.

If you don't want to point each name to an IP you can instead use DNS domain validation. That involves updating a TXT record for each name variation and is normally automated if your DNS provider supports that and there is an appropriate API provider for your ACME client (certbot).

Example dig query showing that the name does not resolve (NXDOMAIN):

dig autoconfig.eddiem.com

; <<>> DiG 9.18.28-0ubuntu0.24.04.1-Ubuntu <<>> autoconfig.eddiem.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 32607
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;autoconfig.eddiem.com.         IN      A

;; AUTHORITY SECTION:
eddiem.com.             1800    IN      SOA     dns1.name-services.com. info.name-services.com. 1630400190 172800 900 1814400 3600
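
The same check can be scripted if you have many names to test. Below is a minimal Python sketch using only the standard library. `names_without_address` is a name I made up, and it asks whatever resolver your own machine is configured with, which may not see exactly what the Let's Encrypt servers see.

```python
import socket


def names_without_address(names, resolve=socket.getaddrinfo):
    """Return the names that the resolver cannot turn into an address.

    `resolve` defaults to socket.getaddrinfo and is only a parameter
    so that the lookup can be swapped out when testing offline.
    """
    missing = []
    for name in names:
        try:
            # Port 80 because the HTTP challenge connects on port 80.
            resolve(name, 80)
        except socket.gaierror:
            # Raised when the name has no usable A/AAAA record (e.g. NXDOMAIN).
            missing.append(name)
    return missing
```

For example, `names_without_address(["eddiem.com", "autoconfig.eddiem.com"])` returns whichever of those names fail to resolve at the time you run it. Any name it returns cannot pass HTTP validation until a record is added.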

I am not trying to get a certificate for autoconfig.eddiem.com etc. Are you saying it is still needed?
Virtualmin allows you to request a cert for the main domain or all the domains. I'm only requesting the main one. I did request the others once as an experiment but ATM I'm not doing one.

Ok, I'm just skimming your posts and see those domains in your previous post.

You're fully in control of which domains you want a cert for, and you should definitely only include the names you do want a single cert to cover.

So I'm confused as to what's the current status of your problem - what are you now doing and what exact error are you currently seeing?


Thanx for looking.
Nothing has changed for a week or more. I'm not doing much at all on this ATM.
I have other things to do here. When I return to Queensland I'll have more time for this.
I don't know when that will be.
I ran the Virtualmin cert request this morning but the PC hung trying to read the log. I wasn't expecting anything new, just hopeful.
It is just a personal site. It causes some email problems and I think it may be stopping me editing databases in phpmyadmin.

Some of the other domains I host but don't control had cert problems but then worked again.
I want to take a closer look at these and other domains to see if they behave the same way.
I think everything failed the last time I did a dry-run but I didn't check the logs.
It seems like a server wide issue but sometimes it works at random times.


I don't know if this is interesting/useful?
There are two domains I host but I don't control the domain records. I won't name them here.
The challenges for these failed months ago then started to work again.

I just did some certbot dry-runs using option 1: Apache Web Server plugin (apache).
One of these consistently passes: "The dry run was successful."
The other fails, and all the others I have tried fail: "Some challenges have failed."

I will try again another time and look at logs.

Another time...

I looked at the log for the successful dry run. This domain hasn't failed in 6 runs or so.

162.158.244.161 - - [06/Sep/2024:18:08:56 +1000] "GET /.well-known/acme-challenge/l23CPQzsbtQSlQkHBk_ixhndpOSh6U0be7fLKqPOCU8 HTTP/1.1" 200 329 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
108.162.226.11 - - [06/Sep/2024:18:08:56 +1000] "GET /.well-known/acme-challenge/l23CPQzsbtQSlQkHBk_ixhndpOSh6U0be7fLKqPOCU8 HTTP/1.1" 200 329 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
172.71.146.245 - - [06/Sep/2024:18:08:56 +1000] "GET /.well-known/acme-challenge/l23CPQzsbtQSlQkHBk_ixhndpOSh6U0be7fLKqPOCU8 HTTP/1.1" 200 329 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
172.68.245.110 - - [06/Sep/2024:18:08:56 +1000] "GET /.well-known/acme-challenge/l23CPQzsbtQSlQkHBk_ixhndpOSh6U0be7fLKqPOCU8 HTTP/1.1" 200 329 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
162.158.134.127 - - [06/Sep/2024:18:08:56 +1000] "GET /.well-known/acme-challenge/l23CPQzsbtQSlQkHBk_ixhndpOSh6U0be7fLKqPOCU8 HTTP/1.1" 200 329 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

Note there is a challenge from Singapore.

And next is a failed challenge - not eddiem.com as the log size there causes problems.

66.133.109.36 - - [06/Sep/2024:16:39:22 +1000] "GET /.well-known/acme-challenge/0Rt_SwqcOdAD-x1ZNCNL1Cik6M5-wpuXsDuWdowi2tg HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
34.217.148.215 - - [06/Sep/2024:16:39:24 +1000] "GET /.well-known/acme-challenge/0Rt_SwqcOdAD-x1ZNCNL1Cik6M5-wpuXsDuWdowi2tg HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
16.170.168.171 - - [06/Sep/2024:16:39:24 +1000] "GET /.well-known/acme-challenge/0Rt_SwqcOdAD-x1ZNCNL1Cik6M5-wpuXsDuWdowi2tg HTTP/1.1" 200 292 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

Same old same old. Only 3 challenges and none from Singapore.

The working challenge suggests there is no server wide block?


What is the domain name that always works for --dry-run? (I want to look at the IP and cert-related history.)

A successful cert request will usually have 5 successful HTTP challenge requests, as there are 5 LE Auth Centers. LE tolerates one failure, though, so even 4 might be seen with a successful cert request. Mind you, the number and location of challenge centers can change, as might the "rule" about allowing one failed HTTP challenge.

Then what is the domain name for this one that repeatedly fails? I want to poke around the same as the other.

Log size might be handled by doing something like:

grep -F '/.well-known/acme-challenge' (logfile)