Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. crt.sh | example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.
My domain is: eddiem.com - I host others but this is the most urgent.
I ran this command: Webmin auto-renew, or certbot over SSH.
My web server is (include version):
Apache version 2.4.41
The operating system my web server runs on is (include version):
Ubuntu Linux 20.04.6
My hosting provider, if applicable, is:
Binary Lane cloud - https://www.binarylane.com.au/
I can login to a root shell on my machine (yes or no, or I don't know):
Yes
I'm using a control panel to manage my site (no, or provide the name and version of the control panel): Webmin version 2.202 Virtualmin version 7.10.0
The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):
certbot 2.11.0
I share a cloud server with a friend.
Webmin's Let's Encrypt auto-renew worked perfectly until around May 2024; then we both had problems. We independently tried to solve them and both failed. However, my friend's two domains' certificates auto-renewed in June (my auto-renew was off).
I have confirmed certbot is creating and removing the challenge file.
I have confirmed the .well-known/acme-challenge files are accessible over HTTP.
The .htaccess file is still there and viewable in my browser.
I have checked that eddiem.com is accessible from many countries, including Sweden and Singapore. I see no geoblocking anywhere.
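That kind of reachability check can also be simulated end-to-end on the server itself. Here is a rough local sketch of the same fetch against a throwaway webroot (the port, paths, and token name are made up for illustration, not the real Virtualmin paths):

```shell
# Local sketch of an HTTP-01 style fetch: serve a fake token from a
# temporary webroot and request it the way a validation server would.
# Port 8099, the webroot, and the token name are all illustrative.
webroot=$(mktemp -d)
mkdir -p "$webroot/.well-known/acme-challenge"
printf 'test-token-contents' > "$webroot/.well-known/acme-challenge/test-token"
python3 -m http.server 8099 --directory "$webroot" >/dev/null 2>&1 &
srv=$!
sleep 1
resp=$(curl -s http://127.0.0.1:8099/.well-known/acme-challenge/test-token)
kill "$srv" 2>/dev/null
echo "$resp"
```

If the echoed response matches the token contents, the serve-and-fetch path itself is sound; the real question is then whether requests from outside can reach the same path.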
It is not super-urgent.
Any clues?
Ciao Eddie,
Well, a "Secondary Validation" failure often points to problems from the two non-USA countries you checked. But let's assume you don't have any geo-blocking.
If single HTTP(S) test requests succeed but an actual cert request fails due to timeout, a few things come to mind:
You have a firewall blocking specific IP addresses, either by IP range or implicitly by checking signatures of some kind and rejecting those.
You have some kind of DDoS-based firewall that blocks repeated identical requests arriving from various locations at the same time.
A temporary comms problem is affecting routing between LE and your server.
Do any of these seem possible?
Can you check your Apache access log and show the requests that do get through? When you see a Secondary failure, at least the Primary got through, and possibly one or more of the Secondaries. Knowing the requesting IPs of the ones that got through might help identify the problem.
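To make that concrete, here is one way to tally which requester IPs reached the challenge path, run here against a couple of made-up log lines (the IPs are documentation addresses and the token is invented, not from your logs):

```shell
# Tally successful (HTTP 200) acme-challenge hits per requesting IP in a
# combined/common-format Apache access log. Sample lines are illustrative.
cat > /tmp/sample_access.log <<'EOF'
192.0.2.10 - - [27/Aug/2024:23:12:30 +1000] "GET /.well-known/acme-challenge/abc123 HTTP/1.1" 200 292
198.51.100.7 - - [27/Aug/2024:23:12:31 +1000] "GET /.well-known/acme-challenge/abc123 HTTP/1.1" 200 292
EOF
# Field 9 is the HTTP status; field 1 is the requester IP.
hits=$(grep 'acme-challenge' /tmp/sample_access.log | awk '$9 == 200 {print $1}' | sort | uniq -c)
echo "$hits"
```

Run against your real log, fewer than 4-5 distinct IPs per challenge token is the pattern to look for.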
Also remember to check your firewall at both the machine level and in the cloud host's control panel. Tools like Fail2Ban might also be working to block things.
Thanx for the replies.
I did try turning the server firewall off at one point, with no change.
I don't think I've done that at the host level.
I'm only running UFW on the server. I did have fail2ban once and I'll check if it is still there.
The logs will have to wait a little while.
Huh. Two of the three IPs are in the US and the third is in Sweden. You should see at least 4, and more likely 5, successful requests. You are missing ones from the US and Singapore. (Again, note these IP locations can change at any time.)
The 292 is just the length of the reply; it sometimes varies if your server sends extra response headers too. That's not very important. The '200' preceding it is the HTTP response status, the "OK" reply. Mind you, your server can send an OK but deliver faulty data with it, though that causes a different failure from the Let's Encrypt server.
You should not have gotten a "too many failed requests" error from just one request. Could you have some system that is repeatedly retrying a request without your specific instruction? This "too many" is not for an individual challenge from a specific server but rather for the overall cert request itself.
I also found a bunch of these in the Apache error log overnight.
It probably has nothing to do with certbot, but there are lots of errors to do with ".well-known/acme-challenge".
[Tue Aug 27 23:12:34.359619 2024] [autoindex:error] [pid 1415907] [client 4.196.120.128:32739] AH01276: Cannot serve directory /home/eddiemco/public_html/.well-known/: No matching DirectoryIndex (index.html,index.htm,index.php,index.php4,index.php5) found, and server-generated directory index forbidden by Options directive
In a different virtual server's log I found:
[Mon Aug 26 18:07:07.944539 2024] [proxy_fcgi:error] [pid 915878] [client 20.117.200.45:50017] AH01071: Got error 'PHP message: No route found - full:/.well-known/pki-validation/cloud.php query:'
[Mon Aug 26 18:07:10.899181 2024] [proxy_fcgi:error] [pid 915879] [client 20.117.200.45:51724] AH01071: Got error 'PHP message: No route found - full:/.well-known/acme-challenge/cloud.php query:'
And many more.
I also noticed the eddiem.com logs are owned by root:root while other logs are owned by the account and www-data - e.g. nerdiped:www-data.
As for this: the only two domain names that have an A record are your two apex domains, eddiem.com and eddiem.info. All of the subdomains are missing their A records. Check your DNS control panel.
As for the access log, it is pretty much like your earlier one. The HTTP challenge for each domain name will use the same URL but come from different requester IPs. There are at most 3 successful requests appearing in those logs for each domain name (check the full URL), where there must be at least 4 and probably will be 5.
In this latest log it looks like you ran two tries within a few seconds of each other. If you group them by URL the pattern is more easily seen.
This still looks like a problem like I described in post #2. I think you need to research more about what kind of security devices are involved in your hosting or network config. You may need to ask your Binary Lane Cloud support team.
I am in a different state (geographically) and on a different computer, I have been doing other things.
My cloud server is un-managed, but the host does provide support up to the firewall.
I contacted them twice and got these replies.
reply one..
Our infrastructure does not impose any traffic blocking, filtering, or shaping beyond the rules you define within the 'External Firewall' for your VPS. By default, unless port-blocking is enabled, all traffic - both inbound and outbound - is permitted. We do not interfere with or restrict these connections in any way.
Since you've already checked both the OS and VPS firewalls, if possible, could you please provide traceroute reports from your server towards the Let's Encrypt endpoint? This will help us diagnose any potential routing issues. You can use dig to identify the IP address(es) for the Let's Encrypt endpoint for your traceroute tests.
reply two.
No worries - thanks for the forum link; it was an interesting read.
The only suggestion I would have at this point is to ensure that the relevant A/AAAA records have been created for your domain. It could be routing issues, but that's unlikely given the consistency of the problem across different validation attempts.
If you haven’t already, I’d recommend using MXToolbox to check the propagation of the relevant subdomain records.
Feel free to let us know if you're needing anything else from us.
end-quote
It would be good if Let's Encrypt had some way (a bot) to traceroute from their end. We don't know what is missing from our end.
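For reference, the check the host asked for in reply one would look something like this. acme-v02.api.letsencrypt.org is the production API hostname, but note the limitation: validation requests originate from Let's Encrypt's side, so an outbound traceroute from the server only tests one direction of the path.

```shell
# Resolve the Let's Encrypt production API endpoint, then trace the route.
# (Outbound only - it cannot show how LE's validation centers reach us.)
dig +short acme-v02.api.letsencrypt.org
traceroute -n acme-v02.api.letsencrypt.org
```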
I did look through the host firewall IPs and saw nothing from Singapore. There is no way I can see to temporarily disable it. I could possibly change all the rules to "accept" for testing and back again.
But I'm only trying to get a cert for eddiem.com ATM.
Certbot lists 90 subdomains in total. I host 20 virtual web-servers.
I kind of live in two places and this is not the best place for doing this stuff.
I might give eddiem.com a rest and look at some of the other domains.
Also I'm surprised by the lack of IP block lists.
I'm not a guru but not a beginner either.
I would have expected UFW or fail2ban to be doing something but I see zilch.
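The checks I mean are roughly these (both need root, and fail2ban-client only exists if fail2ban is still installed):

```shell
# Show active UFW rules and whether fail2ban is running any jails.
sudo ufw status verbose
sudo fail2ban-client status
```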
When you want a certificate to include a list of names you need to prove each name is under your control before the CA will issue a certificate. There are two main ways to prove you control a domain: HTTP domain validation (answering an http request at that domain) and DNS validation (answering a TXT record query at that domain).
Note that (for example) autoconfig.eddiem.com and eddiem.com are considered to be two different names, and you have to prove control of each one.
You are currently using HTTP domain validation. When Let's Encrypt tests your domain using an http request to http://autoconfig.eddiem.com, the DNS lookup for that IP address returns no result. You haven't set up anything in DNS to say that autoconfig.eddiem.com should resolve to the server that certbot is running on, so HTTP domain validation can never work for that name.
If you don't want to point each name to an IP, you can instead use DNS domain validation. That involves updating a TXT record for each name, and it is normally automated if your DNS provider supports it and there is an appropriate API plugin for your ACME client (certbot).
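As a sketch only, switching certbot to DNS validation can be as simple as the manual plugin, which is interactive and prompts you to create each TXT record by hand (DNS-provider plugins such as certbot-dns-cloudflare automate that step where your provider is supported):

```shell
certbot certonly --manual --preferred-challenges dns -d eddiem.com
```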
Example dig query showing that the name does not resolve (NXDOMAIN):
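Something along these lines, with the output trimmed to the header line; the point is the NXDOMAIN status, and the exact id and timings will vary:

```
$ dig autoconfig.eddiem.com A

;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: ...
```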
I am not trying to get a certificate for autoconfig.eddiem.com etc. Are you saying it is still needed?
Virtualmin allows you to request a cert for the main domain or for all the domains. I'm only requesting the main one. I did request the others once as an experiment, but ATM I'm not doing that.
Thanx for looking.
Nothing has changed for a week or more. I'm not doing much at all on this ATM.
I have other things to do here. When I return to Queensland I'll have more time for this.
I don't know when that will be.
I ran the Virtualmin cert request this morning, but the PC hung trying to read the log. I wasn't expecting anything new - just hopeful.
It is just a personal site. It causes some email problems and I think it may be stopping me editing databases in phpmyadmin.
Some of the other domains I host but don't control had cert problems but then worked again.
I want to take a closer look at these and other domains to see if they behave the same way.
I think everything failed the last time I did a dry-run but I didn't check the logs.
It seems like a server wide issue but sometimes it works at random times.
I don't know if this is interesting/useful?
There are two domains I host but I don't control the domain records. I won't name them here.
The challenges for these failed months ago then started to work again.
I just did some certbot dry-runs using option 1: Apache Web Server plugin (apache).
One of these consistently passes: "The dry run was successful."
The other fails, and all others I have tried fail: "Some challenges have failed."
I will try again another time and look at logs.
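For anyone following along, the non-interactive equivalent of that menu choice is roughly this (the domain is a placeholder):

```shell
# Dry run against the staging environment; no real certificate is issued.
certbot certonly --apache --dry-run -d example.com
```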
Another time...
I looked at the log for the successful dry run. This domain hasn't failed in 6 runs or so.
What is the domain name that always works for --dry-run? (I want to look at the IP and cert-related history.)
A successful cert request will usually have 5 successful HTTP challenge requests, as there are 5 LE validation centers, although LE tolerates one failure so even 4 might be seen with a successful cert request. Mind you, the number and location of validation centers can change, as might the "rule" about allowing one failed HTTP challenge.
Then, what is the domain name for the one that repeatedly fails? I want to poke around the same as with the other.