Hi there,
Having problems with certbot - it is failing with the following message:
1: wiki.toud.pw
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate numbers separated by commas and/or spaces, or leave input
blank to select all options shown (Enter 'c' to cancel): 1
Renewing an existing certificate for wiki.toud.pw
Certbot failed to authenticate some domains (authenticator: apache). The Certificate Authority reported these problems:
Domain: wiki.toud.pw
Type: dns
Detail: During secondary validation: DNS problem: SERVFAIL looking up A for wiki.toud.pw - the domain's nameservers may be malfunctioning; DNS problem: query timed out looking up AAAA for wiki.toud.pw
Hint: The Certificate Authority failed to verify the temporary Apache configuration changes made by Certbot. Ensure that the listed domains point to this Apache server and that it is accessible from the internet.
Some challenges have failed.
Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.
Yes, I also see single queries to your DNS work fine from various tools.
The "During secondary validation ..." in the error above means that the primary Let's Encrypt validation center was successful. After that, the 4 secondary centers also try to validate. One or more of these are failing.
These secondary centers are in different places around the world. I see you only have one DNS server. Do you block access to it from certain countries? Or, do you block repeated requests that arrive to you too fast? Something like a DoS firewall block?
Note the --apache option uses an HTTP Challenge. Let's Encrypt will query for an A and AAAA record as it prefers IPv6 if an AAAA record is present. You do not need to have both of these but your DNS must reply properly when queried. It will also query for a CAA record later. Again, you don't need to have one but then your server must give a proper 'not found' reply when asked for it.
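As a rough way to check how a nameserver answers those three lookups, here is a minimal Python sketch using only the standard library. The function names (build_query, rcode_of, probe) are made up for illustration, and you would need to supply your own nameserver's IP address. It only shows the view from wherever you run it, so it cannot reproduce what the Let's Encrypt validation centers see.

```python
import socket
import struct

# Record type numbers: A=1, AAAA=28, CAA=257.
QTYPES = {"A": 1, "AAAA": 28, "CAA": 257}
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for one name and record type."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Question: encoded name, root label, type, class IN (1).
    question = labels + b"\x00" + struct.pack("!HH", QTYPES[qtype], 1)
    return header + question


def rcode_of(response: bytes) -> str:
    """Return the response code name from a DNS response header."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def probe(server_ip: str, name: str, timeout: float = 3.0) -> dict:
    """Ask one server for A, AAAA and CAA; report the rcode or a timeout."""
    results = {}
    for qtype in QTYPES:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(build_query(name, qtype), (server_ip, 53))
                data, _ = sock.recvfrom(4096)
                results[qtype] = rcode_of(data)
            except socket.timeout:
                results[qtype] = "TIMEOUT"
    return results
```

Calling probe with your nameserver's IP and "wiki.toud.pw" returns one entry per record type. A NOERROR with no answers or an NXDOMAIN is a proper reply for a record you don't have; SERVFAIL or TIMEOUT is the kind of result the error message above describes.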
I could run two but it's the same physical machine, so it would seem redundant. I don't have multiple subnets.
Only CN level block is China. It's possible that a secondary IP got blocked inadvertently by CSF - but if so it wasn't intentional. Is there somewhere I can find an IP list for LE so as to ensure they're on the allowlist?
No, Let's Encrypt does not publish a list of IP addresses used by its validation centers. This is a good wiki article that explains the details: Multi-Perspective Validation & Geoblocking FAQ. The entire wiki is good, but I linked to the section most directly related to your question.
Can't you view your own DNS Server logs to see what was blocked?
Do you have any kind of limit on the volume of requests inbound to your lone DNS server? There can be a large burst of queries during a challenge. See for example what happens when looking up records using https://unboundtest.com
Note unboundtest does not run inside an LE validation center so cannot mimic the multi-perspective validation (or the total volume from multiple simultaneous centers).
There is DDOS protection on the servers using a syn-deflate method, but if the number of queries is large enough to trip that I'd say it was some misconfiguration on LE's end.
I receive too many requests to pick single ones out of a log, save perhaps in the off hours, unless they use a particular user-agent I could perhaps check against?
Looks like DNS validation might be the easiest way since I'm running my own nameserver. Is there documentation on setting up certbot with that? I did a Google search, but the results seem to be about DNS issues during validation, not about setting up DNS validation itself, which isn't the same.
You should try disabling that and try again. LE does not retry failed queries. If that works you could try using Certbot --pre-hook and --post-hook to automatically turn that off and back on.
The DNS queries won't have a user-agent. You can make a new test request and check your logs from that time frame. That should limit the volume to assess.
Yes, that might help, but LE might still be blocked from accessing the TXT record, the same as it is for the A and AAAA records. But, it might be worth trying.
If LetsEncrypt requires me to disable my security measures, then it's not a tenable way to see to my certification and I would have to look to other providers. Sorry, but that's just the rub. It may be more convenient for LetsEncrypt to have a less secure machine, but I, for one, am not trading security for convenience.
That said I'll look into the possible resolutions here.
I suggest reviewing that wiki I previously linked. There are many facets to security.
Sure, other Certificate Authorities may suit your needs better. I think Google Trust Services and ZeroSSL are the two larger ACME issuers that are free. I believe Certbot supports both of those, as do other ACME clients. Mind you, those CAs must do multi-perspective validation as well. There are paid providers as well.
A million dollar question I have is: how am I supposed to authenticate that the DNS traffic is genuinely from LetsEncrypt when the traffic appears to be proxied behind Cloudflare and LE doesn't publish its own IPs?
Like, we do realize the irony of complaining about end user security failing the authentication process when I can't verify the traffic myself right?
I suggested looking at your logs to check for rejections during the timeframe you made a cert request. These often complete within a few seconds.
I also suggested disabling the DDoS protection as a test to rule it in/out as the cause of the DNS query failures. If it is the cause and if you have an API method to turn it on/off, the --pre-hook and --post-hook should work. These would only operate during the time you get a cert which, again, is usually very brief. But, if disabling DDoS protection for even a brief time is unacceptable then yes, you'll need to explore other options. Certbot only runs those hooks when it actually makes a cert request. It does not run those when it "wakes up" just to check if the cert needs renewal.
The primary LE center does use a Cloudflare commercial product for its outbound traffic. But, the secondary centers are AWS. Not that this helps your concerns; I was just clarifying your Cloudflare comment. The wiki I linked provides more background on this infrastructure.
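To make the hook idea concrete, here is a small Python sketch that assembles such a renewal command. The two hook commands are hypothetical placeholders (I don't know how your DDoS protection is toggled), and the certificate name is assumed to match the domain. Only the certbot options themselves (renew, --cert-name, --pre-hook, --post-hook) are real.

```python
import shlex

# Hypothetical placeholders: substitute whatever actually toggles
# your DDoS protection. These scripts are not part of Certbot.
PRE_HOOK = "/usr/local/bin/ddos-protection off"
POST_HOOK = "/usr/local/bin/ddos-protection on"


def renew_command(cert_name: str, pre_hook: str, post_hook: str) -> list:
    """Build the argv for a certbot renewal wrapped in pre/post hooks."""
    return [
        "certbot", "renew",
        "--cert-name", cert_name,   # assumed to match the domain name
        "--pre-hook", pre_hook,     # runs before the cert request
        "--post-hook", post_hook,   # runs after the request attempt
    ]


argv = renew_command("wiki.toud.pw", PRE_HOOK, POST_HOOK)
# Print a copy-pasteable shell command; pass argv to subprocess.run
# instead if you want to launch it from Python.
print(shlex.join(argv))
```

Building the command as a list keeps each hook as a single argument, so the spaces inside the hook commands don't need manual quoting.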
Hard to know whether "misconfigured" or just incompatible. LE issues over 6 million certs per day and so far you are the only complaint of that kind of failure. Sure, we've seen plenty of problems in the past with people's firewalls but you said you disabled all of them now.
Are you getting the identical error as before? That is "Secondary" and SERVFAIL?
If it's a general problem we should start seeing more frequent failure reports.
I see you were issued a ZeroSSL cert today so that looks viable for you. Aren't Comodo and ZeroSSL the same as Sectigo? I don't yet see any cert issued for that domain via Cloudflare in the public logs but delays are common so ...
Example #1:
Allow ZeroSSL certificates for site.com, including any subdomains as well as wildcards.
site.com. 3600 IN CAA 0 issue "sectigo.com"
site.com. 3600 IN CAA 0 issuewild "sectigo.com"
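For anyone wondering what records like these would permit, here is a simplified Python sketch of the check. It is an illustration only: it looks at a single set of records and ignores details a real CA handles, such as walking up parent domains and critical flags. The function names are made up. "letsencrypt.org" is the identifier Let's Encrypt uses in CAA records.

```python
def parse_caa(line: str):
    """Parse one zone-file style CAA line into (owner, flags, tag, issuer)."""
    # Expected layout: owner TTL IN CAA flags tag "value"
    fields = line.split(None, 6)
    if len(fields) != 7 or fields[2].upper() != "IN" or fields[3].upper() != "CAA":
        raise ValueError(f"not a CAA record line: {line!r}")
    owner, _ttl, _cls, _type, flags, tag, value = fields
    # Drop the quotes and any parameters after ';' to get the issuer domain.
    issuer = value.strip().strip('"').split(";")[0].strip().lower()
    return owner.rstrip(".").lower(), int(flags), tag.lower(), issuer


def may_issue(records, ca_domain: str, wildcard: bool = False) -> bool:
    """Simplified check: do these CAA records permit ca_domain to issue?"""
    parsed = [parse_caa(r) for r in records]
    issue = [issuer for _, _, tag, issuer in parsed if tag == "issue"]
    issuewild = [issuer for _, _, tag, issuer in parsed if tag == "issuewild"]
    # Wildcard requests use issuewild when present, otherwise issue.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # no applicable restriction in this record set
    return ca_domain.lower() in relevant


records = [
    'site.com. 3600 IN CAA 0 issue "sectigo.com"',
    'site.com. 3600 IN CAA 0 issuewild "sectigo.com"',
]
print(may_issue(records, "sectigo.com"))
print(may_issue(records, "letsencrypt.org"))
```

With only these two records published, the sketch permits sectigo.com and refuses letsencrypt.org, so a domain that still wants Let's Encrypt certs would need an additional issue record for it.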
LE checks from 5 places but Sectigo only from 3 according to these docs: https://www.sectigo.com/mpic-faq (Q&A 'How does MPIC validation work?'). That might be why Sectigo works but Let's Encrypt doesn't.
Maybe your DNS/Firewall software allows 3 checks in quick succession but more than that triggers DDOS protection.
That doesn't mean Let's Encrypt is "misconfigured" though. Checking for more perspectives is allowed and more secure.
The minimum number of remote perspectives that a CA should check from will go up as per 3.2.2.9 of the baseline requirements. So if this is the issue, then you may start running into it with other CAs as well in the future.
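To illustrate the burst hypothesis (3 checks get through, 5 trip the protection), here is a toy Python model of a sliding-window limiter. Everything in it is an assumption made up for the illustration: the threshold of 3, the one-second window, and the 50 ms spacing between queries. It is not how CSF or any particular firewall is known to behave.

```python
from collections import deque


class BurstLimiter:
    """Toy sliding-window limiter: at most max_hits per window seconds."""

    def __init__(self, max_hits: int, window: float):
        self.max_hits = max_hits
        self.window = window
        self.hits = deque()  # timestamps of accepted queries

    def allow(self, now: float) -> bool:
        # Forget accepted queries that have aged out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.max_hits:
            return False  # over the limit: this query is dropped
        self.hits.append(now)
        return True


def answered(perspectives: int, max_hits: int = 3,
             window: float = 1.0, spacing: float = 0.05) -> int:
    """Count how many near-simultaneous queries get an answer."""
    limiter = BurstLimiter(max_hits, window)
    return sum(limiter.allow(i * spacing) for i in range(perspectives))


print(answered(3))  # every query from 3 perspectives is answered
print(answered(5))  # only 3 of the 5 queries are answered
```

Under these made-up numbers a CA checking from 3 places never notices the limit, while one checking from 5 sees two lookups go unanswered, which would surface as a timeout or SERVFAIL during secondary validation.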
There has been no proof presented showing a misconfiguration by LE.
All we can see is that some of the secondary validations have failed.
In the past, we've even seen routing problems cause such failures.
A packet capture from your end [or from a device in front of yours] might shed more light on the situation.