It produced this output:
Failed authorization procedure. grm.fleetnova.com (tls-sni-01): urn:acme:error:connection :: The server could not connect to the client to verify the domain :: DNS problem: SERVFAIL looking up CAA for fleetnova.com
My web server is (include version):
The operating system my web server runs on is (include version):
My hosting provider, if applicable, is:
canhost / OVH
I can login to a root shell on my machine (yes or no, or I don’t know):
I’m using a control panel to manage my site (no, or provide the name and version of the control panel):
Understood. The odd thing is that it has worked before: I have multiple domains using the same host and DNS, and I was able to get a cert for the other domains just yesterday.
Try getice.ca, for example. That one should SERVFAIL too, but I got a cert for it yesterday with no problem.
August 15 is a bit late for this, but I think there may be a bug/limitation in the whitelist for broken CAA lookups.
User has certificate for grm.fleetnova.com but not fleetnova.com.
User tries to validate grm.fleetnova.com.
CAA query for grm.fleetnova.com fails, but it's on the exception list.
CAA query for fleetnova.com fails, but it's not on the exception list.
CAA query for com succeeds.
Validation fails with CAA error due to fleetnova.com failure.
It's only a hypothesis, but I think this could explain what @rictd is experiencing.
LookupCAA's exception check seems to be an "exact match" check rather than taking into account parent or child domains, so the exception list would need to have been manually generated to include failed domains and their parents (at least when their parents are also broken).
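To make the hypothesis concrete, here is a rough Python model of it. This is not Boulder's code: the function names, the simulated DNS results, and the contents of the exception set are all assumptions made up for illustration. It only shows how an exact-match exception check lets the subdomain lookup through while the parent lookup still fails.

```python
# Hypothetical model of the hypothesis above; none of these names come from Boulder.
SERVFAIL = "SERVFAIL"

# Simulated resolver results: name -> list of CAA records, or SERVFAIL.
DNS_RESULTS = {
    "grm.fleetnova.com": SERVFAIL,
    "fleetnova.com": SERVFAIL,
    "com": [],
}

# Assumed exception list: only names that already had certificates.
SERVFAIL_EXCEPTIONS = {"grm.fleetnova.com"}


def lookup_caa(name):
    """Return the CAA records for one name, applying an exact-match exception."""
    result = DNS_RESULTS[name]
    if result == SERVFAIL:
        # Exact match only: parents and children of listed names are not considered.
        if name in SERVFAIL_EXCEPTIONS:
            return []
        raise LookupError(f"SERVFAIL looking up CAA for {name}")
    return result


def check_caa(domain):
    """Climb from the domain towards the TLD, stopping at the first CAA record set."""
    labels = domain.split(".")
    for i in range(len(labels)):
        name = ".".join(labels[i:])
        records = lookup_caa(name)
        if records:
            return records
    return []


if __name__ == "__main__":
    try:
        check_caa("grm.fleetnova.com")
    except LookupError as err:
        print(err)
```

In this model, adding "fleetnova.com" to `SERVFAIL_EXCEPTIONS` is enough for `check_caa("grm.fleetnova.com")` to get through, which corresponds to listing the parent alongside the subdomain.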
grm.fleetnova.com has past certificates and is in the SERVFAIL exception files you pasted the other day. fleetnova.com does not and is not.
getice.ca and www.getice.ca both have past certificates and are both on the list and @rictd was just able to get a new certificate for them.
What do you think? Other than "I wish it was September 8 already."
If fleetnova.com is in the internal exception list, I guess I'm way off and something weird may be happening. If it isn't, the exception list or code may need to be updated to include parents where necessary.
Edit: Fix "example.com" and a couple errors. I should do editing before hitting submit...
Good sleuthing, @mnordhoff. I agree with your assessment: since the SERVFAIL exception is implemented in LookupDNS, it only applies on a per-lookup basis. Most affected sites haven’t hit this issue because they had certs for both subdomains and parent domains, meaning both were listed in the exception list. However, since the parent domain fleetnova.com wasn’t in the exception list, it hit this bug.
I think rather than fix the exceptions code, the best temporary fix is to manually add fleetnova.com to the list. Maybe @cpu or @roland can help with that? Note that this will allow @rictd to get a new cert, but only until Sep 8.
@rictd, thanks for following up with your host. Glad they are working on it! From a close reading, it sounds like they just committed to CAA support, not to upgrading their PowerDNS. Just in case they’re not planning to upgrade, you might want to remind them that there are a lot of known security vulnerabilities in the version they are running.
One other thought: Per https://letsencrypt.org/docs/caa/, CAA records for subdomains override parent domains. So setting a CAA record authorizing Let’s Encrypt on grm.fleetnova.com would solve your problem. Of course, since your DNS provider doesn’t support CAA, there’s a catch-22. But this can be solved by adding an NS record to grm.fleetnova.com pointing just that domain to another provider. This might be easier and faster than moving all of your DNS, in case adding the base domain to the exceptions list is taking too long.
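As a rough illustration of why the subdomain record would help, here is a small Python sketch of the tree-climbing behaviour described on that page: the search stops at the first name that has a CAA record set, so the broken parent is never consulted. The resolver, the zone dictionaries, and the function names are hypothetical stand-ins, not real DNS calls or Let's Encrypt code.

```python
SERVFAIL = "SERVFAIL"


def make_resolver(zone):
    """Build a fake resolver from a dict of name -> CAA records or SERVFAIL."""
    return lambda name: zone.get(name, [])


def find_caa(domain, resolve):
    """Return (name, records) for the closest name that has CAA records."""
    labels = domain.split(".")
    for i in range(len(labels)):
        name = ".".join(labels[i:])
        result = resolve(name)
        if result == SERVFAIL:
            raise LookupError(f"SERVFAIL looking up CAA for {name}")
        if result:
            # A record set on the subdomain ends the climb; the parent is not queried.
            return name, result
    return None, []


# Assumed state today: both names are served by the DNS host that fails on CAA.
broken = make_resolver({
    "grm.fleetnova.com": SERVFAIL,
    "fleetnova.com": SERVFAIL,
})

# Assumed state after delegating grm.fleetnova.com to a provider that supports CAA.
delegated = make_resolver({
    "grm.fleetnova.com": ['0 issue "letsencrypt.org"'],
    "fleetnova.com": SERVFAIL,
})

if __name__ == "__main__":
    print(find_caa("grm.fleetnova.com", delegated))
```

With the `broken` resolver the same call raises on the very first lookup, which is why the delegation has to come before the CAA record can help.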