Thank you for looking into this. Please allow me some time, as I’ll have to check with the user which software he is using for his authoritative DNS server. Out of curiosity, dnssectest.net doesn’t run into any errors during the validation, so I was wondering why the results might be different on the LE servers.
It looks like your user’s authoritative nameservers are having some trouble with the combination of DNSSEC and DNS 0x20 (mixed-case queries). We use DNS 0x20 in production to improve the security of our DNS lookups. When I temporarily disabled it on a test instance, the responses for CAA zaaksysteem.nl validated. When I re-enabled it, the responses were invalid again. I’m guessing there is some discrepancy in whether your authoritative server signs the mixed-case form or the lowercase form. I think it’s supposed to be the latter, though I’d have to double-check the RFCs. At any rate, if we can find out what software the user is on, hopefully we can nail it down.
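For readers unfamiliar with DNS 0x20, here is a minimal Python sketch of the idea, not the code Let's Encrypt actually runs. The helper names (`randomize_case`, `question_matches`, `canonical_owner`) are made up for illustration. The resolver randomises the case of each letter in the query name and only accepts a response that echoes the name back exactly; for DNSSEC signing, the canonical form of an owner name is all-lowercase, which is consistent with the guess above.

```python
import random


def randomize_case(name: str, rng: random.Random) -> str:
    """Return `name` with the case of each letter chosen at random (0x20 style)."""
    return "".join(
        (ch.upper() if rng.random() < 0.5 else ch.lower()) if ch.isalpha() else ch
        for ch in name
    )


def question_matches(sent_name: str, echoed_name: str) -> bool:
    """A 0x20-aware resolver compares the echoed question name case-sensitively."""
    return sent_name == echoed_name


def canonical_owner(name: str) -> str:
    """Owner names are lowercased before being signed or validated."""
    return name.lower()


rng = random.Random()  # hypothetical: a real resolver would use a secure source
query_name = randomize_case("zaaksysteem.nl", rng)
print("query sent as:  ", query_name)
print("validated as:   ", canonical_owner(query_name))
print("echo accepted:  ", question_matches(query_name, query_name))
```

Whatever case the query arrives in, the signed data should refer to the lowercase name, so a server that lets the query's case leak into what it signs or proves will fail validation only for mixed-case queries.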
Addendum: if I do mixed-case requests manually through my local unbound (DNSSEC-validating) resolver, I get a SERVFAIL, too (with use-caps-for-ids: no), so while it’s still unbound in the mix, the problem isn’t specific to that option. Annoyingly, all the online DNSSEC checking tools I can find normalise the request to lowercase before sending it, so I can’t get a complete log of the misbehaviour. However, sending a mixed-case query to Google’s open DNS (which does DNSSEC validation) returns SERVFAIL, but its cache does case-normalisation, so if you send an all-lowercase request first, it works for the mixed-case version later – and, conversely, if you send the mixed-case version first, you will then get a SERVFAIL on the all-lowercase version! (At least until the cache expires, or you end up hitting a different machine in the load balancer group)
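The order-dependent behaviour described above can be reproduced with a toy model. This is only a sketch under stated assumptions: `ToyCachingResolver` and `broken_upstream` are hypothetical names, the upstream function merely stands in for the misbehaving authoritative server, and nothing here reflects how Google's resolvers are really implemented.

```python
from typing import Callable, Dict


def broken_upstream(name: str) -> str:
    """Stand-in for the faulty server: validation only succeeds for all-lowercase queries."""
    return "NOERROR" if name == name.lower() else "SERVFAIL"


class ToyCachingResolver:
    """Caches the first result it sees under a case-normalised (lowercase) key."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self.upstream = upstream
        self.cache: Dict[str, str] = {}

    def resolve(self, name: str) -> str:
        key = name.lower()  # the cache ignores case...
        if key not in self.cache:
            # ...but the upstream query keeps the case the client sent
            self.cache[key] = self.upstream(name)
        return self.cache[key]


lower_first = ToyCachingResolver(broken_upstream)
print(lower_first.resolve("zaaksysteem.nl"), lower_first.resolve("ZaakSysteem.nl"))

mixed_first = ToyCachingResolver(broken_upstream)
print(mixed_first.resolve("ZaakSysteem.nl"), mixed_first.resolve("zaaksysteem.nl"))
```

In this model, whichever form is asked first decides the answer for both forms until the cache entry goes away, which matches the behaviour observed above.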
DNS is weird. And everyone seems to be in agreement that PowerDNS is broken.
@jsha, what’s the process for getting a domain onto the “CAA SERVFAIL” exceptions list hinted at in the API announcements topic? I doubt we’re going to be able to fix the world’s PowerDNS servers in the next few weeks.
Currently ad-hoc, but we are planning to remove it entirely by the September 8 deadline set at CA/Browser Forum for enforcing CAA, since it's not compatible with the new requirements. So our hope is to not make adding to it a regular process, but to focus on getting people's DNS fixed before then.
OK, well, can we put zaaksysteem.nl on that list for now, so we can get their certificate renewed? We’ll prod the customer to prod their DNS provider to fix their stuff, but that’ll probably take longer than the existing cert has left, especially given there don’t appear to be any existing bug reports to PowerDNS on this yet (or at least none that I could find).
So hopefully this should be a pretty straightforward fix for cyso.net if we can get in touch with their administrators.
@mpalmer @tgx, am I right in assuming that you are interested specifically in community.zaaksysteem.nl? I was confused since the original message didn’t mention the full name, so I didn’t see the relationship to Discourse. I see that the cert is quite close to expiration, so I’ll file a special request with our ops people to add community.zaaksysteem.nl to the list so you can renew on time, and report back here when it’s ready. Does Discourse normally renew certificates when there are 30 days remaining on them? Assuming so, were there factors other than the CAA SERVFAIL change deployed last week that caused this renewal to be delayed?
Bummer! It's possible there's a bug in the exceptions code. I'll take a look.
I've also sent a polite email to Cyso's domain admin pointing out the issue and requesting an upgrade.
I believe you have a possible workaround: The bug only manifests on empty responses. If the response to the CAA query is non-empty, validation succeeds. I believe this is why you get the error on zaaksysteem.nl instead of on community.zaaksysteem.nl, because the response for dig CAA community.zaaksysteem.nl is non-empty: it contains a CNAME.
Since community.zaaksysteem.nl is CNAME'd to zaaksysteem.bydiscourse.com, you should be able to add a CAA record authorizing issuance by Let's Encrypt to the zaaksysteem.bydiscourse.com zone. Since CAA processing proceeds from the left to right, Boulder will see this record and stop processing.
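To illustrate why a record on the CNAME target short-circuits the lookup, here is a simplified Python sketch of a left-to-right CAA search. It is not Boulder's implementation: the `find_caa` function, the in-memory `CNAMES` and `CAA` tables, and the exact record value are assumptions for illustration, and the real precedence rules around aliases are more involved.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory data standing in for real DNS answers.
CNAMES: Dict[str, str] = {"community.zaaksysteem.nl": "zaaksysteem.bydiscourse.com"}
CAA: Dict[str, List[str]] = {"zaaksysteem.bydiscourse.com": ['0 issue "letsencrypt.org"']}


def find_caa(
    name: str, caa: Dict[str, List[str]], cnames: Dict[str, str]
) -> Tuple[Optional[str], List[str], List[str]]:
    """Return (owner, records, names_queried) for the first non-empty CAA set."""
    queried: List[str] = []
    labels = name.rstrip(".").lower().split(".")
    for i in range(len(labels)):  # full name first, then each parent in turn
        candidate = ".".join(labels[i:])
        queried.append(candidate)
        if caa.get(candidate):
            return candidate, caa[candidate], queried
        target = cnames.get(candidate)
        if target is not None:  # the CAA query follows the alias
            queried.append(target)
            if caa.get(target):
                return target, caa[target], queried
    return None, [], queried


owner, records, queried = find_caa("community.zaaksysteem.nl", CAA, CNAMES)
print("CAA found at:", owner, records)
print("names queried:", queried)
```

With the record in place, the search stops at the CNAME target and never sends a CAA query for zaaksysteem.nl, which is the name that triggers the empty-response bug. Calling `find_caa` with an empty `CAA` table shows the search climbing on to `zaaksysteem.nl` and `nl` instead.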
Well kiss mah grits! I gave up trying to figure out the exact precedence rules for CAA records in the presence of CNAMEs and such – no two people on the CABF public list seemed to be able to agree on anything. If I can fix all our CAA problems by adding a record to the subdomain, I’ll just do that. Problem solved.
Of course, Cloudflare’s semi-hiding CAA record support behind some sort of beta access request-by-ticket thing, so I can’t test it out now, but I’ll give it a go as soon as they come through.
I have not seen it, but it is now at the top of the “to be read” pile. Thanks for the pointer.
A quick update: after wrestling with Cloudflare for a couple of days to get the ability to create CAA records, I’ve now created an appropriate CAA record on the CNAME target, and that appears to have now allowed issuance (or the problem with the CAA exceptions list was fixed?). Either way, I’ve renewed the cert now, and I’ll get on to creating CAA records for all our CNAME targets, which should prevent any further unpleasantness due to PowerDNS bugs.
Thanks for your help, @jsha. Really appreciate it.