Help diagnosing CAA failures `ns1.cyso.nl`

I’m facing a similar problem.

$ dig @ns2.cyso.eu. zaaksysteem.nl type257

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @ns2.cyso.eu. zaaksysteem.nl type257
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48841
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;zaaksysteem.nl.			IN	CAA

;; AUTHORITY SECTION:
zaaksysteem.nl.		2560	IN	SOA	ns1.cyso.nl. dnsadmin.cysonet.com. 1500026318 16384 2048 1048576 300

;; Query time: 261 msec
;; SERVER: 93.94.227.172#53(93.94.227.172)
;; WHEN: Wed Jul 19 14:46:35 KST 2017
;; MSG SIZE  rcvd: 97

$ dig @ns3.cyso.net. zaaksysteem.nl type257

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @ns3.cyso.net. zaaksysteem.nl type257
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33077
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;zaaksysteem.nl.			IN	CAA

;; AUTHORITY SECTION:
zaaksysteem.nl.		2560	IN	SOA	ns1.cyso.nl. dnsadmin.cysonet.com. 1500026318 16384 2048 1048576 300

;; Query time: 285 msec
;; SERVER: 46.23.87.194#53(46.23.87.194)
;; WHEN: Wed Jul 19 14:46:48 KST 2017
;; MSG SIZE  rcvd: 97

$ dig @ns1.cyso.nl. zaaksysteem.nl type257

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @ns1.cyso.nl. zaaksysteem.nl type257
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30999
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;zaaksysteem.nl.			IN	CAA

;; AUTHORITY SECTION:
zaaksysteem.nl.		2560	IN	SOA	ns1.cyso.nl. dnsadmin.cysonet.com. 1500026318 16384 2048 1048576 300

;; Query time: 262 msec
;; SERVER: 109.235.73.249#53(109.235.73.249)
;; WHEN: Wed Jul 19 14:46:58 KST 2017
;; MSG SIZE  rcvd: 97

Lookups for CAA return NOERROR for me. Nothing seems wrong with DNSSEC either: https://dnssectest.net/zaaksysteem.nl/A

Hi @tgx,

I split your post out from the parent thread since it's likely the root cause will differ.

Looking into your case further, it appears to me to be a DNSSEC validation failure caused by an invalid signature on an RRset.

Jul 19 14:44:03 unbound[24997:0] info: verify rrset cached zaaksysteem.nl. SOA IN
Jul 19 14:44:03 unbound[24997:0] info: verify rrset zaaksysteem.nl. NSEC IN
Jul 19 14:44:03 unbound[24997:0] debug: verify sig 35973 8
Jul 19 14:44:03 unbound[24997:0] debug: verify: signature mismatch
Jul 19 14:44:03 unbound[24997:0] debug: rrset failed to verify: no valid signatures
Jul 19 14:44:03 unbound[24997:0] debug: verify result: sec_status_bogus
Jul 19 14:44:03 unbound[24997:0] info: validator: response has failed AUTHORITY rrset: zaaksysteem.nl. NSEC IN
Jul 19 14:44:03 unbound[24997:0] info: Validate: message contains bad rrsets

In this case the authority section of the response was:

zaaksysteem.nl.	0	IN	SOA	ns1.cyso.nl. dnsadmin.cysonet.com. 1500026318 16384 2048 1048576 300
zaaksysteem.nl.	2560	IN	RRSIG	SOA 8 2 2560 20170727000000 20170713000000 35973 zaaksysteem.nl. KxxD2xy7NyVTFyZ0gjexj1DRRc3hwGoVGX+jC5MfpprZMtfMM2mc6ZpbORKjQn/CodbF5Zcaflli3gVmR9gjllm6s8DY7FDtrwQijMxhaccRkXicrfOXEnuuEYp7CF1nXSsV2MzBOtSlCyCxafTj19IpaJx6d0QX3r7b1kA7t4o= ;{id = 35973}
zaaksysteem.nl.	300	IN	NSEC	*.ZAAksySTeEm.Nl. A NS SOA MX TXT RRSIG NSEC DNSKEY
zaaksysteem.nl.	300	IN	RRSIG	NSEC 8 2 300 20170727000000 20170713000000 35973 zaaksysteem.nl. EKkFX41sy77H5revhB5VKMTTvp5s0nG6Mu9CilZkoFXeudCdpNcwRIXraa2VLjby1FykMHvSUq6vEE+VexPWdKZhaqAmyGym1ZA7kRYuQXGwt2J88dGLSa1MsugAcDxc4GJo+ox1ZS79wftZ3OlXtBSkmUjLM5nclczKT5KMqXI= ;{id = 35973}

What software are you using for your authoritative DNS server?

Thank you for looking into this :thumbsup: Please allow me some time, as I’ll have to check with the user which software he is using for his authoritative DNS server. Out of curiosity, dnssectest.net doesn’t run into any errors during validation, so I was wondering why the results might be different on the LE servers.

It looks like your user’s authoritative nameservers are having some trouble with the combination of DNSSEC and DNS 0x20 (mixed case queries). We use DNS 0x20 in production to improve the security of our DNS lookups. When I temporarily disabled it on a test instance, the responses for CAA zaaksysteem.nl validated. When I re-enabled it, the responses were invalid again. I’m guessing there is some discrepancy in whether your authoritative server signs the mixed-case form or the lowercase form. I think it’s supposed to be the latter, though I’d have to double-check the RFCs. At any rate, if we can find out what software the user is on, hopefully we can nail it down.
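For anyone unfamiliar with DNS 0x20, here is a minimal Python sketch of the idea: the resolver randomises the letter case of the query name and only accepts a response that echoes exactly the same case. The function names `encode_0x20` and `response_matches` are made up for illustration; this is not how unbound or Boulder implement it, and a real resolver would use a cryptographically secure random source.

```python
import random


def encode_0x20(name: str, rng: random.Random) -> str:
    """Randomise the case of each letter in a query name (DNS 0x20)."""
    return "".join(
        (ch.upper() if rng.random() < 0.5 else ch.lower()) if ch.isalpha() else ch
        for ch in name
    )


def response_matches(sent_qname: str, echoed_qname: str) -> bool:
    """Accept a response only if the question name is echoed with identical case."""
    return sent_qname == echoed_qname


# Seeded only so the sketch is repeatable; a real resolver must not do this.
rng = random.Random(0x20)
sent = encode_0x20("zaaksysteem.nl.", rng)

print(sent)                                     # some mixed-case spelling of the name
print(response_matches(sent, sent))             # True: faithful echo
print(response_matches(sent, sent.swapcase()))  # False: case differs, e.g. a spoofed answer
```

The mixed-case spelling is still the same name as far as DNS is concerned, which is why a server that lets the query case leak into signed data can break validation.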

Looks like it might have been premature to split this off from the parent (PowerDNS) topic:

$ dig @ns1.cyso.nl version.bind txt chaos +short
"Served by PowerDNS - http://www.powerdns.com"
$ dig @ns2.cyso.eu version.bind txt chaos +short
"Served by PowerDNS - http://www.powerdns.com"

Might be the same root cause after all.

Addendum: if I do mixed-case requests manually through my local unbound (DNSSEC-validating) resolver, I get a SERVFAIL, too (with use-caps-for-id: no), so while it’s still unbound in the mix, the problem isn’t specific to that option. Annoyingly, all the online DNSSEC checking tools I can find normalise the request to lowercase before sending it, so I can’t get a complete log of the misbehaviour. However, sending a mixed-case query to Google’s open DNS (which does DNSSEC validation) returns SERVFAIL, but its cache does case-normalisation, so if you send an all-lowercase request first, it works for the mixed-case version later – and, conversely, if you send the mixed-case version first, you will then get a SERVFAIL on the all-lowercase version! (At least until the cache expires, or you end up hitting a different machine in the load balancer group.)
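The order dependence described above can be reproduced with a toy model. This is only a sketch under stated assumptions: `ToyValidatingCache` and `fetch_and_validate` are hypothetical names, the cache is assumed to key entries on the lowercased name, and the upstream is assumed to validate only when the query is all lowercase. It says nothing about how Google's resolver is really built.

```python
class ToyValidatingCache:
    """Toy resolver cache that normalises names to lowercase for its keys."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}

    def resolve(self, qname: str) -> str:
        key = qname.lower()
        if key not in self.cache:
            # The query goes upstream with whatever case the client used,
            # and the outcome is stored under the case-normalised key.
            self.cache[key] = self.upstream(qname)
        return self.cache[key]


def fetch_and_validate(qname: str) -> str:
    """Hypothetical upstream: DNSSEC validation only succeeds for lowercase queries."""
    return "NOERROR" if qname == qname.lower() else "SERVFAIL"


lower_first = ToyValidatingCache(fetch_and_validate)
print(lower_first.resolve("zaaksysteem.nl."), lower_first.resolve("ZaakSysteem.NL."))
# NOERROR NOERROR

mixed_first = ToyValidatingCache(fetch_and_validate)
print(mixed_first.resolve("ZaakSysteem.NL."), mixed_first.resolve("zaaksysteem.nl."))
# SERVFAIL SERVFAIL
```

Whichever spelling reaches the cache first decides the answer for every later spelling until the entry expires.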

DNS is weird. And everyone seems to be in agreement that PowerDNS is broken.

@jsha, what’s the process for getting a domain onto the “CAA SERVFAIL” exceptions list hinted at in the API announcements topic? I doubt we’re going to be able to fix the world’s PowerDNS servers in the next few weeks.

Currently ad-hoc, but we are planning to remove it entirely by the September 8 deadline set at CA/Browser Forum for enforcing CAA, since it's not compatible with the new requirements. So our hope is to not make adding to it a regular process, but to focus on getting people's DNS fixed before then.

OK, well, can we put zaaksysteem.nl on that list for now, so we can get their certificate renewed? We’ll prod the customer to prod their DNS provider to fix their stuff, but that’ll probably take longer than the existing cert has left, especially given there don’t appear to be any existing bug reports to PowerDNS on this yet (or at least not any that I could find).

I reported an issue to PowerDNS, and they tell me:

So hopefully this should be a pretty straightforward fix for cyso.net if we can get in touch with their administrators.

@mpalmer @tgx am I right in assuming that you are interested specifically in community.zaaksysteem.nl? I was confused since the original message didn't mention the full name so I didn't see the relationship to Discourse. I see that the cert is quite close to expiration so I'll file a special request with our ops people to add community.zaaksysteem.nl to the list so you can renew on time, and report back here when it's ready. Does Discourse normally renew certificates when there are 30 days remaining on them? Assuming so, were there factors other than the CAA SERVFAIL change deployed last week that caused this renewal to be delayed?

Thanks,
Jacob

Alright, community.zaaksysteem.nl is now on the list. Please try renewing and let me know if you run into trouble.

Thanks for getting this on the list. Unfortunately, it doesn't seem to be helping; attempts to renew the certificate are still returning the dreaded text:

urn:acme:error:connection
DNS problem: SERVFAIL looking up CAA for zaaksysteem.nl

We'll be encouraging our customer to contact Cyso (I'll leave that to you, @tgx), and I've given Cyso the tweeting of a lifetime.

We've been doing renewals at 10 days, simply because we weren't aware of the 30-day recommendation. I've now rolled out a change to do renewals at 30 days. Tomorrow's renewal run is going to be a doozy.

Bummer! It's possible there's a bug in the exceptions code. I'll take a look.

I've also sent a polite email to Cyso's domain admin pointing out the issue and requesting an upgrade.

I believe you have a possible workaround: The bug only manifests on empty responses. If the response to the CAA query is non-empty, validation succeeds. I believe this is why you get the error on zaaksysteem.nl instead of on community.zaaksysteem.nl, because the response for dig CAA community.zaaksysteem.nl is non-empty: it contains a CNAME.

Since community.zaaksysteem.nl is CNAME'd to zaaksysteem.bydiscourse.com, you should be able to add a CAA record authorizing issuance by Let's Encrypt to the zaaksysteem.bydiscourse.com zone. Since CAA processing proceeds from left to right, Boulder will see this record and stop processing.
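To illustrate why the workaround avoids the failing lookup, here is a rough Python sketch of the CAA search order as I read it in RFC 6844: check the name itself, then its CNAME target, and only then climb to the parent. The dictionaries stand in for real DNS, the record value is illustrative, and the helper names (`caa`, `parent`, `relevant_caa`) are hypothetical. Boulder's actual code may differ.

```python
from typing import Dict, List, Optional

# Hypothetical toy data standing in for real DNS lookups.
CAA_RECORDS: Dict[str, List[str]] = {
    "zaaksysteem.bydiscourse.com.": ['0 issue "letsencrypt.org"'],
}
CNAMES: Dict[str, str] = {
    "community.zaaksysteem.nl.": "zaaksysteem.bydiscourse.com.",
}

queried: List[str] = []  # records which names a CAA lookup was made for


def caa(name: str) -> List[str]:
    """Look up the CAA record set for exactly this name (toy version)."""
    queried.append(name)
    return CAA_RECORDS.get(name, [])


def parent(name: str) -> Optional[str]:
    """Return the parent domain, or None once we reach a top-level domain."""
    labels = name.rstrip(".").split(".")
    return ".".join(labels[1:]) + "." if len(labels) > 1 else None


def relevant_caa(name: str) -> List[str]:
    """Find the relevant CAA set: the name, then its alias target, then its parent."""
    records = caa(name)
    if records:
        return records
    target = CNAMES.get(name)
    if target:
        records = relevant_caa(target)
        if records:
            return records
    up = parent(name)
    return relevant_caa(up) if up else []


print(relevant_caa("community.zaaksysteem.nl."))
print(queried)  # zaaksysteem.nl. never appears, so the failing lookup is never made
```

In this model the search stops at the CNAME target, so the empty (and badly signed) response for zaaksysteem.nl is never requested.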

Excellent! If you haven't seen it, you may want to check out our Integration Guide.

Well kiss mah grits! I gave up trying to figure out the exact precedence rules for CAA records in the presence of CNAMEs and such -- no two people on the CABF public list seemed to be able to agree on anything. If I can fix all our CAA problems by adding a record to the subdomain, I'll just do that. Problem solved.

Of course, Cloudflare's semi-hiding CAA record support behind some sort of beta access request-by-ticket thing, so I can't test it out now, but I'll give it a go as soon as they come through.

I have not seen it, but it is now at the top of the "to be read" pile. Thanks for the pointer.

A quick update: after wrestling with Cloudflare for a couple of days to get the ability to create CAA records, I've now created an appropriate CAA record on the CNAME target, and that appears to have now allowed issuance (or the problem with the CAA exceptions list was fixed?). Either way, I've renewed the cert now, and I'll get on to creating CAA records for all our CNAME targets, which should prevent any further unpleasantness due to PowerDNS bugs.

Thanks for your help, @jsha. Really appreciate it.

Excellent, glad that worked for you. Nothing has changed on our end since Friday, so I’m pretty sure it was the addition of the CAA record.
