SERVFAIL looking up CAA, but I see NOERROR myself

How are you testing against the authoritative nameservers? Can you share the commands? Is it possible you're testing with public DNS servers that enforce DNSSEC (e.g. Google 8.8.8.8) and not enforcing DNSSEC when you test directly against the authoritative nameserver (e.g. with dig)?

Today I was checking both without DNSSEC, e.g.

dig @64.6.64.6 menzIs.faCETbAse.nl CAA -> SERVFAIL (Verisign DNS)
dig @ns01.is.nl menzIs.faCETbAse.nl CAA -> NOERROR
dig @publicdns.goog menzIs.faCETbAse.nl CAA -> NOERROR

Just tried adding +dnssec to each of those commands, and it gives me the same results.

I’ve noticed there seems to be some caching element to this problem too, as queries that SERVFAIL at first seem to go to NOERROR a few minutes later.

My new working theory is that the issue is caused by rate limiting on the provider’s end, which I’ve run into with dig several times as well.

I’ve increased the TTL on the A records, hoping that it will have some effect. Other suggestions are still welcome though.

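Google and Verisign both validate DNSSEC by default unless you explicitly ask them not to with +cd, which sets the Checking Disabled bit. +dnssec does not change whether they validate; it only asks them to include DNSSEC records such as RRSIG in the response.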

Thanks, that’s a good tip. The TTL for A records hasn’t seemed to help.

Adding +cd changes the status from SERVFAIL to NOERROR for the Verisign case.
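The comparison above (same query, with and without +cd) can be scripted when testing several names. This is only a rough sketch: the helper names `cd_comparison_commands` and `diagnose` are made up for illustration, the code only builds the dig command lines rather than running them, and the interpretation strings are a rule of thumb, not a definitive diagnosis.

```python
from typing import Tuple


def cd_comparison_commands(name: str, rtype: str, resolver: str) -> Tuple[str, str]:
    """Return the dig command with validation and the same command with +cd."""
    base = f"dig @{resolver} {name} {rtype}"
    return base, base + " +cd"


def diagnose(status_validating: str, status_cd: str) -> str:
    """Interpret the two status codes (hypothetical rule of thumb)."""
    v = status_validating.upper()
    c = status_cd.upper()
    if v == "SERVFAIL" and c == "NOERROR":
        # The resolver only fails while it is validating.
        return "likely DNSSEC validation failure"
    if v == "SERVFAIL" and c == "SERVFAIL":
        return "failure not explained by DNSSEC validation alone"
    if v == c:
        return "no difference between validating and non-validating query"
    return "inconclusive"


if __name__ == "__main__":
    for cmd in cd_comparison_commands("menzis.facetbase.nl", "CAA", "64.6.64.6"):
        print(cmd)
    print(diagnose("SERVFAIL", "NOERROR"))
```

The Verisign result reported here (SERVFAIL normally, NOERROR with +cd) falls into the first branch.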

What is the switch to enforce DNSSEC for a server which doesn’t enforce it by default? (I couldn’t figure it out from dig -help). That would help me to further test ns01.is.nl and help show the provider what’s going wrong.

You have 3 servers, so I would test for CAA records against all 3 of those nameservers rather than Google and Verisign.


Let’s Encrypt uses the name servers specified by your domain records.

Andrei

Good to have that confirmed Andrei. The problem is that so far I’ve been unable to trigger a SERVFAIL on those name servers, so I can’t prove to the provider that their DNS is misconfigured.

If you or anyone could show me the dig command(s) to trigger the SERVFAIL, it’s more likely that they will take me seriously.

Hi @WouterTinus

@cpu and @jsha and @schoen should be able to look at logs from their side.

There is also a tool which closely resembles how Let’s Encrypt resolves DNS queries.

You can have a look at it here: https://unboundtest.com/

For some reason your domains seem to return 404, which I think happens when the domain can’t be queried.

@jsha should be able to help further

Andrei

To be clear the site I’m trying to renew is https://live.du.reports.menzis.facetbase.nl, but the error I get back from the ACME server is “SERVFAIL looking up CAA for reports.menzis.facetbase.nl”. So the actual certificate that I’m trying to renew is for a domain two levels deeper than the one the CAA fails for according to the ACME server.

That website you just linked shows errors requesting CAA records at every level of the domain except for the root, though most of them seem to be timeouts.

Hi @WouterTinus

I am not sure how Let’s Encrypt deals with nested domains such as yours. I would wait until some of the people who have a better understanding chip in.

Andrei

There probably isn't any. dig is a relatively simple program. Authoritative DNS servers rarely return SERVFAIL directly. Usually, in a situation like this, it's synthesized by the recursive DNS server when it encounters one or more of a variety of error conditions, such as the domain's authoritative DNS servers being down, or responding with an error code, or responding with invalid data -- all of which Namebright is guilty of -- or DNSSEC that fails validation, and so forth.

In many of those cases, dig against the authoritative server will show some sort of valid-looking-ish response, or a different sort of error.

Edit: Oops, I thought this was the Namebright thread, not the is.nl thread. I'm sorry. Still, it doesn't change the rest of my response.

It looks for a CAA record, left to right, stopping when it gets one or when it reaches the public suffix.

E.g. if you validate www.example.com, it will do CAA queries for www.example.com. and example.com..

A deeply nested case like in this thread is exactly the same except with, uh, more queries.

(In Boulder's case, prioritizing speed over resource usage, it fires off each DNS query in parallel and sorts out the results afterwards, rather than going one at a time, but that doesn't really matter.)
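As a rough illustration of the lookup order described above, here is a minimal sketch that lists the names a CAA check would query for a given FQDN. It is not Boulder's code: the function name `caa_query_names` is made up, and the public suffix is passed in by hand as an assumption instead of being looked up in the Public Suffix List.

```python
from typing import List


def caa_query_names(fqdn: str, public_suffix: str) -> List[str]:
    """List the names checked for CAA, left to right, stopping before the suffix."""
    labels = fqdn.rstrip(".").lower().split(".")
    suffix_labels = public_suffix.strip(".").lower().split(".")
    # The name must sit strictly below the given public suffix.
    if len(labels) <= len(suffix_labels) or labels[-len(suffix_labels):] != suffix_labels:
        raise ValueError(f"{fqdn!r} is not below public suffix {public_suffix!r}")
    names = []
    for i in range(len(labels) - len(suffix_labels)):
        # Drop one more leading label each round; keep the trailing dot.
        names.append(".".join(labels[i:]) + ".")
    return names


if __name__ == "__main__":
    for name in caa_query_names("live.du.reports.menzis.facetbase.nl", "nl"):
        print(name)
```

For the name in this thread that gives five queries, from live.du.reports.menzis.facetbase.nl. down to facetbase.nl., and the failing reports.menzis.facetbase.nl. is the third of them.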


As @mnordhoff says, we check the FQDN, and each parent, from left to right. That means that if the nameserver for live.du.reports.menzis.facetbase.nl supports adding CAA records, you can just add a CAA record permitting issuance, and everything will work nicely. Read more here.

The same happens for me, when trying to renew the cert for m.auditcenter.hu. Also https://unboundtest.com/ shows an error message. All the other 160 domains for which I use Let’s Encrypt with the exact same configuration (from the exact same server) work correctly.

Are the other 160 domains using the same authoritative DNS servers, with DNSSEC enabled?

I'm not clear why, but Unbound appears to think they're behaving improperly.

https://unboundtest.com/m/CAA/m.auditcenter.hu/K27MGN5E

Aug 01 18:10:51 unbound[27014:0] info: validator operate: query m.auditcenter.hu. CAA IN
Aug 01 18:10:51 unbound[27014:0] debug: verify: signature mismatch
Aug 01 18:10:51 unbound[27014:0] info: validator: response has failed AUTHORITY rrset: m.auditcenter.hu. NSEC IN
Aug 01 18:10:51 unbound[27014:0] info: Validate: message contains bad rrsets

Edit:

It appears to be related to capitalized negative responses (e.g. for AAAA as well) and not particularly CAA. I might speculate that it's the PowerDNS bug fixed in version 4.0.4, but the servers strangely respond to version queries with SERVFAIL, and I'm not knowledgeable enough to otherwise be sure.

No, they are using around 100 nameservers (their owners and registrars are different). I only have the SERVFAIL problem with this one, but I cannot reproduce the error with either the host or the dig command-line utility.

Send a query to a resolver that validates DNSSEC, for a record set that doesn't exist, ensuring some of it is capitalized.

(Let's Encrypt and https://unboundtest.com/ are configured to always use random capitalization (so-called 0x20 randomization) for security purposes, and to validate DNSSEC, exposing problems like this more often than many other resolvers.)

$ dig Auditcenter.Hu aaaa @publicdns.goog

; <<>> DiG 9.10.3-P4-Ubuntu <<>> Auditcenter.Hu aaaa @publicdns.goog
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 60292
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;Auditcenter.Hu.                        IN      AAAA

;; Query time: 344 msec
;; SERVER: 2001:4860:4860::8844#53(2001:4860:4860::8844)
;; WHEN: Tue Aug 01 18:36:11 UTC 2017
;; MSG SIZE  rcvd: 43
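To try several capitalizations instead of picking one by hand, the query name can be randomized the way 0x20 randomization does. The sketch below is only an illustration: `randomize_case` and `dig_command` are hypothetical helpers, the resolver is whatever validating resolver you choose, and the script just prints dig commands for you to run yourself.

```python
import random


def randomize_case(name: str, rng: random.Random) -> str:
    """Flip each character to upper or lower case at random (0x20-style)."""
    # Dots and digits are unaffected by upper() and lower().
    return "".join(ch.upper() if rng.random() < 0.5 else ch.lower() for ch in name)


def dig_command(name: str, rtype: str, resolver: str) -> str:
    """Build a dig command line in the same form as the one shown above."""
    return f"dig {name} {rtype} @{resolver}"


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        mixed = randomize_case("auditcenter.hu", rng)
        print(dig_command(mixed, "aaaa", "publicdns.goog"))
```

If some of the printed commands return SERVFAIL while the all-lowercase query returns NOERROR, that matches the capitalization-dependent behaviour described in this thread.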

Similar to this other thread, all signs point to your nameservers running a PowerDNS version older than 4.0.4, with “version-string=anonymous” hiding the version (@sahsanu reports that setting can cause version.bind txt ch queries to return SERVFAIL). I would recommend asking your DNS operator to upgrade to version 4.0.4 or above as soon as possible, since all of their customers will have this problem.

Finally my provider rolled out an update to their DNS platform and this seems to have fixed the problem. They still haven’t disclosed whether or not they are using PowerDNS.

Thanks for all your assistance!
