SERVFAIL looking up CAA, but I see NOERROR myself

Hi all,

I’m requesting a certificate for the domain live.du.reports.menzis.facetbase.nl using http-01 validation, but it fails with the error “DNS problem: SERVFAIL looking up CAA for reports.menzis.facetbase.nl”.

However, if I run “dig reports.menzis.facetbase.nl caa” it reports NOERROR for our internal DNS, for 8.8.8.8, and for the primary nameserver for the domain facetbase.nl (ns01.is.nl). Another, less deeply nested domain (beheer.menzis.facetbase.nl) also works fine.

How can I figure out what is going wrong?

It doesn’t work for my own resolver, or Google Public DNS, at least reliably.

Let’s Encrypt’s resolvers use randomized capitalization to increase security; I’m not certain, but that seems to be a problem in your case. Lowercase queries always seem to work, capitalized ones fail for some reason.

I don’t suppose it’s using PowerDNS, and a version older than 4.0.4?

$ dig +dnssec live.du.reports.menzis.facetbase.nl caa @publicdns.goog

; <<>> DiG 9.10.3-P4-Ubuntu <<>> +dnssec live.du.reports.menzis.facetbase.nl caa @publicdns.goog
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52436
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 512
;; QUESTION SECTION:
;live.du.reports.menzis.facetbase.nl. IN        CAA

;; AUTHORITY SECTION:
facetbase.nl.           1799    IN      SOA     ns01.is.nl. domain-admin.is.nl. 2017072804 10800 3600 604800 14400
facetbase.nl.           1799    IN      RRSIG   SOA 8 2 86400 20170810000000 20170720000000 22146 facetbase.nl. igz201A2zr/OE4NgY9fuYocTE1F/9lZC/QgmdC5b8ZE+uRXb2Aeuk2V5 am9q+EOS8U4lFB9VpTXert0dfOak5KTUvgdAM3ov0LJtIDR+bbjD3v8U uDpcxPuGlWMk0TcDmpGF88gnnXYmYxeVukdsvB8ltblrUX4enCgHkc6w Aus=
live.du.reports.menzis.facetbase.nl. 14399 IN NSEC test.live.du.reports.menzis.facetbase.nl. A RRSIG NSEC
live.du.reports.menzis.facetbase.nl. 14399 IN RRSIG NSEC 8 6 14400 20170810000000 20170720000000 22146 facetbase.nl. bgV4PfINi0b7E3ytCWAdnwOPDc9TsWzmFNpe+ooNSZPcuQzi4bj+5OEy ebZ+RqXfmtCimQzBz4iL8rmvLm8/CQ0uJGqw479xAYjczpvI36+QOutf Ia6CK3jYsV1CDxuNfVMegMFVOGcXjWm2PgJ23G/5LrvCn5Y5oC0oy7k1 7HU=

;; Query time: 208 msec
;; SERVER: 2001:4860:4860::8888#53(2001:4860:4860::8888)
;; WHEN: Fri Jul 28 09:53:52 UTC 2017
;; MSG SIZE  rcvd: 527

$ dig +dnssec Live.Du.Reports.Menzis.Facetbase.Nl caa @publicdns.goog

; <<>> DiG 9.10.3-P4-Ubuntu <<>> +dnssec Live.Du.Reports.Menzis.Facetbase.Nl caa @publicdns.goog
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 25288
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 512
;; QUESTION SECTION:
;Live.Du.Reports.Menzis.Facetbase.Nl. IN        CAA

;; Query time: 125 msec
;; SERVER: 2001:4860:4860::8844#53(2001:4860:4860::8844)
;; WHEN: Fri Jul 28 09:54:06 UTC 2017
;; MSG SIZE  rcvd: 64

Edit: My resolver reports “validate(nodata): sec_status_bogus”.

Edit: Debug logging reports “NODATA response failed to prove NODATA status with NSEC/NSEC3”.

Thanks for the quick reply.

Since when have they been doing this “trick” with the capitalization? I haven’t encountered this problem before. I see the same errors with random capitals as you do on @publicdns.goog, though not on any of the authoritative servers (@ns[1-3].is.nl). Which ones would Let’s Encrypt query?

I don’t know what server my provider is using, but I will ask them to look into this.

For reference the "trick" is usually referred to as "0x20 randomization" and as far as I'm aware we've been using it since launch. I suspect the only change was the CAA servfail change. We've seen some other nameservers have a strange interaction between 0x20 and CAA records.
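To make the idea concrete, here is a minimal Python sketch of 0x20 randomization. It is only an illustration, not the code Let's Encrypt's resolvers run; the function names `randomize_case` and `case_matches` are made up for this example. The resolver flips the case of each letter at random before sending the query, and only trusts a reply whose question section echoes exactly the same mixed-case name.

```python
import random


def randomize_case(name, rng):
    """Return `name` with every ASCII letter randomly upper- or lower-cased."""
    out = []
    for ch in name:
        if ch.isascii() and ch.isalpha():
            out.append(ch.upper() if rng.random() < 0.5 else ch.lower())
        else:
            out.append(ch)  # dots, digits and hyphens are left alone
    return "".join(out)


def case_matches(sent, echoed):
    """Assumed acceptance rule: the echoed question must match byte for byte."""
    return sent == echoed


rng = random.Random()  # a real resolver would want an unpredictable source
query_name = randomize_case("live.du.reports.menzis.facetbase.nl", rng)
print(query_name)  # some random mix of upper and lower case
```

Feeding a few of these generated names to dig against each authoritative server is one way to look for a name server that treats mixed-case queries differently from lowercase ones.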

Ok, so that change might be what triggered the problem for me.

My provider is using the DNS server “Go away, no need to know!” :wink: so I don’t know what they are using, but can any of you verify that there is a problem on their end?

I think the DNSSEC issue with NSEC/NSEC3 has been resolved now; at least http://dnsviz.net/d/live.du.reports.menzis.facetbase.nl/dnssec/ no longer shows any errors in the chain.

The other thing is the 0x20 randomization, but I could only trigger that on the Google Server, not on their own. They seem to have some kind of rate limiting set up though, so right now I’m unable to test it directly @ns01.is.nl, though for now it seems to work fine on @8.8.8.8.

The ACME server still says “DNS problem: SERVFAIL looking up CAA for reports.menzis.facetbase.nl” though.

Can anyone provide some insight into this? My provider is not being very helpful.

I seem to get the error quite randomly on public DNS servers, but never on any of the name servers registered with the domain, e.g. dig @ns01.is.nl Reports.MENziS.fACetBASE.NL CAA -> NOERROR

What could be the issue? Is it possible there is a misconfiguration on their end, even if their own server doesn't show an error?

How are you testing against the authoritative nameservers? Can you share the commands? Is it possible you're testing with public DNS servers that enforce DNSSEC (e.g. Google 8.8.8.8) and not enforcing DNSSEC when you test directly against the authoritative nameserver (e.g. with dig)?

Today I was checking both without DNSSEC, e.g.

dig @64.6.64.6 menzIs.faCETbAse.nl CAA -> SERVFAIL (Verisign DNS)
dig @ns01.is.nl menzIs.faCETbAse.nl CAA -> NOERROR
dig @publicdns.goog menzIs.faCETbAse.nl CAA -> NOERROR

Just tried adding +dnssec to each of those commands, and it gives me the same results.

I’ve noticed there seems to be some caching element to this problem too, as queries that SERVFAIL at first seem to go to NOERROR a few minutes later.

My new working theory is that the issue is caused by rate limiting on the provider’s end, which I’ve run into with dig several times as well.

I’ve increased TTL on the A records hoping that it will have some effect. Other suggestions still welcome though.

Google and Verisign both enforce DNSSEC by default, unless you explicitly ask them not to (+cd). +dnssec just changes how much information they return.
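For anyone wondering what those two dig switches change on the wire: +cd sets the CD (checking disabled) bit in the DNS header, while +dnssec sets the DO (DNSSEC OK) bit in the EDNS OPT record. Below is a rough Python sketch that builds a raw CAA query by hand to show where each bit lives. `build_query` and `encode_name` are made-up helper names, the 4096-byte UDP size is an arbitrary choice, and the sketch only attaches an OPT record when DO is requested, which is a simplification of what dig really sends.

```python
import struct

RD = 0x0100  # recursion desired (header flag)
CD = 0x0010  # checking disabled (header flag), what dig +cd sets
DO = 0x8000  # DNSSEC OK (EDNS flag in the OPT record), what dig +dnssec sets
TYPE_CAA, TYPE_OPT, CLASS_IN = 257, 41, 1


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in (part for part in name.split(".") if part):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=TYPE_CAA, cd=False, do=False, query_id=0):
    """Build a raw DNS query; cd and do only toggle their respective bits."""
    flags = RD | (CD if cd else 0)
    arcount = 1 if do else 0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)
    opt = b""
    if do:
        # OPT record: root name, TYPE 41, CLASS = UDP payload size,
        # TTL field carries extended rcode, version and the flags (DO here).
        opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, 4096, DO, 0)
    return header + question + opt


packet = build_query("menzis.facetbase.nl", cd=True, do=True)
print(packet.hex())
```

The point of the sketch is that the two bits are independent: DO asks for the signatures to be included in the answer, and CD asks a validating resolver to return the data even if validation fails.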

Thanks, that’s a good tip. The TTL for A records hasn’t seemed to help.

Adding +cd changes the status from SERVFAIL to NOERROR for the Verisign case.

What is the switch to enforce DNSSEC for a server which doesn’t enforce it by default? (I couldn’t figure it out from dig -help). That would help me to further test ns01.is.nl and help show the provider what’s going wrong.

You have 3 nameservers, so I would test for CAA records against all 3 of them rather than against Google and Verisign.

Let’s Encrypt uses the name servers specified by your domain records.

Andrei

Good to have that confirmed Andrei. The problem is that so far I’ve been unable to trigger a SERVFAIL on those name servers, so I can’t prove to the provider that their DNS is misconfigured.

If you or anyone could show me the dig command(s) to trigger the SERVFAIL, it’s more likely that they will take me seriously.

hi @WouterTinus

@cpu and @jsha and @schoen should be able to look at logs from their side

there is also a tool which closely resembles how Let’s Encrypt resolves DNS queries

You can have a look at it here: https://unboundtest.com/

For some reason your domains seem to return 404, which I think happens when the domain can’t be queried.

@jsha should be able to help further

Andrei

To be clear the site I’m trying to renew is https://live.du.reports.menzis.facetbase.nl, but the error I get back from the ACME server is “SERVFAIL looking up CAA for reports.menzis.facetbase.nl”. So the actual certificate that I’m trying to renew is for a domain two levels deeper than the one the CAA fails for according to the ACME server.

That website you just linked shows errors requesting CAA records at every level of the domain except for the root, though most of them seem to be timeouts.

hi @WouterTinus

I am not sure how Let’s Encrypt deals with nested domains such as yours. I would wait until some of the people who have a better understanding chip in.

Andrei

There probably isn't any. dig is a relatively simple program. Authoritative DNS servers rarely return SERVFAIL directly. Usually, in a situation like this, it's synthesized by the recursive DNS server when it encounters one or more of a variety of error conditions, such as the domain's authoritative DNS servers being down, or responding with an error code, or responding with invalid data -- all of which Namebright is guilty of -- or DNSSEC that fails validation, and so forth.

In many of those cases, dig against the authoritative server will show some sort of valid-looking-ish response, or a different sort of error.

Edit: Oops, I thought this was the Namebright thread, not the is.nl thread. I'm sorry. Still, it doesn't change the rest of my response.

It looks for a CAA record, left to right, stopping when it gets one or when it reaches the public suffix.

E.g. if you validate www.example.com, it will do CAA queries for www.example.com. and example.com..

A deeply nested case like in this thread is exactly the same except with, uh, more queries.

(In Boulder's case, prioritizing speed over resource usage, it fires off each DNS query in parallel and sorts out the results afterwards, rather than going one at a time, but that doesn't really matter.)
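Here is a rough Python sketch of the lookup order described above. It is not Boulder's actual code (Boulder is written in Go), and several things are assumptions made for illustration: the public suffix is passed in by hand instead of coming from the Public Suffix List, `lookup` is a hypothetical callable that returns a list of CAA records or raises on a DNS failure, and the results are assumed to be examined from the most specific name upwards, with the first error or the first non-empty answer deciding the outcome.

```python
from concurrent.futures import ThreadPoolExecutor


class ServFail(Exception):
    """Hypothetical error a lookup function raises when the resolver fails."""


def caa_query_names(fqdn, public_suffix):
    """List the names to query, most specific first, stopping above the suffix."""
    labels = fqdn.strip(".").split(".")
    suffix_len = len(public_suffix.strip(".").split("."))
    return [".".join(labels[i:]) + "." for i in range(len(labels) - suffix_len)]


def find_caa(fqdn, public_suffix, lookup):
    """Query every name in parallel, then walk the results from left to right."""
    names = caa_query_names(fqdn, public_suffix)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = [pool.submit(lookup, name) for name in names]
    # Leaving the with-block waits for all lookups to finish.
    for name, future in zip(names, futures):
        records = future.result()  # re-raises the lookup's error, if any
        if records:
            return name, records
    return None, []


def fake_lookup(name):
    """Stand-in resolver: one parent fails, every other name has no CAA records."""
    if name == "reports.menzis.facetbase.nl.":
        raise ServFail(name)
    return []


print(caa_query_names("www.example.com", "com"))
# ['www.example.com.', 'example.com.']
```

With `fake_lookup`, `find_caa("live.du.reports.menzis.facetbase.nl", "nl", fake_lookup)` raises `ServFail` for the reports.menzis.facetbase.nl. name even though the certificate is for a name two levels deeper, which matches the shape of the error in this thread. Under the assumed ordering, a record found at a more specific name would decide the outcome before a failure at a parent is looked at.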

As @mnordhoff says, we check the FQDN, and each parent, from left to right. That means that if the nameserver for live.du.reports.menzis.facetbase.nl supports adding CAA records, you can just add a CAA record permitting issuance, and everything will work nicely. Read more here.

The same happens for me, when trying to renew the cert for m.auditcenter.hu. Also https://unboundtest.com/ shows an error message. All the other 160 domains for which I use Let’s Encrypt with the exact same configuration (from the exact same server) work correctly.