Renewal problem: SERVFAIL looking up CAA

I’ve successfully created and installed certificates by running
sudo ./letsencrypt-auto --apache -d 020apps.nl -d static.020apps.nl -d olie.020apps.nl

I then ran this command:
sudo ./letsencrypt-auto renew --dry-run

It produced this output:

My operating system is (include version): Debian 8.7

My web server is (include version): Apache 2.4

I can login to a root shell on my machine: yes

I’m using a control panel to manage my site: no

Should I worry? I don’t understand why setting up the certificates went fine but renewing the ones for the subdomains seems to fail.
FWIW, the domain’s DNS record doesn’t have a CAA entry.

Hi @RonaldPK,

It’s fine not to have a CAA entry, but the DNS server needs to say “I don’t have a CAA entry” rather than “I don’t understand the question” or “I can’t answer the question”. The certificate authority views a DNS server failure (SERVFAIL) as invalid and refuses to issue a certificate in this case.

So, your DNS provider (or your own DNS server) needs to be updated to accept and respond to queries for the RR type CAA, even if the answer is that no such records are present.
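As a rough illustration of that distinction, the sketch below reads the `status:` field from dig's header line and maps it to the outcome described above. The helper names `dig_status` and `caa_verdict` are made up for this example, and the mapping covers only the two cases named in this thread (NOERROR is fine, SERVFAIL blocks issuance). Anything else is reported as unknown rather than guessed at.

```shell
# Hypothetical helper: extract the status field (NOERROR, SERVFAIL, ...)
# from dig output supplied on stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Hypothetical helper: map a status to the outcome described in this thread.
# Only the two cases discussed here are covered; this is an assumption-laden
# summary, not the CA's actual decision logic.
caa_verdict() {
  case "$1" in
    NOERROR)  echo "ok" ;;        # answer received, with or without CAA records
    SERVFAIL) echo "blocked" ;;   # server failure, issuance is refused
    *)        echo "unknown" ;;   # not covered by this sketch
  esac
}
```

It could be used as `caa_verdict "$(dig -t TYPE257 olie.020apps.nl | dig_status)"`.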

Thank you @schoen . I still don’t understand why creating and installing the certificates worked (for all 3 domains) but renewing them fails (on 2 domains). Is the CAA not tested during cert creation?

It’s definitely tested each time; people have been getting failures related to this condition for over a year. Perhaps the DNS server software used somewhere in the chain has changed somehow?

Possible but unlikely…

I just tried for a different domain. Same server, same DNS registrar.

sudo ./letsencrypt-auto --apache -d www.example.nl
Certificate was created and got installed just fine, no errors.

sudo ./letsencrypt-auto renew --dry-run
The same SERVFAIL errors were reported for the 020apps.nl-subdomains and for www.example.nl

A change in DNS software seems even more unlikely :slight_smile:

Right now, the staging server has a stricter CAA setting that rejects SERVFAIL responses; this has not yet been rolled out in prod. Since --dry-run uses staging, you only get the error there.

Who’s your registrar? We’d like to work with them to get this fixed.

If I’m to believe the SOA record, it’s Interned Services.

My personal testing with dig +dnssec olie.020apps.nl CAA and dig +trace +dnssec olie.020apps.nl CAA didn’t find any errors though…

DNSViz also doesn’t give any error, but it doesn’t have the option to explicitly ask for CAA records (yet). (I filed two issues on the matter on their GitHub repo.)

Your query isn’t quite right. First, you need to find the authoritative nameservers:

$ dig +short NS olie.020apps.nl
$ dig +short NS 020apps.nl
ns03.is.nl.
ns01.is.nl.
ns02.is.nl.

Now you want to query one of the authoritative nameservers for CAA. Note that I use -t TYPE257 just in case your dig doesn’t support CAA yet:

$ dig -t TYPE257 olie.020apps.net @ns03.is.nl.

; <<>> DiG 9.10.3-P4-Ubuntu <<>> -t TYPE257 olie.020apps.net @ns03.is.nl.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 25275
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1680
;; QUESTION SECTION:
;olie.020apps.net.              IN      CAA

;; Query time: 157 msec
;; SERVER: 2001:9a0:2003:1::53:3#53(2001:9a0:2003:1::53:3)
;; WHEN: Fri Mar 24 15:36:51 PDT 2017
;; MSG SIZE  rcvd: 45

Note the REFUSED there, which is probably the problem.
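The two steps above (list the authoritative nameservers, then ask each one for CAA) can be sketched as a small shell function. The name `check_caa_all_ns` is made up for this example, and it assumes the zone and the hostname are passed separately, as in the commands above. It relies only on the `dig` options already used in this thread, plus `+norecurse`.

```shell
# Hypothetical helper: query every authoritative nameserver of a zone for the
# CAA record of a hostname and print "<nameserver> <status>" per line.
# Usage: check_caa_all_ns <zone> <hostname>
check_caa_all_ns() {
  zone="$1"
  name="$2"
  # Step 1: find the authoritative nameservers for the zone.
  for ns in $(dig +short NS "$zone"); do
    # Step 2: ask that nameserver directly; TYPE257 is CAA, spelled
    # numerically in case the local dig does not know the name yet.
    status=$(dig -t TYPE257 "$name" "@$ns" +norecurse \
      | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
    echo "$ns ${status:-no-response}"
  done
}
```

For example, `check_caa_all_ns 020apps.nl olie.020apps.nl` would print one line per nameserver, which makes it easier to spot a single misbehaving server.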

Hmm, I see, will dig better next time :slight_smile:

But what I don’t understand is the following: when I executed dig with the +trace switch, it also queried one of the authoritative nameservers (ns0{1,2,3}.is.nl). In the reply, it presented NSEC records, the (superseded by NSEC3 by the way :stuck_out_tongue:) “proof” of non-existence of the queried hostname/RR.
Those NSEC records are nowhere to be found when I directly query the nameserver like you did? (dig +dnssec +norecurse @ns03.is.nl olie.020apps.net CAA). Am I missing something? :worried: I do get the REFUSED tho… :slight_smile:

Trace mode essentially asks dig to act like a recursive resolver, optionally including DNSSEC record lookups and validation. Similarly, if you set the dnssec bit on a non-trace dig query, you are telling the resolver you want DNSSEC when it recurses. However, in this case you are talking directly to an authoritative nameserver and it is not recursing for you:

;; WARNING: recursion requested but not available

@jsha Found out why you get a “REFUSED”: the TLD used by @RonaldPK is .nl, not .net :slight_smile:

It's olie.020apps.nl, not .net

Registrar is what Osiris said, Interned Services, https://www.internedservices.nl/ (or is.nl, but they're moving away from that one, for obvious reasons)

Also dig +dnssec +norecurse @ns02.is.nl olie.020apps.nl CAA gives a NOERROR as status… So unfortunately still no clue why Boulder would result in a SERVFAIL.

Aha, thanks for spotting my error! The other possibility: sometimes routing problems prevent CAA lookups from succeeding from certain network perspectives. I’ll try and dig into this, thanks.

FWIW, the registrar just told me they’re working on adding CAA as a resource type to the DNS management interface. No ETA, but good news anyway.

@jsha @schoen

is there a possibility of a bug here

also reporting similar conditions - dry run not working but normal process issuing

should we try to test by inducing the fail conditions?

i am a little bit suspicious that there are no DNS record issues at the time of issuing, but two days later (or even the same day, in the case above) there are lots of them

Andrei

@ahaw021: Looks like @roland commented on the other thread with a likely cause, so I’m guessing these are unrelated, but I appreciate you connecting the dots!

I’ve got an independent Unbound instance set up outside of prod that gets the same SERVFAIL results for your hostname on first query:

$ dig -t TYPE257 olie.020apps.nl @127.0.0.1 -p 1053

; <<>> DiG 9.10.3-P4-Ubuntu <<>> -t TYPE257 olie.020apps.nl @127.0.0.1 -p 1053
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31799
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;olie.020apps.nl.               IN      CAA

;; Query time: 1884 msec
;; SERVER: 127.0.0.1#1053(127.0.0.1)
;; WHEN: Mon Mar 27 14:03:00 EDT 2017
;; MSG SIZE  rcvd: 44

However, subsequent queries get NOERROR. I’m not sure what the problem is; any other thoughts here?
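To make the "first query fails, later ones succeed" pattern easier to observe, a sketch like the following repeats the same query a few times and prints the status of each attempt. The function name `repeat_caa_query` is made up for this example, and the resolver address and port are whatever your own test resolver uses (127.0.0.1 and 1053 above). It is probably only informative against a resolver that has not yet cached the name.

```shell
# Hypothetical helper: send the same CAA query several times to one resolver
# and print "<attempt> <status>" for each, to expose intermittent failures.
# Usage: repeat_caa_query <hostname> <resolver> <port> [count]
repeat_caa_query() {
  name="$1"
  server="$2"
  port="$3"
  count="${4:-3}"   # default to three attempts
  i=1
  while [ "$i" -le "$count" ]; do
    status=$(dig -t TYPE257 "$name" "@$server" -p "$port" \
      | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
    echo "$i ${status:-no-response}"
    i=$((i + 1))
  done
}
```

For example: `repeat_caa_query olie.020apps.nl 127.0.0.1 1053 5`.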

will spin this off as a new topic

will do some sleuthing see if we can replicate the error

Andrei

Maybe it's just a distance thing? Or some sort of regional connectivity issue? It looks like all 3 nameservers are hosted in 2 locations in 1 ASN in or near Amsterdam. And they use DNSSEC and relatively low TTLs. Maybe there's packet loss, or recursion is just taking too long and timing out?

I have the same problem with my DNS provider (inwx.de).

DNSSEC-enabled domains get a SERVFAIL error when queried through a validating resolver; for domains without DNSSEC, CAA lookups work without error (both with and without an existing CAA record).

When will this strict check go into production? I have to disable DNSSEC on my domains where I need le-certs…
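One way to check whether a SERVFAIL like this comes from DNSSEC validation is to repeat the query with the "checking disabled" bit set, which asks a validating resolver to skip validation. The sketch below does both queries and prints the two statuses side by side. The function name `compare_cd` is made up for this example, and the resolver argument is assumed to be a validating resolver you have access to. If the plain query returns SERVFAIL while the `+cdflag` query returns NOERROR, a validation failure is a likely (though not certain) explanation.

```shell
# Hypothetical helper: run the same CAA query with and without the CD
# (checking disabled) bit and print "<flag> <status>" for each.
# Usage: compare_cd <hostname> <validating-resolver>
compare_cd() {
  name="$1"
  resolver="$2"
  for flag in +nocdflag +cdflag; do
    status=$(dig -t TYPE257 "$name" "@$resolver" "$flag" \
      | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
    echo "$flag ${status:-no-response}"
  done
}
```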