SERVFAIL looking up CAA for some host names but not others in the same zone

My domain is:

We're using cert-manager v1.11.1 to issue certificates via DNS-01 for a number of hosts in the domain. Reissuing existing certificates and issuing new ones works perfectly for most hosts, but for a select few issuance fails with:

E0525 11:42:54.803799 1 sync.go:379] cert-manager/challenges/acceptChallenge "msg"="error waiting for authorization" "error"="acme: authorization error for 400 urn:ietf:params:acme:error:dns: DNS problem: SERVFAIL looking up CAA for - the domain's nameservers may be malfunctioning" "dnsName"="" "resource_kind"="Challenge" "resource_name"="" "resource_namespace"="asciinema" "resource_version"="v1" "type"="DNS-01"

The above error doesn't show up for other similar (and working) certificates of the same zone.

The CAA record is fine:

❯ dig @ -tCAA
               3600    IN      CAA     0 issue ""

The certificate was first created on May 19, 2022, and was issued and renewed several times until renewal stopped working about a month ago.

I can't figure out any difference between the certificates that work and those that don't.

I don't think anything has changed on our side.

Can anyone spare a clue?

You're looking at the CAA record for, but the error is for the full name (which it has to check first): SERVFAIL looking up CAA for

And that request doesn't work:

$ dig -tCAA

; <<>> DiG 9.16.38-RH <<>> -tCAA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13970
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

You don't need a CAA record for the full name, but if you don't have one the DNS server needs to correctly respond NOERROR (that there are no records) instead of giving an error.
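To illustrate why an error on the full name blocks issuance even though the parent has a valid CAA record, here is a minimal Python sketch of the lookup order as I understand it: start at the full name, walk up one label at a time, and stop at the first name that has CAA records. The names, the `fake_lookup` callback and the `letsencrypt.org` record value are made up for the example, and treating NXDOMAIN like an empty answer is my assumption. This is not the CA's actual code.

```python
from typing import Callable, List, Tuple

# A lookup returns (rcode, records). This callback is a hypothetical stand-in
# for a real resolver; rcode is a string such as "NOERROR" or "SERVFAIL".
Lookup = Callable[[str], Tuple[str, List[str]]]


def caa_search_path(name: str) -> List[str]:
    """Names checked for CAA, from the full name up to the TLD."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def relevant_caa(name: str, lookup: Lookup) -> Tuple[str, List[str]]:
    """Return (owner, records) for the closest CAA set, or ("", []) if none.

    Raises RuntimeError as soon as a lookup on the path fails, because an
    error is not the same thing as "no records at this name".
    """
    for candidate in caa_search_path(name):
        rcode, records = lookup(candidate)
        # Assumption: NXDOMAIN is handled like an empty NOERROR answer.
        if rcode not in ("NOERROR", "NXDOMAIN"):
            raise RuntimeError(f"{rcode} looking up CAA for {candidate}")
        if records:
            return candidate, records
    return "", []


def fake_lookup(name: str) -> Tuple[str, List[str]]:
    """Hypothetical zone: CAA at the parent, one host that answers SERVFAIL."""
    zone = {
        "example.com": ("NOERROR", ['0 issue "letsencrypt.org"']),
        "broken.example.com": ("SERVFAIL", []),
    }
    return zone.get(name, ("NOERROR", []))


print(caa_search_path("www.example.com"))
print(relevant_caa("www.example.com", fake_lookup))
try:
    relevant_caa("broken.example.com", fake_lookup)
except RuntimeError as err:
    print("issuance blocked:", err)
```

In this sketch `www.example.com` has no CAA of its own, so the search falls through to the parent's record. For `broken.example.com` the search never reaches the parent, because the very first query fails.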

For what it's worth, I see a SERVFAIL trying to request an A or AAAA record for the name as well.
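If you want to check many host names and record types in one go, a small helper that pulls the status out of dig's header line can be handy. This is only a sketch: `dig_status` is a name I made up, and the sample text is the header from the output above. You would feed it the text produced by your own dig runs.

```python
import re
from typing import Optional

# Matches the status field in dig's ";; ->>HEADER<<- ..." line.
_STATUS = re.compile(r"->>HEADER<<-.*?status:\s*([A-Z]+)")


def dig_status(output: str) -> Optional[str]:
    """Return the response status (e.g. "NOERROR", "SERVFAIL") or None."""
    match = _STATUS.search(output)
    return match.group(1) if match else None


sample = """;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 13970
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1"""

print(dig_status(sample))
```

Anything other than NOERROR (or NXDOMAIN for a name that really doesn't exist) is worth a closer look.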


Thank you, it seems I had a bad NS record in the zone, which caused recursors to fail.

I think I've fixed the problem by nuking the extra NS record, but I'm waiting for TTLs to expire.


That may not be necessary, as LE will only use the authoritative DNS servers.
That said, I would prefer that you test this out using the staging system first.


As it happened, it wasn't DNS caches that needed to time out, but simply cert-manager's own back-off. Once that expired, it renewed all the problematic certificates.

All's well again, thank you for reading the log file for me :)

