I work on a small domain registrar/site hosting company. We have been using Let's Encrypt for our domains for about 2 years now, but for about 6 days our certificate renewal processes have started failing with CAA SERVFAIL issues:
Problem {
type: "urn:ietf:params:acme:error:dns",
detail: "DNS problem: SERVFAIL looking up CAA for www.itsjustnic.com - the domain's nameservers may be malfunctioning",
status: 400,
}
However, I can't reproduce this result:
djc-2021 instagram-owner certifier $ dig caa itsjustnic.com
; <<>> DiG 9.10.6 <<>> caa itsjustnic.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49935
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4095
;; QUESTION SECTION:
;itsjustnic.com. IN CAA
;; ANSWER SECTION:
itsjustnic.com. 3600 IN CAA 0 issue "letsencrypt.org"
itsjustnic.com. 3600 IN CAA 0 iodef "mailto:dns@instantdomains.com"
;; Query time: 104 msec
;; SERVER: 100.100.100.100#53(100.100.100.100)
;; WHEN: Mon Dec 18 16:07:31 CET 2023
;; MSG SIZE rcvd: 125
djc-2021 instagram-owner certifier $ dig caa www.itsjustnic.com
; <<>> DiG 9.10.6 <<>> caa www.itsjustnic.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57555
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4095
;; QUESTION SECTION:
;www.itsjustnic.com. IN CAA
;; Query time: 101 msec
;; SERVER: 100.100.100.100#53(100.100.100.100)
;; WHEN: Mon Dec 18 16:07:39 CET 2023
;; MSG SIZE rcvd: 47
All of the authorization errors that we have seen have mentioned a www.* domain (we always create CSRs with two domains in the SAN, the registrable domain and the www. for that registrable).
I did find the documentation on CAA errors and common causes, but I don't think these apply here? We do nothing with DNSSEC, and as shown above, we yield NOERROR for domains that don't have a CAA record.
We are using instant-acme (which we wrote) as our ACME client, and are using the dns-01 authorization method. Authoritative DNS records are served using a simple DNS server which we also wrote. However, our cloud logging solution does not reveal any error logs from the DNS server, nor any other logs with a SERVFAIL status (although we log ~every response).
This issue seems to have been occurring since about Dec 12th; we have about 100 sites right now that have failed to renew and 2 new domains that we haven't been able to get a certificate for. As far as I'm aware there have been no material changes to our DNS server or certificate issuance component. I'd appreciate any help on this issue!
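As an added illustration (not code from the post, from instant-acme or from Let's Encrypt), this Python models how a CA's CAA check climbs from www.itsjustnic.com toward the root as described in RFC 8659, using constructed answers that mirror the dig output above. The `CaaAnswer`, `HEALTHY` and `BROKEN` tables and the resolver stub are assumptions made for the example; it shows why a SERVFAIL on the www name blocks issuance even though the apex record permits letsencrypt.org.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CaaAnswer:
    rcode: str                                   # "NOERROR", "NXDOMAIN" or "SERVFAIL"
    records: list = field(default_factory=list)  # (tag, value) pairs; empty == NODATA

class CaaLookupError(Exception):
    def __init__(self, owner: str):
        super().__init__(f"SERVFAIL looking up CAA for {owner}")
        self.owner = owner

def relevant_caa_rrset(fqdn: str, query: Callable[[str], CaaAnswer]):
    """Climb from the FQDN toward the root (RFC 8659 section 3) and return the first
    non-empty CAA RRset as (owner, records), or (None, []). A SERVFAIL at any label
    aborts the check: the CA cannot prove no restricting record exists. No CNAME chasing."""
    labels = fqdn.rstrip(".").split(".")
    for i in range(len(labels)):
        owner = ".".join(labels[i:])
        ans = query(owner)
        if ans.rcode == "SERVFAIL":
            raise CaaLookupError(owner)
        if ans.rcode == "NOERROR" and ans.records:
            return owner, ans.records
        # NXDOMAIN or NODATA: nothing at this label, keep climbing
    return None, []

def caa_check(fqdn: str, query: Callable[[str], CaaAnswer], ca: str = "letsencrypt.org") -> str:
    """Return "permitted", "forbidden" or "servfail:<owner>" for one SAN entry."""
    try:
        _, rrset = relevant_caa_rrset(fqdn, query)
    except CaaLookupError as e:
        return f"servfail:{e.owner}"
    issuers = [v.split(";")[0].strip() for t, v in rrset if t == "issue"]
    if not issuers:  # CAA present but no issue tag (e.g. only iodef): not restricted
        return "permitted"
    return "permitted" if ca in issuers else "forbidden"

# Constructed answers mirroring the dig output in the post: www is NODATA, the apex
# carries issue + iodef. BROKEN swaps in the SERVFAIL that the CA reported for www.
HEALTHY = {
    "www.itsjustnic.com": CaaAnswer("NOERROR"),
    "itsjustnic.com": CaaAnswer("NOERROR", [("issue", "letsencrypt.org"), ("iodef", "mailto:dns@instantdomains.com")]),
}
BROKEN = dict(HEALTHY, **{"www.itsjustnic.com": CaaAnswer("SERVFAIL")})

def make_resolver(table: dict) -> Callable[[str], CaaAnswer]:
    """Resolver stub; names not in the table (com, the root) answer NODATA."""
    return lambda name: table.get(name, CaaAnswer("NOERROR"))
```

Running `caa_check` over both SANs with `BROKEN` reproduces the pattern in the post: the apex passes and only the www name fails, because the walk never reaches the permissive apex record once a label returns SERVFAIL.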