Repeated SERVFAIL errors when Let's Encrypt looks up CAA for .net TLD

Hi everyone,

Since September 9th we've occasionally seen Let’s Encrypt fail to issue wildcard certificates for subdomains of deno.net because it can’t look up CAA records for the net. TLD itself. The failures come back as urn:ietf:params:acme:error:dns with messages like:

DNS problem: SERVFAIL looking up CAA for net – the domain's nameservers may be malfunctioning

These errors happen before Let’s Encrypt even checks our authoritative nameservers — it appears to be failing at the TLD-level CAA lookup.

This in itself would be fine if the returned error were somehow marked as retryable, but it is not. Our client therefore does not know to retry it, and instead surfaces it as a non-retryable configuration error.

So two questions:

  1. Why did this lookup start failing? We first saw the error on September 9th, but we've been issuing certificates with the exact same setup since the start of 2025. We had already issued tens of thousands of certs without ever seeing this before September 9th.
  2. Could you implement retries for the TLD-level lookup in Boulder? If retries already exist and are still not enough, could the order return an error code indicating that retrying may help?

I've attached all occurrences below (together with approximate timestamps):

2025-09-10 12:31:32+00: urn:ietf:params:acme:error:dns: While processing CAA for *.sabo28.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-09-15 01:10:41+00: urn:ietf:params:acme:error:dns: While processing CAA for *.lokou14.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-10-16 13:23:30+00: urn:ietf:params:acme:error:dns: While processing CAA for *.he.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-10-19 21:20:41+00: urn:ietf:params:acme:error:dns: While processing CAA for *.mytechsupport.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-10-21 00:10:02+00: urn:ietf:params:acme:error:dns: While processing CAA for *.fffngzzj.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-10-23 17:57:32+00: urn:ietf:params:acme:error:dns: While processing CAA for *.bastidood.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-10-29 06:26:49+00: urn:ietf:params:acme:error:dns: While processing CAA for *.seri-f.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-11-08 13:46:25+00: urn:ietf:params:acme:error:dns: While processing CAA for *.cufeyyc.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-11-09 10:55:39+00: urn:ietf:params:acme:error:dns: While processing CAA for *.anhnv02.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-11-10 08:57:42+00: urn:ietf:params:acme:error:dns: While processing CAA for *.0x76.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-11-14 18:58:10+00: urn:ietf:params:acme:error:dns: While processing CAA for *.1677bb77f047.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-11-14 21:20:04+00: urn:ietf:params:acme:error:dns: While processing CAA for *.1d49479472ca.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

Thanks!

I think it is likely a problem with a missing glue record in the deno.net configuration.

See: sabo28.deno.net | DNSViz

net to deno.net: Authoritative AAAA records exist for ns-907.awsdns-49.net, but there are no corresponding AAAA glue records. See RFC 1034, Sec. 4.2.2.

I don't think that is the problem here. The warning is about IPv6 glue records, which in practice do not matter because LE is not IPv6-only and does DNS lookups over IPv4 as necessary (as demonstrated by the fact that this does work most of the time).

Also, this is the standard and only supported configuration of AWS Route53, so I do not think this can be related: otherwise it would be happening to everyone on Route53.

Also, going by the error message here, this lookup failure does not even involve deno.net. It is a CAA lookup on net. itself, no?

No, it is not standard on Route53 to be missing glue records. I use Route53 myself. It doesn't happen often, but we have seen cases where the DNS records get into a bad state. I'm not sure whether AWS changes the DNS servers for some reason or whether people have copied the wrong ones.

We refer people to this article, starting at step 4 (don't be misled by the title): Making Route 53 the DNS service for an inactive domain - Amazon Route 53

Generally, when there is a fault in the DNS config and you get problems related to DNS queries, the best first step is to correct your DNS config.

As to some of your other comments ...
The error message about .net CAA check is odd but Let's Encrypt walks the DNS tree. Problems in that tree can manifest in peculiar ways.

Concluding that the missing record must be OK because it sometimes works isn't fair either. LE chooses a path in your tree, and every path must lead to the correct conclusion. LE does IPv4 fallback when sending HTTP challenge requests, but the IPv6/IPv4 choices for DNS queries aren't published.

If .net were failing regularly there would be vast numbers of reported failures, and there are not. Similarly, if there were general Route53 problems we would also see vast failures. LE issues over 7 million certs per day. General problems with a commonly used TLD and/or Route53 would cause numerous problems, as you might imagine. Yours is the only one reported, and it has been 7 days since you reported it.

@MikeMcQ I hear you, but we have not made any infrastructure changes in the last 6 months on this system, and yet the CAA lookup for net. (not for deno.net.) started failing on September 10th. This is important: the error occurs not while looking up the CAA for deno.net, but while doing the CAA lookup for net itself. This lookup is expected, because not just the CAA on deno.net has to allow LE, but all parent domains have to as well. It is, however, not within our control.

This glue record issue also does not seem fixable by us: our name servers are set up correctly, and deno.net has no glue records (which is correct). The link that you sent also provides no further detail than "add the NS records to your registrar". The fact that AWS does not have glue records on their domain seems odd, I agree, but I don't see how it could be related here. The glue records should not be consulted here anyway, because the name servers that deno.net points to are not subdomains of deno.net itself, but are on other domains. As such, no glue records are needed, because no circular name lookup takes place (glue records would only be needed if deno.net's NS pointed to something like ns1.deno.net), right?
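The rule of thumb in the paragraph above can be written down as a small check. This is only a sketch, assuming the in-domain / sibling distinction from the DNS terminology RFC (RFC 9499); the function names and the "unrelated" label are made up for illustration. It suggests that ns-907.awsdns-49.net is not in-domain for deno.net (so glue is not strictly required), but it is a sibling inside the net zone, which may be why DNSViz comments on glue for that one name server only.

```python
def _labels(name: str) -> list[str]:
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def is_subordinate(name: str, ancestor: str) -> bool:
    """True if `name` equals `ancestor` or sits below it in the DNS tree."""
    n, a = _labels(name), _labels(ancestor)
    return len(n) >= len(a) and n[len(n) - len(a):] == a


def classify_nameserver(ns_name: str, child_zone: str, parent_zone: str) -> str:
    """Classify a delegation's name server relative to the delegated zone.

    "in-domain": inside the delegated zone, so glue is required.
    "sibling":   inside the parent zone but outside the delegated zone,
                 so the parent can supply glue but resolution can work without it.
    "unrelated": hypothetical label for everything else (no glue from this parent).
    """
    if is_subordinate(ns_name, child_zone):
        return "in-domain"
    if is_subordinate(ns_name, parent_zone):
        return "sibling"
    return "unrelated"


# The four name servers from the deno.net delegation in this thread.
for ns in ("ns-275.awsdns-34.com.", "ns-907.awsdns-49.net.",
           "ns-1261.awsdns-29.org.", "ns-1890.awsdns-44.co.uk."):
    print(ns, classify_nameserver(ns, "deno.net", "net"))
```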

If you think I am mistaken, please let me know. But the fact that this started failing without our intervention, randomly, one day, seems odd. We issue hundreds of certs daily, and have done so before September 10th, but it only started occasionally failing on that day.

I appreciate your patience here.

Sure, I can see how it would. Clearly then something has changed. LE uses unbound for DNS queries. Perhaps something in that changed. Unbound is complex. Doesn't mean it is wrong now but perhaps it is not working for odd cases where it worked before.

It is worth checking with your registrar or AWS about the missing glue records. Perhaps there are other problems with the DNS config and that is only one symptom. It is not normal to see missing glue for Route53.

You said you only started seeing occasional problems about 2.5 months ago. Are these problems all with this same domain? If not, what other domains fail with the same CAA query problem? Do they fail with some other DNS query problem? Are they all for domains under the .net TLD?

I understand the other parts of what you say. But, my experience tells me that if you have a DNS query problem the first thing to fix is any config problem in the DNS tree. If nothing else it eliminates a potential cause.

Perhaps some other volunteers with deeper DNS and/or unbound expertise will offer advice.

The problems are all with subdomains of deno.net. Do note, though, that over 90% of the certificates we manage are wildcard certificates for <slug>.deno.net subdomains, so the fact that these are the ones failing (rather than other domains) may simply reflect volume rather than anything specific to deno.net.

They all fail with the same SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning.

We'll look more into the glue record thing.

If anyone else has ideas, particularly someone who works at LE and could run a query to see how many orders fail with SERVFAIL looking up CAA for net errors, that'd be super helpful I think.

If there is a CAA record at deno.net, LE won't climb to .net to find a CAA record. You could try that.

I am not sure that is true. CAA queries are done in parallel. See: Problems with CAA records only with Google and Let's Encrypt - #14 by aarongable

It is not clear to me whether an error closer to the TLD would be reported if there was a valid CAA record closer to the domain name. I am not proficient in Boulder :slight_smile:

@lucacasonato What is your failure rate for that domain? Are we talking like 50% or 1%?

The failure rate between 2025-09-10 12:53:26.845864+00 (first occurrence) and 2025-11-25 13:17:23.415427+00 (most recent issued certificate) is ~0.084% of all certificates that we requested and had issued in that period. It's ~0.09% of orders involving deno.net.

In addition to the 12 failures I posted in the original post, 4 additional failures have occurred since:

2025-11-19 16:42:20+00: urn:ietf:params:acme:error:dns: While processing CAA for *.digitoolmedia.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-11-23 19:57:58+00: urn:ietf:params:acme:error:dns: While processing CAA for *.purpleee.deno.net: DNS problem: SERVFAIL looking up CAA for purpleee.deno.net - the domain's nameservers may be malfunctioning

2025-11-24 01:14:01+00: urn:ietf:params:acme:error:dns: While processing CAA for *.luanmm.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

2025-11-24 04:05:04+00: urn:ietf:params:acme:error:dns: While processing CAA for *.israelsantander77.deno.net: DNS problem: SERVFAIL looking up CAA for net - the domain's nameservers may be malfunctioning

Then that's a bad design decision and should be changed. It creates unnecessary traffic, load and problems. There's just no need to process higher-level CAAs if one exists further down the hierarchy.

It has been that way for over 8 years as noted in the post I linked. Please start a new thread about that if you wish to discuss it further. Although, you'll find the comments in the Boulder code helpful for further background.

Discussing that here is not helpful and will only clutter responses for this person's issues.

Thanks

All CAA lookups (e.g. for sabo28.deno.net, deno.net, and net) are launched in parallel for the sake of latency. However, the results of those later (closer to the DNS root) queries only matter if the earlier queries return no CAA records.
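To make the rule in the paragraph above concrete, here is a minimal sketch of how already-fetched results could be evaluated from the leaf towards the root. It follows the tree-climbing idea from RFC 8659 and the description in this post; it is not Boulder's code, and `CAAResult`, `candidate_names`, and `relevant_caa` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CAAResult:
    """Outcome of one CAA query: a list of records, or an error such as SERVFAIL."""
    records: list[str]
    error: Optional[str] = None


def candidate_names(identifier: str) -> list[str]:
    """Names to query, most specific first, stopping before the DNS root."""
    name = identifier.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # the wildcard label itself is not queried
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def relevant_caa(identifier: str, results: dict[str, CAAResult]) -> tuple[str, list[str]]:
    """Pick the CAA RRset that applies, given results for every candidate name.

    Returns (name, records) for the first non-empty RRset, or ("", []) if no
    name has CAA records. Raises RuntimeError if a lookup error is reached
    before any records are found; errors further up are never consulted.
    """
    for name in candidate_names(identifier):
        result = results[name]
        if result.error is not None:
            raise RuntimeError(f"{result.error} looking up CAA for {name}")
        if result.records:
            return name, result.records
    return "", []


# With a CAA record at deno.net, a SERVFAIL for net is never reached.
results = {
    "sabo28.deno.net": CAAResult([]),
    "deno.net": CAAResult(['0 issue "letsencrypt.org"']),
    "net": CAAResult([], error="SERVFAIL"),
}
print(relevant_caa("*.sabo28.deno.net", results))
```

If the deno.net entry were empty instead, the same call would raise with "SERVFAIL looking up CAA for net", which matches the shape of the errors reported in this thread.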

Does deno.net have CAA records? If yes, that points to a misconfiguration at deno.net (e.g. missing glue records) causing us to sometimes not retrieve those records.

No, deno.net has no CAA records:

~ ❯❯❯ dig +trace -t CAA deno.net

; <<>> DiG 9.18.39-0ubuntu0.22.04.2-Ubuntu <<>> +trace -t CAA deno.net
;; global options: +cmd
.			511529	IN	NS	a.root-servers.net.
.			511529	IN	NS	b.root-servers.net.
.			511529	IN	NS	c.root-servers.net.
.			511529	IN	NS	d.root-servers.net.
.			511529	IN	NS	e.root-servers.net.
.			511529	IN	NS	f.root-servers.net.
.			511529	IN	NS	g.root-servers.net.
.			511529	IN	NS	h.root-servers.net.
.			511529	IN	NS	i.root-servers.net.
.			511529	IN	NS	j.root-servers.net.
.			511529	IN	NS	k.root-servers.net.
.			511529	IN	NS	l.root-servers.net.
.			511529	IN	NS	m.root-servers.net.
;; Received 239 bytes from 127.0.0.53#53(127.0.0.53) in 4 ms

net.			172800	IN	NS	c.gtld-servers.net.
net.			172800	IN	NS	j.gtld-servers.net.
net.			172800	IN	NS	a.gtld-servers.net.
net.			172800	IN	NS	b.gtld-servers.net.
net.			172800	IN	NS	g.gtld-servers.net.
net.			172800	IN	NS	f.gtld-servers.net.
net.			172800	IN	NS	l.gtld-servers.net.
net.			172800	IN	NS	d.gtld-servers.net.
net.			172800	IN	NS	h.gtld-servers.net.
net.			172800	IN	NS	i.gtld-servers.net.
net.			172800	IN	NS	m.gtld-servers.net.
net.			172800	IN	NS	e.gtld-servers.net.
net.			172800	IN	NS	k.gtld-servers.net.
net.			86400	IN	DS	37331 13 2 2F0BEC2D6F79DFBD1D08FD21A3AF92D0E39A4B9EF1E3F4111FFF2824 90DA453B
net.			86400	IN	RRSIG	DS 8 1 86400 20251208050000 20251125040000 61809 . YBmr2XaGjxxuL04+l4tAOgeqerNg43PdJfPmzpivDqjm1nPx6esC/0Ym P6bMCK2pPSN/uS4Y/rTQ2hmDeIEoELk+gi5Siy6Kd7ma9XOaybD8NjT6 IvPL36X8q2CqORmmc7lqixzN9r8qFo0G86U/4whepuzXuJbj39YzZHo9 ECag4JQHG9GSIQvq3AiTw0/5jm9yYgytuejbr8KDUccklywFY+wHGUQQ 0alCz+mVIPSpQBnxu1qeY5KsNWgFoom5X56ojctckVY2fOsrmc5HP+cw aOYOIemNK/l99e0U5pYaH4vUSgIz+0LyZLr3ZKSrPqJcfsZpjGKh7GZk 1SPfMQ==
;; Received 1196 bytes from 192.33.4.12#53(c.root-servers.net) in 91 ms

deno.net.		172800	IN	NS	ns-275.awsdns-34.com.
deno.net.		172800	IN	NS	ns-907.awsdns-49.net.
deno.net.		172800	IN	NS	ns-1261.awsdns-29.org.
deno.net.		172800	IN	NS	ns-1890.awsdns-44.co.uk.
A1RT98BS5QGC9NFI51S9HCI47ULJG6JH.net. 900 IN NSEC3 1 1 0 - A1RTLNPGULOGN7B9A62SHJE1U3TTP8DR NS SOA RRSIG DNSKEY NSEC3PARAM
A1RT98BS5QGC9NFI51S9HCI47ULJG6JH.net. 900 IN RRSIG NSEC3 13 2 900 20251201032134 20251124021134 17133 net. Y+xk/CdB62ERHR7uEO/mkM626n7DFSFstE3K+aIotRpkI2moS2ScFYJj R1X1wmHDtjjRXqs92UqxdCTnN0Mwpg==
0EEOBNRMFK0BBHOITQ0CPIM6CMKD0GG8.net. 900 IN NSEC3 1 1 0 - 0EESU1H20JUHDCH2JS44H2HR4HMKFERG NS DS RRSIG
0EEOBNRMFK0BBHOITQ0CPIM6CMKD0GG8.net. 900 IN RRSIG NSEC3 13 2 900 20251201031759 20251124020759 17133 net. V3fGPJZBSO90cLYbqZu52y3YsM2TwUJHngPOacsaJYXpMeFCpz9HRiT2 s1ePwPj8e2HTjBgp4qTG/YmrLaL0vA==
;; Received 547 bytes from 2001:502:8cc::30#53(h.gtld-servers.net) in 110 ms

deno.net.		900	IN	SOA	ns-1890.awsdns-44.co.uk. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
;; Received 124 bytes from 2600:9000:5304:ed00::1#53(ns-1261.awsdns-29.org) in 21 ms

OK, then it's likely that the issue really does lie with the .net TLD nameservers. We'll look into our logs and see if we can confirm that other domains under net are having this same problem.

However, you can mitigate this problem by adding a CAA record at deno.net, so we never have to care whether the net lookup is successful.
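As a rough illustration of that mitigation, the snippet below renders CAA records in zone-file presentation format. The helper name and the TTL of 3600 are assumptions for the example; `letsencrypt.org` is the issuer domain Let's Encrypt documents for CAA. Note that a CAA record set at deno.net applies to every name beneath it that has no CAA records of its own, so every CA in use for any subdomain would need to be listed.

```python
def caa_zone_lines(domain: str, issuers: list[str], ttl: int = 3600) -> list[str]:
    """Render one `issue` CAA record per allowed CA, in zone-file format.

    Only `issue` is emitted: when no `issuewild` property is present, the
    `issue` records also govern wildcard certificates (RFC 8659).
    """
    owner = domain.rstrip(".") + "."
    return [f'{owner} {ttl} IN CAA 0 issue "{ca}"' for ca in issuers]


for line in caa_zone_lines("deno.net", ["letsencrypt.org"]):
    print(line)
```

In Route 53 the same data would be entered as a CAA record on the zone apex with the value `0 issue "letsencrypt.org"`.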

@aarongable Just circling back on this - did you find anything in the logs?

Thanks for the nudge.

Yes, we do see this regularly with CAA lookups for .net (and for .com, which is operated by the same authoritative name servers) across many registered domains, not just yours. We see it at somewhat small rates: about 0.04% of our CAA checks for .net result in SERVFAIL. That said, this error rate is much higher than the error rate for .org, which is about 0.002%.

Most clients recover by simply retrying the ACME issuance flow the next time the client wakes up, usually an hour or so later.
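A client-side version of that recovery could look like the sketch below: retry the whole issuance attempt a small number of times with increasing delays. Everything here is hypothetical (`TransientIssuanceError`, `retry_transient`, the delays); it assumes the caller's ACME code raises `TransientIssuanceError` only for errors it has decided are worth retrying. Since each retry means a new order, the attempt count is kept low.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientIssuanceError(Exception):
    """Hypothetical: raised by the caller's ACME code for errors worth retrying."""


def retry_transient(attempt: Callable[[], T],
                    max_attempts: int = 3,
                    base_delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `attempt` until it succeeds or `max_attempts` is exhausted.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, ... between
    attempts. Any exception other than TransientIssuanceError propagates
    immediately, so configuration errors are still surfaced right away.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for n in range(max_attempts):
        try:
            return attempt()
        except TransientIssuanceError:
            if n == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** n)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the delay can be replaced in tests or by a job scheduler.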

Thanks, that’s very helpful. Our ACME client retries certain errors, specifically those with a Retry-After. We don’t configure our client to retry this error automatically, because it (generally) points to a user DNS configuration problem.

Ideally LE would retry the lookup from the root servers by itself a few times, given that the error rate is comparatively high. That would likely incur the least additional load on LE, because we wouldn’t have to create new orders, etc.

Alternatively, if this specific error (lookup failures on the TLD name servers) returned a different error type or a Retry-After header from LE, we could auto-retry it.

Or do you think we should match on the error message to try to determine whether the error was this specific TLD nameserver error?
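For reference, matching on the message could look roughly like this. It is a sketch that assumes the detail text keeps the shape of the errors quoted in this thread; that wording is not a stable interface, so a match should only ever upgrade an error to "retryable", never the reverse. The function name is hypothetical, and the single-label test misses multi-label public suffixes such as co.uk, which would need the Public Suffix List.

```python
import re

# Shape of the detail strings quoted in this thread (an assumption, not an API).
_CAA_SERVFAIL = re.compile(
    r"While processing CAA for (?P<identifier>\S+): "
    r"DNS problem: SERVFAIL looking up CAA for (?P<queried>\S+)"
)


def is_tld_caa_servfail(error_type: str, detail: str) -> bool:
    """True if the error is a CAA SERVFAIL for a single-label ancestor (e.g. "net")."""
    if error_type != "urn:ietf:params:acme:error:dns":
        return False
    match = _CAA_SERVFAIL.search(detail)
    if match is None:
        return False
    identifier = match["identifier"].lower().rstrip(".")
    if identifier.startswith("*."):
        identifier = identifier[2:]
    queried = match["queried"].lower().rstrip(".")
    # The queried name must be one label and a proper ancestor of the identifier.
    return "." not in queried and identifier.endswith("." + queried)
```

Applied to the errors in this thread, the lookups that failed on net would be flagged, while the 2025-11-23 failure for purpleee.deno.net would not, because the name that failed there is the subdomain itself.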
