Hi team, we’re seeing issues and we’re not sure if it’s related to the incident or not. The status page says “monitoring” and that the root cause has been fixed.
LE is giving us:
403 urn:acme:error:caa: Error creating new cert :: Rechecking CAA: While processing CAA for brightgen.com:
DNS problem: SERVFAIL looking up CAA for brightgen.com
We’re seeing a very high error rate trying to issue certs and wonder if there is still perhaps some fallout from the DNS issue? Could unbound require a restart or something like that?
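To put a number on an error rate like this, it can help to tally the CAA failures by reason (SERVFAIL vs. timeout) and by name. The following is only a rough sketch: the regular expression is an assumption based on the two error strings quoted in this thread, not a documented format, and `caa_failures` and `tally_reasons` are hypothetical helper names.

```python
import re
from collections import Counter

# One clause of the form:
#   "While processing CAA for <name>: DNS problem: <reason> looking up CAA for ..."
# The wording is taken from the errors quoted in this thread and may change.
CAA_CLAUSE = re.compile(
    r"While processing CAA for (?P<name>\S+?):\s+"
    r"DNS problem: (?P<reason>.+?) looking up CAA for"
)


def caa_failures(message):
    """Return a list of (name, reason) pairs found in one ACME error message."""
    return [(m.group("name"), m.group("reason")) for m in CAA_CLAUSE.finditer(message)]


def tally_reasons(messages):
    """Count how often each DNS failure reason appears across many messages."""
    counts = Counter()
    for message in messages:
        for _name, reason in caa_failures(message):
            counts[reason] += 1
    return counts
```

Feeding it a day's worth of logged ACME errors gives a quick view of whether the failures are mostly SERVFAILs or mostly timeouts, and which names are affected.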
@jsha hope you don’t mind the mention on this one, we’re running into it a lot and it doesn’t appear that the status page has any info about this specific issue.
Jacob is on vacation. I’m not sure the Let’s Encrypt server logs are very helpful with these kinds of issues, so I would suggest that you let our resident DNS guru @mnordhoff have a look first. He has a knack for figuring out wacky DNS issues like these.
There have been an oddly large number of reports of DNS issues today.
One of them was due to a misconfiguration with the domain.
With another, 1 of 2 nameservers used by the domain was partly broken due to a misconfiguration, but the domain ought to have worked anyway thanks to the other nameserver.
Most of the domains seem to have nothing obviously wrong.
I’m wondering if Let’s Encrypt really is having an issue. Maybe a routing issue affecting a small percentage of traffic.
The letsencrypt.org issue on the status page was with the authoritative DNS. It ought to be more or less impossible for it to have had any impact on the resolvers, but strange things can happen when you have a severe outage.
Still, Let’s Encrypt won’t make SOA queries, and shouldn’t be using TCP often. If those are the only issues with those domains, they should be harmless. (Edit: Harmless to Let’s Encrypt’s resolver. They’re still bad, and need to be fixed, in general.)
The www.aclu-nca.org issues are mostly the TLD and ordinary Amazon Route 53 stuff.
This could potentially be an IPv4 vs. IPv6 problem; the www subdomain has an IPv6 entry but the base domain doesn't, and the IPv6 www subdomain and the IPv4 base domain return different content in HTTP.
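A quick way to check for that kind of mismatch is to compare which address families the base domain and the www subdomain resolve to. This is a minimal sketch using the standard library's `socket.getaddrinfo`; `address_families` and `family_mismatch` are hypothetical helper names, and the result reflects the local resolver's view, which may differ from what Let's Encrypt's resolvers see.

```python
import socket


def address_families(host, resolve=socket.getaddrinfo):
    """Return the set of address families ("IPv4", "IPv6") that host resolves to."""
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    try:
        results = resolve(host, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name did not resolve at all from this vantage point.
        return set()
    return {labels[family] for family, *_ in results if family in labels}


def family_mismatch(base, resolve=socket.getaddrinfo):
    """Report families present on only one of base and www.base."""
    base_fams = address_families(base, resolve)
    www_fams = address_families("www." + base, resolve)
    return {"base_only": base_fams - www_fams, "www_only": www_fams - base_fams}
```

If `www_only` contains "IPv6", the www name has an AAAA record that the base domain lacks, which is the situation described above. Whether the two then serve different HTTP content has to be checked separately.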
The last couple of days seem to have introduced some invisible hoop that nameservers need to jump through, but nobody can identify it.
That said, OP did post about this a few weeks ago, so it could just be their specific nameservers still suffering from the same problems they did in the past.
It's unlikely, because we don't control our customers' nameservers or DNS settings, and a wide range of domains are failing on completely different servers.
403 urn:acme:error:caa: Error creating new cert :: Rechecking CAA: While processing CAA for awhitepondparadise.com: DNS problem: query timed out looking up CAA for awhitepondparadise.com, While processing CAA for www.awhitepondparadise.com: DNS problem: query timed out looking up CAA for
This time we let our service continue attempting to issue a cert for it.
After about 90 retries, LE was finally able to resolve the domain correctly.
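For anyone who ends up retrying like this, spacing the attempts out is kinder to both sides than retrying immediately. The sketch below shows one generic way to do it with capped exponential backoff. It is not what our service actually runs: `retry_with_backoff` is a hypothetical helper, the default limits are arbitrary assumptions, and `attempt` stands in for whatever function asks the ACME client for the certificate. Any rate limits that apply to repeated attempts are not modelled here.

```python
import time


def retry_with_backoff(attempt, max_attempts=8, base_delay=1.0,
                       max_delay=300.0, sleep=time.sleep):
    """Call attempt() until it returns True or max_attempts is reached.

    Returns the number of attempts made on success, or None after giving up.
    The delay doubles after each failure and is capped at max_delay seconds.
    """
    for n in range(1, max_attempts + 1):
        if attempt():
            return n
        if n < max_attempts:
            # 1s, 2s, 4s, ... capped at max_delay; no sleep after the last try.
            sleep(min(max_delay, base_delay * 2 ** (n - 1)))
    return None
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting.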
@marktheunissen @kf6nux I went through all of the domain names shared in this thread. The majority of them displayed problems with 0x20 case randomization. Further, there is overlap in the authoritative nameservers used by the problematic domains. The only one I’m currently stumped by is www.aclu-nca.org.
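For readers unfamiliar with 0x20 randomization: the resolver sends the query name with the case of each letter chosen at random and expects the authoritative server to echo the question back with exactly the same casing. A server that lowercases the name, or otherwise mangles or drops such a query, fails the check. The sketch below only illustrates the idea; `randomize_case` and `echo_preserves_case` are hypothetical helpers, names are assumed to be ASCII, and no DNS packets are actually sent.

```python
import random


def randomize_case(name, rng=random):
    """Return name with each letter randomly upper- or lower-cased (the 0x20 trick)."""
    return "".join(
        (ch.upper() if rng.random() < 0.5 else ch.lower()) if ch.isalpha() else ch
        for ch in name
    )


def echo_preserves_case(sent_qname, echoed_qname):
    """A 0x20-checking resolver expects the question name back byte-for-byte."""
    return sent_qname == echoed_qname
```

To test a real nameserver, the randomized name would be sent directly to the authoritative server with a DNS library or a tool such as dig, and the question section of the reply compared against what was sent.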
All of the following use meganameservers.eu for their authoritative DNS and fail to handle 0x20 randomization properly: