DNS-01 fails sporadically


My domain is:
superior.boulder.noaa.gov
I ran this command:

certbot certonly --manual \
  --manual-auth-hook "/home/eugene/api/scripts/letsencrypt/update/auth.sh auth" \
  --manual-cleanup-hook "/home/eugene/api/scripts/letsencrypt/update/auth.sh cleanup" \
  --preferred-challenges dns \
  -d superior.boulder.noaa.gov

It used to work 100% of the time; now it fails intermittently with CAA or DNS lookup errors. Sometimes it works, sometimes it doesn't.

The operating system my web server runs on is (include version):

Debian 9

certbot --version

certbot 0.28.0

I have a coworker who sees the same flakiness. They are probably on Red Hat, using the same script.

auth.sh is just a way to insert the record using nsupdate. If it makes a difference, we initially create a CNAME to a dynamic zone. We leave the CNAME but then we point it to a DNS TXT record.

So _acme-challenge.superior.boulder.noaa.gov is a CNAME to _acme-challenge.superior.boulder.noaa.gov.letsencrypt.boulder.noaa.gov, which is in a zone I control that accepts dynamic updates.
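For readers following along, here is a minimal Python sketch of what a hook like auth.sh might feed to nsupdate under this CNAME delegation. It is not the actual script: the function names, the 60-second TTL, and the choice to delete the TXT record on cleanup are assumptions for illustration. The CERTBOT_DOMAIN and CERTBOT_VALIDATION environment variables are the ones certbot passes to manual hooks.

```python
import os
import sys


def challenge_names(domain, delegated_zone):
    """Return (challenge_name, cname_target) for a DNS-01 challenge.

    The CNAME target follows the pattern described in the post:
    the challenge name with the delegated dynamic zone appended.
    """
    challenge = f"_acme-challenge.{domain.rstrip('.')}"
    target = f"{challenge}.{delegated_zone.strip('.')}"
    return challenge, target


def nsupdate_script(domain, delegated_zone, token, action, ttl=60):
    """Build the text that would be piped to nsupdate (hypothetical helper)."""
    _, target = challenge_names(domain, delegated_zone)
    lines = [f"zone {delegated_zone.strip('.')}."]
    if action == "auth":
        # Publish the validation token as a TXT record in the dynamic zone.
        lines.append(f'update add {target}. {ttl} TXT "{token}"')
    elif action == "cleanup":
        # Assumption: remove all TXT records at the target name afterwards.
        lines.append(f"update delete {target}. TXT")
    else:
        raise ValueError(f"unknown action: {action!r}")
    lines.append("send")
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and "CERTBOT_DOMAIN" in os.environ:
    # Only runs when certbot has set its hook environment variables.
    action = sys.argv[1] if len(sys.argv) > 1 else "auth"
    print(
        nsupdate_script(
            os.environ["CERTBOT_DOMAIN"],
            "letsencrypt.boulder.noaa.gov",
            os.environ.get("CERTBOT_VALIDATION", ""),
            action,
        ),
        end="",
    )
```

The output would typically be piped into nsupdate together with whatever key or server options the dynamic zone requires; those details are site-specific and not shown here.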

I'll leave the last update TXT string, because it worked.

noaa.gov is signed (DNSSEC); my zone is not, if that makes a difference.

For the last few days, we have been getting DNS timeouts on the TXT record or CAA lookup failures. Yet when I query the TXT record through several worldwide DNS lookup tools, it is there.

It just seems to be flaky.


Hi @eugene.tsuno

Your name servers are buggy; see https://check-your-website.server-daten.de/?q=superior.boulder.noaa.gov

Some name servers do not accept TCP connections. For the infoblox1 name server, IPv4 works, but IPv6 looks like it is blocked by a firewall.

And that

X Nameserver Timeout checking Echo Capitalization: infoblox1.boulder.noaa.gov / 2610:20:8800:c10::140
X Nameserver Timeout checking EDNS512: infoblox1.boulder.noaa.gov / 2610:20:8800:c10::140

may be critical. Both are used by the version of Unbound that Let's Encrypt uses.

So there are several possible causes of the random errors.
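A rough way to reproduce the TCP part of that finding yourself is to try a plain TCP connection to port 53 on each name server address, over both IPv4 and IPv6. The sketch below is only a reachability probe with hypothetical helper names; it does not send a DNS query, so it cannot detect the EDNS or capitalization problems listed above.

```python
import socket


def tcp_port_open(address, port=53, timeout=3.0):
    """Return True if a TCP connection to (address, port) succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_nameservers(addresses, port=53, timeout=3.0):
    """Map each address (IPv4 or IPv6 literal) to its TCP reachability."""
    return {addr: tcp_port_open(addr, port, timeout) for addr in addresses}


if __name__ == "__main__":
    # The IPv6 address of infoblox1 reported by the test page above.
    for addr, ok in check_nameservers(["2610:20:8800:c10::140"]).items():
        print(f"{addr}: {'reachable' if ok else 'NOT reachable'} over TCP/53")
```

A timeout on the IPv6 address while the IPv4 address of the same server answers would be consistent with a firewall dropping IPv6 traffic. Keep in mind that the result also depends on the IPv6 connectivity of the machine running the probe.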


Thanks, I found an IPv6 problem and I am fixing it now.

I have a nit with the test page. I think it assumes the MNAME in the SOA record is the primary NS, and that isn't what the original RFC 1035 states. Now that dynamic DNS is around, I think it has become a practice (a bad one, in my mind) to treat the MNAME as the place where dynamic updates are sent. The original intent was for it to name where the records originate, which matters especially when the records are stealthed or generated elsewhere, a common practice. There was a draft RFC to use MNAME for NS/dynamic updates, but I don't think it was accepted.

Anyway, thanks for the help. I think we are one step closer to getting things stable.


SOA records are basically ignored by the DNS validation process.
Your DNS needs to be fully functional.
IPv6 is generally preferred over IPv4 [when available].
There are three name servers shown as authoritative.
They return three IPv4 IPs and three IPv6 IPs.
But... two of the IPv6 addresses are the same.
So that's a two out of three (IPv6) chance of hitting that one system.
[not exactly all-eggs-in-one-basket but more than I would care to carry]

Name:      ns-e.noaa.gov
Addresses: 2610:20:8000:8c00::237
           140.90.33.237

Name:      ns-mw.noaa.gov
Addresses: 2610:20:8800:8c00::237
           140.172.17.237

Name:      ns-nw.noaa.gov
Addresses: 2610:20:8c00:8c00::2
           161.55.32.2

Have a look at this online DNS test result:
superior.boulder.noaa.gov | DNSViz

I think you missed an 8 vs. a 0 in the middle of the addresses!
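Differences like this are easy to miss by eye. As a small illustrative sketch (the helper name is made up), Python's standard ipaddress module can expand each address and report exactly which 16-bit groups differ between the name servers listed above:

```python
import ipaddress
from itertools import combinations


def differing_hextets(a, b):
    """Return (index, group_a, group_b) for each 16-bit group that differs."""
    ha = ipaddress.IPv6Address(a).exploded.split(":")
    hb = ipaddress.IPv6Address(b).exploded.split(":")
    return [(i, x, y) for i, (x, y) in enumerate(zip(ha, hb)) if x != y]


# IPv6 addresses copied from the lookup output earlier in the thread.
NAMESERVERS = {
    "ns-e.noaa.gov": "2610:20:8000:8c00::237",
    "ns-mw.noaa.gov": "2610:20:8800:8c00::237",
    "ns-nw.noaa.gov": "2610:20:8c00:8c00::2",
}

if __name__ == "__main__":
    for (n1, a1), (n2, a2) in combinations(NAMESERVERS.items(), 2):
        diffs = differing_hextets(a1, a2)
        status = "IDENTICAL" if not diffs else f"differ at {diffs}"
        print(f"{n1} vs {n2}: {status}")
```

For ns-e and ns-mw the only differing group is the third one, 8000 versus 8800, so all three name servers do have distinct IPv6 addresses.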


And how obvious it is [now that you pointed it out!]


Thanks. I think someone took "I'll fix that later" and it was never fixed! I don't control these NS servers but I can get that fixed.
Thanks again to all responders.
