Let's Encrypt's DNS resolution process

Hey there!

Would it be possible to get a detailed explanation of how Let's Encrypt resolves hostnames via DNS?

We have to pre-verify certificate authorizations before asking Let's Encrypt to verify them. Otherwise we'd hit rate limits very quickly (and we did, at first). So essentially we have to "emulate" how Let's Encrypt resolves hostnames.

Right now we do the following: we use both the SOA and NS records of a hostname to resolve it. This works in most cases, but sometimes it fails and yet Let's Encrypt is still able to verify the authorization if I manually push it through. This led me to believe LE is doing something different.

For example, one hostname failed our pre-verification check this morning because the SOA query timed out.

Do you use the SOA to resolve the hostname? If so, do you have a timeout configured? If the SOA lookup fails, do you rely on the domain's NS records only?

This would help us relieve some pain for our users.


Have you seen https://unboundtest.com? It's an unofficial system put up by ISRG staff. If you search "unbound" in this forum, you'll see a few threads that should answer your question.


I don't believe they look at the SOA at all to find the authoritative servers; they just use the NS records as delegated from the DNS root.

If you want the really detailed specifics, I think what you'd need to do is to look at Unbound and at the Boulder source code.
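
To make "NS records as delegated from the DNS root" concrete, here is a minimal sketch of walking the delegation chain from the root down to the closest zone cut. It is not Unbound's actual algorithm (Unbound also caches, retries, validates and more). The `referral` callback and the function name are hypothetical: you would plug in whatever DNS library you already use to ask a set of parent servers for the NS set of a zone.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: ask the given parent servers for the NS set of `zone`
# and return the nameserver names, or an empty list if there is no zone cut.
ReferralFn = Callable[[str, Sequence[str]], List[str]]


def closest_delegation(
    name: str, root_servers: Sequence[str], referral: ReferralFn
) -> Tuple[str, List[str]]:
    """Follow NS delegations from the root and return the deepest zone found."""
    labels = name.rstrip(".").split(".")
    zone, servers = ".", list(root_servers)
    # Walk from the TLD down, e.g. "org.", "example.org.", "www.example.org."
    for i in range(len(labels) - 1, -1, -1):
        candidate = ".".join(labels[i:]) + "."
        delegated = referral(candidate, servers)
        if delegated:
            # A zone cut exists here: continue with the child's nameservers.
            zone, servers = candidate, delegated
    return zone, servers
```

Note that no SOA lookup appears anywhere in this walk; only NS delegations are followed.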


It's worth stating the obvious: all of the NS for the domain have to give a valid response, not just one. So if you are writing changes to DNS before validation, you need to ensure all NS give the same response (and that they can all reply to a CAA record query, so no NXDOMAIN and no SERVFAILs will be tolerated during validation).

Yesterday a domain belonging to one of my users, hosted on Google Cloud DNS, was returning a SERVFAIL response on the CAA record check. That was presumably a transient failure behind the scenes at Google, so it seems everyone is capable of getting this stuff wrong.
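
The "every nameserver must answer, and answer the same" check could be sketched roughly like this. The `query` callback is an assumption, not a real library call: it stands for whatever function you use to ask one specific nameserver for a record set, and it is expected to raise on SERVFAIL or timeout.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

# Hypothetical callback: query(nameserver, qname, rdtype) returns the record
# values from that one server, and raises on SERVFAIL, timeout, and so on.
QueryFn = Callable[[str, str, str], Iterable[str]]


def check_all_nameservers(
    nameservers: List[str], qname: str, rdtype: str, query: QueryFn
) -> Dict[str, object]:
    """Query every nameserver and report failures and disagreement."""
    answers: Dict[str, FrozenSet[str]] = {}
    failures: Dict[str, str] = {}
    for ns in nameservers:
        try:
            answers[ns] = frozenset(query(ns, qname, rdtype))
        except Exception as exc:  # any per-server error counts as a failure
            failures[ns] = str(exc)
    distinct = set(answers.values())
    return {
        # OK only if nobody failed and every server gave the same record set.
        "ok": not failures and len(distinct) == 1,
        "failures": failures,
        "answers": answers,
    }
```

Running it once for the challenge record and once for CAA would catch both kinds of problem described above before the authorization is submitted.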


I switched our logic to only check NS (and not SOA anymore).

It's working fine for us, however there are still issues from time to time.

This morning, our pre-verification process failed because we couldn't resolve the nameservers to IPs. We resolve them recursively, starting from the root servers.

  • ns1.theserver.com.au
  • ns2.theserver.com.au
  • ns3.theserver.com.au

None of our attempts returned any record for these hostnames from our servers. Same issue from my local computer (using different DNS servers).

I still tried to push the authorization through via Let's Encrypt, and it worked. This puzzles me.

I can resolve the hostname (appeal.the9livesproject.org) to an IP, no problem. Is that all we should be testing? Perhaps in addition to CAA records...

What do these show?

nslookup ns1.theserver.com.au r.au
nslookup ns2.theserver.com.au s.au
nslookup ns3.theserver.com.au t.au

There will always be transient problems when comms are involved. Taking a cue from the Let's Debug test site, you could check

  1. Resolve the hostname to an IP address via its A and/or AAAA records
  2. Connect to http://(domain)/.well-known/acme-challenge/YourSpecialToken
    You should expect an HTTP 404. You could warn about any other HTTP status code but still try the cert request. If the request times out, then maybe don't even try the cert request. I assume you are doing HTTP challenges. Note that the LE servers will connect over IPv6 if the hostname has an AAAA record, and over IPv4 if it has only an A record. This check is the most likely to hit transient errors.
  3. If a CAA record is found, check that its value is valid
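
Steps 2 and 3 above boil down to two small decisions, sketched here as pure functions so they can sit behind whatever HTTP and DNS client you already use. The function names and the "proceed" / "warn" / "skip" labels are made up for illustration. The CAA check is deliberately simplified: it only looks at `issue` properties and ignores `issuewild`, critical flags, parameters such as `validationmethods`, and climbing to parent domains.

```python
from typing import Iterable, Optional, Tuple


def http_probe_action(status: Optional[int]) -> str:
    """Map the probe result to an action; `status` is None on timeout."""
    if status is None:
        return "skip"     # could not connect: maybe don't request a cert
    if status == 404:
        return "proceed"  # expected answer for a token that does not exist
    return "warn"         # unexpected status: warn, but still try the request


def caa_permits(
    records: Iterable[Tuple[str, str]], ca: str = "letsencrypt.org"
) -> bool:
    """Return True if the (tag, value) CAA records allow `ca` to issue."""
    issuers = [
        value.split(";", 1)[0].strip().lower()  # drop parameters after ";"
        for tag, value in records
        if tag.lower() == "issue"
    ]
    # No "issue" property at all means issuance is not restricted.
    return not issuers or ca in issuers
```

For example, `caa_permits([("issue", "letsencrypt.org")])` is true, while a record set that only names another CA is not.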

Let's Debug does more than that, but its purpose is different. You might also consider checking the Let's Encrypt server operational status, as it does.

Personally, I would only check the items most likely to cause problems. You are more familiar with your clients than I am, so you know those best.
