Certificate not getting issued due to DNS lookup timeout on A and AAAA records

Domain Name/Common Name: www.10thmarines.marines.mil
Certificate Type: DV SAN

There are 91 SANs in total in this certificate. The certificate is not getting issued because of DNS lookup timeouts when LE queries the A and AAAA records for all the SANs in this certificate.

For example,

However, if you look at the verbose information it says that the HTTP A and AAAA records were found.

So when the A and AAAA records were found, why do we see the error saying that the query timed out looking up A and AAAA records?

Why is the certificate not getting issued? What can be done to get the certificate issued?
Thanks for your help.

Because there are 91 SANs, and all of them have to resolve at the same time.

Is there a possibility Let's Encrypt is inadvertently DoSsing your authoritative DNS? Does it have some kind of rate limiter?
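To get a rough feel for how a burst of lookups behaves from your own vantage point, something like the following could be used. It is only a sketch: `timed_lookup`, `slow_or_failed`, the 2-second threshold, and the worker count are illustrative choices, not anything Let's Encrypt documents, and it measures your local resolver rather than the resolvers Let's Encrypt uses.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def timed_lookup(name, resolve=socket.getaddrinfo):
    """Return (name, seconds, error) for one A/AAAA lookup."""
    start = time.monotonic()
    try:
        # getaddrinfo asks the system resolver for both IPv4 and IPv6 addresses.
        resolve(name, 443)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = str(exc)
    return name, time.monotonic() - start, error


def slow_or_failed(names, threshold=2.0, resolve=socket.getaddrinfo, workers=32):
    """Look up all names concurrently; keep those that failed or were slow.

    threshold is an arbitrary assumption, not the Let's Encrypt timeout.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda n: timed_lookup(n, resolve), names))
    return [r for r in results if r[2] is not None or r[1] > threshold]
```

Calling `slow_or_failed(list_of_sans)` with the real SAN list returns only the names that errored or took longer than the threshold. An empty result from your own network does not prove that Let's Encrypt sees the same thing.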

5 Likes

Resolving your hostnames was VERY slow. There are also some errors and some warnings:

https://dnsviz.net/d/www.futures.marines.mil/dnssec/

Not sure if the errors are actually detrimental, but the slowness in particular was very clear. Maybe you can check it yourself by re-analyzing the hostname using DNSViz.

My guess is the Let's Encrypt resolvers are simply timing out.

3 Likes

Do you need 91 names on a single cert?
If not, split that up into three [or more] certs - and things may work better.
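As a sketch of what that split could look like, the helper below chunks a list of hostnames into groups, one group per certificate. The function name `split_sans`, the group size of 31, and the placeholder hostnames are all made up for illustration; the real SAN list would go in their place.

```python
def split_sans(names, max_per_cert):
    """Split a list of hostnames into groups of at most max_per_cert names."""
    if max_per_cert < 1:
        raise ValueError("max_per_cert must be at least 1")
    return [names[i:i + max_per_cert] for i in range(0, len(names), max_per_cert)]


# Hypothetical placeholder names standing in for the 91 real SANs.
sans = [f"host{i}.example.com" for i in range(91)]
groups = split_sans(sans, 31)
print([len(g) for g in groups])  # [31, 31, 29]
```

Each group would then be requested as its own certificate, so fewer names have to validate together in a single order.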

3 Likes

Thanks for the response. However, I still don't understand.

Let's consider the domain www.futures.marines.mil. letsdebug.net says that there was a timeout getting the A and AAAA records, but under verbose logging it also says that the A and AAAA records were found. What does this mean? Did the LE servers time out, or did they successfully receive the A and AAAA records?

1 Like


Because those are two separate perspectives:

  • Let's Debug = finds IPs
  • LE staging = can't find IPs
2 Likes

Looking at the DNS responses, I see that the DNS response at every level was within a few ms. I don't see how you say it was VERY slow.

https://dnsviz.net/d/www.futures.marines.mil/responses/

I also checked further on the error reported there for the CNAME lookup under edgekey.net: that DNS server doesn't return an error and provides a proper response.

[anokulka@lsg-gss8:~]$ dig AAAA a7-64.akam.net

; <<>> DiG 9.16.1-Ubuntu <<>> AAAA a7-64.akam.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29351
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;a7-64.akam.net. IN AAAA

;; ANSWER SECTION:
a7-64.akam.net. 89341 IN AAAA 2600:1406:32::40

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Nov 22 22:29:15 UTC 2023
;; MSG SIZE rcvd: 71

[anokulka@lsg-gss8:~]$ dig CNAME www.mcpw.marines.mil.edgekey.net @a7-64.akam.net.

; <<>> DiG 9.16.1-Ubuntu <<>> CNAME www.mcpw.marines.mil.edgekey.net @a7-64.akam.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59689
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.mcpw.marines.mil.edgekey.net. IN CNAME

;; ANSWER SECTION:
www.mcpw.marines.mil.edgekey.net. 300 IN CNAME e11291.dscb.akamaiedge.net.

;; Query time: 1 msec
;; SERVER: 2600:1406:32::40#53(2600:1406:32::40)
;; WHEN: Wed Nov 22 22:26:14 UTC 2023
;; MSG SIZE rcvd: 98

How long do LE servers wait before timing out? And like I asked above, if they are actually timing out, do we know why we see the A and AAAA records under the verbose logging on letsdebug.net?

Is there a way to get more details on exactly which nameserver was slow to respond which caused LE to timeout?
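For comparing nameservers from your own side, the status, query time, and responding server can be pulled out of saved `dig` output like the runs above. This is only a sketch: `summarize_dig` is a hypothetical helper, and it says nothing about what the Let's Encrypt resolvers observe from their networks.

```python
import re


def summarize_dig(output):
    """Extract status, query time (ms), and responding server from dig output text."""
    status = re.search(r"status: (\w+)", output)
    qtime = re.search(r"Query time: (\d+) msec", output)
    server = re.search(r"SERVER: ([^\s#]+)#(\d+)", output)
    return {
        "status": status.group(1) if status else None,
        "query_ms": int(qtime.group(1)) if qtime else None,
        "server": server.group(1) if server else None,
    }


# Three lines copied from the second dig run above.
sample = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59689
;; Query time: 1 msec
;; SERVER: 2600:1406:32::40#53(2600:1406:32::40)
"""
print(summarize_dig(sample))
```

Running the same query against each authoritative nameserver and summarizing the outputs side by side makes a slow or failing server easier to spot, at least from the location where `dig` was run.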

Apparently the same issue is happening with another certificate, which has only 2 SANs.

So the certificate is not getting issued because LE staging cannot find the IPs?

Staging only issues TEST certs.
What that shows us is that LE systems can't get DNS resolutions from your authoritative DNS servers.
LE Staging and Production systems are generally on the same source network(s).

3 Likes

Yes, and more so, a timeout is quite unusual (and hard to debug).

Have you tried unboundtest.com?

4 Likes

Again, this is a perspective issue.
From a very close location, thing(s) appear fast.
But from some other point on the Internet, the exact same thing(s) may be very slow.

3 Likes

So can we get more details on which DNS server is actually failing to respond? I don't see that in the verbose logging.

That is the closest you will get to that.

This is a free service and there is no dedicated support for such troubleshooting/questions.

3 Likes

Then the problem is bigger than I expected.

2 Likes

I have tried unboundtest.com, and DNS resolution succeeds just fine. I don't see anything timing out.

https://unboundtest.com/m/A/www.futures.marines.mil/RFMXK2CP
https://unboundtest.com/m/AAAA/www.futures.marines.mil/RGORBNY4
https://unboundtest.com/m/A/www.hqmc.marines.mil/SYIH6BCP
https://unboundtest.com/m/AAAA/www.hqmc.marines.mil/NDKQOHNW
https://unboundtest.com/m/A/www.iandl.marines.mil/QMKWXQTM
https://unboundtest.com/m/AAAA/www.iandl.marines.mil/GNB4NJGS

Have you looked at?:

And this may have something to do with the problem:

www.futures.marines.mil          canonical name = www.mcpw.marines.mil.edgekey.net
www.mcpw.marines.mil.edgekey.net canonical name = e11291.dscb.akamaiedge.net

[meaning: the problem may be within systems outside your control]
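To make that point concrete, the sketch below walks the CNAME chain shown above and lists the names that fall outside the marines.mil zone. The helpers `follow_cnames` and `outside_zone` are hypothetical, and the record data is copied by hand from the two lines above rather than queried live.

```python
def follow_cnames(name, cname_map, max_hops=10):
    """Follow CNAME records from name until a name with no CNAME is reached."""
    chain = [name]
    while chain[-1] in cname_map:
        if len(chain) > max_hops:
            raise RuntimeError("CNAME chain too long or looping")
        chain.append(cname_map[chain[-1]])
    return chain


def outside_zone(chain, own_suffix):
    """Return the names in the chain that do not end with own_suffix."""
    return [n for n in chain if not n.rstrip(".").endswith(own_suffix)]


# CNAME records as shown in the lookup output above.
records = {
    "www.futures.marines.mil": "www.mcpw.marines.mil.edgekey.net",
    "www.mcpw.marines.mil.edgekey.net": "e11291.dscb.akamaiedge.net",
}
chain = follow_cnames("www.futures.marines.mil", records)
print(outside_zone(chain, ".marines.mil"))
```

Every name in the printed list is served by nameservers for another zone, so a slow or failing response at any of those hops would also break resolution of the original hostname.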

2 Likes

Not sure what you mean. Can you please elaborate?
Those are just CNAME records and eventually we get an A record.

I have tested each nameserver along the way (even the edgekey.net and akamaiedge.net nameservers) and am not able to see any timeouts.

If we can get a dig +trace result for the test that LE is doing on letsdebug.net, that will tell us which nameserver is failing to respond or responding slowly. Is there a way to get that?

I think your problem is the same as mine, which I detailed here: DNS timeout from Let's Encrypt servers - #8 by kenh1

(The domain in my posting is not in .mil, but the nameservers for it ARE in .mil). I also tried a lot of testing along the way (even using a collection of RIPE Atlas probes) and I could not reproduce the timeouts that were reported by Let's Encrypt, either on staging or production. I think something changed recently and Let's Encrypt cannot resolve anything under the .mil TLD, which ... kind of sucks? I am not sure there is a way forward here without getting some more verbose output from the DNS resolver stack Let's Encrypt is using.

4 Likes

Issue resolved. More details in this post

3 Likes