DNS query timeout, but DNS is up and working

My domain is:

I ran this command:
The default DirectAdmin (DA) Let's Encrypt certificate generation

It produced this output:
2021/01/11 11:17:29 [INFO] [vechtdal.innobrix.nl] acme: Obtaining SAN certificate
2021/01/11 11:17:30 [INFO] [vechtdal.innobrix.nl] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/9991882950
2021/01/11 11:17:30 [INFO] [vechtdal.innobrix.nl] acme: Could not find solver for: tls-alpn-01
2021/01/11 11:17:30 [INFO] [vechtdal.innobrix.nl] acme: use http-01 solver
2021/01/11 11:17:30 [INFO] [vechtdal.innobrix.nl] acme: Trying to solve HTTP-01
2021/01/11 11:18:14 [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/9991882950
2021/01/11 11:18:14 [INFO] Unable to deactivate the authorization: https://acme-v02.api.letsencrypt.org/acme/authz-v3/9991882950
2021/01/11 11:18:14 Could not obtain certificates:
error: one or more domains had a problem:
[vechtdal.innobrix.nl] acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: query timed out looking up A for vechtdal.innobrix.nl, url:
Certificate generation failed.

My web server is (include version):
nginx 1.19.5

The operating system my web server runs on is (include version):
CentOS 7

My hosting provider, if applicable, is:
Transip VPS / Hostnet DNS

I can login to a root shell on my machine:

I'm using a control panel to manage my site:
Directadmin 1.61.5

The version of my client is:
2.0.11 (DA LE script)

It has kept throwing DNS query timeouts since the domain was created last Friday.
I've verified that DNS is working properly, using MxToolbox and some other DNS test tools.

I'm not sure what is wrong or how to fix it, so any help would be appreciated.

Thank you in advance,


Running DNSViz on your hostname/domain, there seems to be just one error: your DNS servers aren't reachable over TCP.

If we use Unboundtest.com to test your hostname, we do indeed see a timeout when it tries to resolve the RR over TCP. Unbound is the resolver library used by Boulder, the software Let's Encrypt runs, and unboundtest.com is configured to mimic the Boulder configuration. It's run by @jsha, one of the Let's Encrypt staff members.
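For anyone who wants to repeat the TCP part of that test from their own machine, here is a minimal Python sketch (standard library only) of a single non-recursive A query sent over TCP, roughly what dig +tcp +norecurse does. The function names are hypothetical; the sketch sends no EDNS record and does not parse the answer, it only shows whether the server accepts the connection and sends a reply.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query (RFC 1035 section 4.1) with RD cleared,
    like dig +norecurse. qtype 1 is an A record, QCLASS 1 is IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message):
    """Over TCP, each DNS message is prefixed with a two-byte length
    (RFC 1035 section 4.2.2)."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        buf += chunk
    return buf


def query_over_tcp(server, name, timeout=5.0):
    """Send one A query to server:53 over TCP and return the raw response.
    Raises an OSError (timeout, connection reset, ...) if TCP is broken."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(name)))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

Calling query_over_tcp("ns2.hostnetbv.com", "vechtdal.innobrix.nl") should return the raw response bytes; an OSError such as a timeout or a connection reset points at the same TCP problem unboundtest.com reported.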

I'm not sure whether the DNS RFCs make TCP connectivity mandatory, but it seems to be mandatory for Let's Encrypt. I'm not sure why, though.

Strangely enough, sometimes I get an error and sometimes (after waiting a few seconds) it does work:

osiris@erazer ~ $ dig @ns2.hostnetbv.com. +tcp +norecurse vechtdal.innobrix.nl. A
;; communications error to 2a02:2268:ffff:ffff::2#53: connection reset

osiris@erazer ~ $ dig @ns2.hostnetbv.com. +tcp +norecurse vechtdal.innobrix.nl. A
;; communications error to connection reset

osiris@erazer ~ $ dig @ns2.hostnetbv.com. +tcp +norecurse vechtdal.innobrix.nl. A

; <<>> DiG 9.16.6 <<>> @ns2.hostnetbv.com. +tcp +norecurse vechtdal.innobrix.nl. A
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24316
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

; EDNS: version: 0, flags:; udp: 1680
;vechtdal.innobrix.nl.		IN	A

vechtdal.innobrix.nl.	14400	IN	A

;; Query time: 4834 msec
;; WHEN: Mon Jan 11 11:53:49 CET 2021
;; MSG SIZE  rcvd: 65

osiris@erazer ~ $ 

Strange behaviour from those DNS servers.
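Because the failures above come and go, one way to put a number on them is to repeat the same check several times and tally the outcomes. The Python sketch below does just that. attempt is a hypothetical zero-argument callable you supply that performs one query and raises an OSError (such as ConnectionResetError or a timeout) when it fails; the pause between tries is an arbitrary choice.

```python
import time
from collections import Counter


def probe_repeatedly(attempt, tries=10, pause=2.0, sleep=time.sleep):
    """Call attempt() several times and count successes and failures,
    grouping failures by exception type (e.g. ConnectionResetError)."""
    results = Counter()
    for i in range(tries):
        try:
            attempt()
            results["ok"] += 1
        except OSError as exc:
            results[type(exc).__name__] += 1
        if i < tries - 1:
            sleep(pause)  # wait a little, as in the manual dig retries
    return results
```

A result like Counter({"ok": 6, "ConnectionResetError": 4}) would confirm an intermittent problem rather than a permanently closed port.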


I can only think of two reasons to require DNS via TCP:

  1. Increased security.
  2. Handle larger packet sizes.

But I have not come across a reason why LE would require it.
And the docs mention neither TCP nor UDP: https://letsencrypt.org/docs/challenge-types/


Yes, I don't think TCP itself is the problem. I think it falls back to TCP because UDP times out, and that seems to happen at random moments too.

Perhaps it's a connectivity issue? Especially because the timeouts are random, and not always at the same point.

PS: I've got about 20 working subdomains, all using the same DNS servers.

Do you run your jobs at very busy times, like the top of the hour?

Is there an IPS involved?


I just tested with another domain on the same DNS servers and that seems to work. Call me lost.

This was at 10:00 CET today and at 18:00 on Friday, so the time slots were very different.

I don't think so. The difference I have found, though, is DNSSEC, which is on for innobrix.nl and off for the other domain. (I don't have access to the DNS itself.)

Hi @SanderWD

That's fatal. Never run such jobs at exactly 10:00 or 18:00; add a few minutes.
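A minimal Python sketch of that advice, assuming a job you schedule yourself: take the nominal slot and shift it by a random number of minutes so it does not land on the top of the hour. The function name and the 5 to 50 minute window are illustrative choices, not anything Let's Encrypt prescribes.

```python
import random
from datetime import datetime, timedelta


def jittered_run_time(slot, min_offset=5, max_offset=50, rng=None):
    """Return slot shifted by a random offset of min_offset..max_offset
    minutes plus 0..59 seconds, so the job avoids the busy top of the hour."""
    if not 0 < min_offset <= max_offset < 60:
        raise ValueError("offsets must satisfy 0 < min_offset <= max_offset < 60")
    rng = rng if rng is not None else random.Random()
    return slot + timedelta(
        minutes=rng.randint(min_offset, max_offset),
        seconds=rng.randint(0, 59),
    )
```

For example, jittered_run_time(datetime(2021, 1, 11, 10, 0)) returns a time somewhere between 10:05:00 and 10:50:59.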


I run them by hand, not at exactly 10:00 of course, and I've made about 10 attempts in total.

Edit: I seem to have "fixed" it: restarting nginx seemed to solve it :sweat_smile:
Still, the error made no sense if this was the issue.

TCP is mandatory, per https://tools.ietf.org/html/rfc7766#section-1:

As a practical matter, Let's Encrypt's current Unbound configuration (as reflected by unboundtest.com) will try UDP first, and will do TCP fallback if the UDP response has the TC (truncated) bit set.
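The fallback rule described here can be sketched in a few lines of Python. This illustrates the rule only and is not Unbound's implementation: send_udp and send_tcp are hypothetical callables standing in for the real transports, and the TC bit is read from the flags word of the 12-byte DNS header (mask 0x0200, per RFC 1035 section 4.1.1).

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word


def is_truncated(response):
    """Return True if the DNS response header has the TC bit set."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def resolve(query, send_udp, send_tcp):
    """Try UDP first; repeat the query over TCP only if the UDP answer
    came back truncated. Returns (response, transport_used)."""
    response = send_udp(query)
    if is_truncated(response):
        return send_tcp(query), "tcp"
    return response, "udp"
```

With this rule, a server that answers small responses over UDP works even if its TCP port is broken, which fits the observation that the problem only appeared for the DNSSEC-signed domain.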

Our Unbound configuration also sets a particularly low edns-buffer-size: 512. This is a measure against IP fragmentation attacks, and means that large responses are much more likely to trigger TCP fallback than with other DNS servers. In particular, DNSSEC is likely to trigger TCP fallback because it results in large response sizes.
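To make the 512-byte limit concrete: a resolver advertises its buffer size in the CLASS field of the EDNS(0) OPT pseudo-record it attaches to each query (RFC 6891), and an answer that does not fit is sent back truncated with the TC bit set, which then triggers the TCP retry. The Python sketch below encodes such a record and applies that size rule. The function names are hypothetical, and setting the DO (DNSSEC OK) flag by default is an assumption about a validating resolver, not a statement about Let's Encrypt's exact queries.

```python
import struct

OPT_TYPE = 41     # EDNS(0) OPT pseudo-RR type
DO_FLAG = 0x8000  # "DNSSEC OK" bit in the OPT record's flags


def build_opt_record(udp_payload_size=512, dnssec_ok=True):
    """Encode an OPT pseudo-RR: root owner name, TYPE 41, CLASS = advertised
    UDP payload size, TTL = extended RCODE (0) | version (0) | flags, RDLEN 0."""
    ttl = DO_FLAG if dnssec_ok else 0
    return b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_payload_size, ttl, 0)


def needs_tcp(response_size, udp_payload_size=512):
    """An answer larger than the advertised buffer cannot be returned
    in a single UDP reply, so it comes back truncated."""
    return response_size > udp_payload_size
```

Because DNSSEC adds signature records to each answer, a signed zone crosses the 512-byte line far more easily than an unsigned one.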

Glad you've resolved it! I'm pretty sure the restart of Nginx was unrelated. I just tested with https://unboundtest.com/m/A/vechtdal.innobrix.nl/QOI6KBCF and found that unboundtest now successfully resolves your domain. Most likely one of a few things happened:

  1. Your DNS zone changed such that responses are now smaller than 512 bytes.
  2. Your authoritative DNS servers started answering TCP traffic on port 53.
  3. Our DNS server was overwhelmed at the time of your previous queries and is no longer overwhelmed.

Sounds more like random luck/coincidence than a permanent fix.
NGINX changes can't fix DNS issues.