Query timeout with DNSSEC enabled

I originally posted this here, but it turns out I have a different problem.

For the past two days, I have been unable to create or renew any certificates for niyawe.de (DNS problem: query timed out looking up TXT for _acme-challenge.aaaaaa.niyawe.de), while I can still create new certificates for niyawe.tk.
I use dehydrated with a dns-hook-script and the same nameservers for both domains.

The only difference is that niyawe.de is correctly signed using DNSSEC.

My hosting provider is Hetzner. Since this is a common question: I am also not blocking any IPs.

The last successful renewal for niyawe.de was on 2020-04-29. I haven’t changed anything on my side since then.

Is there any chance you can post another pcap, not filtered by host if possible?

In your previous one, there are only two peers - your server and one AWS validation server. There should be 4 validation servers making an appearance (3 from AWS and 1 from Viawest).

I acknowledge that you’ve said that you are not blocking any addresses, but the pcap would help pin down what’s happening. Even if it looks totally normal, that’s a help.

acme-challenge.pcapng (1.3 MB)

Thanks. Don’t have any strong conclusions, though.

Looks like all the validation servers can reach this nameserver.

One weird thing is the [RST,ACK] at the end of every TCP conversation, but that might just be some NAT oddity - both peers are behind NAT.

The other thing is that some of the DNS responses are very large, like 6KB. For some reason, when a query with the norecurse flag is sent, your nameserver comes back with a full authority & additional section, which gets kind of huge when the response is also authenticated. I don’t think that’s necessary, and it could cause Let’s Encrypt’s query deadline to be exceeded?
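If anyone wants to check the response size themselves, here is a rough sketch using only the Python standard library. It builds a TXT query with recursion desired cleared and the EDNS0 DO bit set, sends it over TCP, and returns the length of the reply. The function names are mine, the advertised payload size of 4096 is an arbitrary choice, and I am only assuming this resembles what the validation servers send.

```python
import socket
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name in DNS wire format (length-prefixed labels)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 16, query_id: int = 0x1234) -> bytes:
    """Build a non-recursive query (RD cleared) with an EDNS0 OPT record, DO set.

    qtype 16 is TXT. The query ID and payload size are arbitrary choices.
    """
    # Header: ID, flags (all zero, so RD is cleared), QDCOUNT=1, ARCOUNT=1.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT pseudo-record: root name, type 41, class = advertised UDP payload
    # size, TTL field carries the flags (0x8000 = DO), empty RDATA.
    opt = b"\x00" + struct.pack("!HHIH", 41, 4096, 0x00008000, 0)
    return header + question + opt


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket or raise if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def tcp_response_size(server: str, name: str, timeout: float = 5.0) -> int:
    """Send the query over TCP and return the DNS message length in bytes."""
    query = build_query(name)
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        # DNS over TCP prefixes each message with a two-byte length.
        sock.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        recv_exact(sock, length)
        return length
```

Calling `tcp_response_size("<your authoritative nameserver>", "_acme-challenge.aaaaaa.niyawe.de")` should give the size in bytes of what the server sends back over TCP; the nameserver address is whatever yours is, I have not filled one in.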

Finally, I noticed that every time I query your nameservers locally, the TCP segments come back out of order, which produces a noticeable delay. But I can’t really reproduce it from other networks, so meh.

Maybe Let’s Encrypt drops out-of-order packets for some reason?

I just replaced my 4096-bit Zone Signing Key with a 2048-bit key to reduce the size of the answer, and now it works. So there is an undocumented size limit, and it has only existed for a few days. Nice. Please fix that.
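For anyone wondering how much the key size matters, here is a back-of-the-envelope sketch. An RSA signature is as long as the key modulus, and the RRSIG RDATA adds 18 bytes of fixed fields plus the uncompressed signer name. The function names are mine, and the number of RRSIGs per response (`rrsig_count`) is a made-up illustration value, not something I measured; the per-record header (owner name, type, class, TTL, length) is not counted.

```python
def wire_name_len(name: str) -> int:
    """Length of a domain name in uncompressed DNS wire format."""
    labels = name.rstrip(".").split(".")
    # One length byte per label, plus the label bytes, plus the root byte.
    return sum(1 + len(label) for label in labels) + 1


def rsa_signature_bytes(key_bits: int) -> int:
    """An RSA signature is as long as the key modulus."""
    return (key_bits + 7) // 8


def rrsig_rdata_bytes(key_bits: int, signer: str) -> int:
    """RRSIG RDATA size: 18 fixed bytes + signer name + signature."""
    return 18 + wire_name_len(signer) + rsa_signature_bytes(key_bits)


def savings(rrsig_count: int, signer: str = "niyawe.de") -> int:
    """Bytes saved across rrsig_count signatures when moving 4096 -> 2048 bit.

    rrsig_count is a hypothetical input; count the RRSIGs in your own response.
    """
    per_sig = rrsig_rdata_bytes(4096, signer) - rrsig_rdata_bytes(2048, signer)
    return per_sig * rrsig_count


print(rrsig_rdata_bytes(4096, "niyawe.de"))  # 541
print(rrsig_rdata_bytes(2048, "niyawe.de"))  # 285
```

So each signature shrinks by 256 bytes, and a response carrying several RRSIGs in its answer, authority and additional sections shrinks by a multiple of that.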
