SERVFAIL looking up TXT (IDNA or DNSSEC issues?)


#1

I’m having problems with Let’s Encrypt (through the dehydrated client) and the DNS challenge (Knot DNS). It’s an IDNA domain; I’m not sure whether that’s related, but getting certs seems to work for other, non-IDNA domains.

The domain is gfrör.li (or in IDNA: xn--gfrr-7qa.li). Here’s the process:

First, let’s ensure there’s no existing TXT record:

$ dig +short TXT _acme-challenge.xn--gfrr-7qa.li @coredump01.nine.ch
$

Now I’ll start the certificate process. Dehydrated will fetch a challenge from the ACME server, then the hook script will update the local DNS server through nsupdate.

$ /opt/letsencrypt/dehydrated -f config -c
# INFO: Using main config file config
Processing xn--gfrr-7qa.li
 + Checking domain name(s) of existing cert... changed!
 + Domain name(s) are not matching!
 + Names in old certificate: www.xn--gfrr-7qa.li xn--gfrr-7qa.li
 + Configured names: xn--gfrr-7qa.li
 + Forcing renew.
 + Checking expire date of existing cert...
 + Valid till Sep  8 00:21:27 2018 GMT (Less than 30 days). Renewing!
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting challenge for xn--gfrr-7qa.li...
 + [hook] Requesting nsupdate add for xn--gfrr-7qa.li (zone xn--gfrr-7qa.li.)...

(Yes, I dropped the subdomain for debugging purposes.)

Here’s the update log from the DNS server:

Dec 30 01:21:44 coredump01 knotd[1219]: 2018-12-30T01:21:44 info: [xn--gfrr-7qa.li.] DDNS, processing 1 updates
Dec 30 01:21:44 coredump01 knotd[1219]: 2018-12-30T01:21:44 info: [xn--gfrr-7qa.li.] DDNS, update finished, serial 2018122923 -> 2018122924, 0.01 seconds
Dec 30 01:21:44 coredump01 knotd[1219]: 2018-12-30T01:21:44 info: [xn--gfrr-7qa.li.] zone file updated, serial 2018122923 -> 2018122924
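For readers following along, here is a minimal Python sketch of what the challenge step roughly amounts to. It is not the hook script used here (that one isn't shown); `dns01_txt_value`, `nsupdate_script` and the thumbprint string are hypothetical names for illustration. It assumes the usual dns-01 rule that the TXT value is the unpadded base64url SHA-256 digest of `token.thumbprint`, and it takes the TTL of 60 from the dig output further down.

```python
import base64
import hashlib


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Return the TXT value for a dns-01 challenge (assumed standard rule)."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without '=' padding: 32 digest bytes become 43 characters
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def nsupdate_script(zone: str, name: str, txt_value: str, ttl: int = 60) -> str:
    """Build the text a hook could pipe into nsupdate (hypothetical helper)."""
    lines = [
        f"zone {zone}",
        f'update add _acme-challenge.{name}. {ttl} IN TXT "{txt_value}"',
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The thumbprint below is a placeholder, not a real account key thumbprint.
    value = dns01_txt_value(
        "TIrDR5Abio3mwrjzBM5Zg3fYD3QPrafFNQ95iSTg-4o", "EXAMPLE-THUMBPRINT"
    )
    print(nsupdate_script("xn--gfrr-7qa.li.", "xn--gfrr-7qa.li", value), end="")
```

Note that the update itself succeeds in the log above, which already suggests the problem is on the lookup side rather than in the hook.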

Then the verification fails:

 + Responding to challenge for xn--gfrr-7qa.li...
ERROR: Challenge is invalid! (returned: invalid) (result: {
  "type": "dns-01",
  "status": "invalid",
  "error": {
    "type": "urn:acme:error:dns",
    "detail": "DNS problem: SERVFAIL looking up TXT for _acme-challenge.xn--gfrr-7qa.li",
    "status": 400
  },
  "uri": "https://acme-v01.api.letsencrypt.org/acme/challenge/XXX",
  "token": "TIrDR5Abio3mwrjzBM5Zg3fYD3QPrafFNQ95iSTg-4o"
})

As you can see, the validation failed with SERVFAIL looking up TXT for _acme-challenge.xn--gfrr-7qa.li. I’ve commented out the code that would delete the entry again after verification, so the DNS record is still there if you want to verify it. Here’s the dig call from my local machine:

$ dig TXT _acme-challenge.xn--gfrr-7qa.li @coredump01.nine.ch

; <<>> DiG 9.13.4 <<>> TXT _acme-challenge.xn--gfrr-7qa.li @coredump01.nine.ch
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63977
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_acme-challenge.gfrör.li.      IN      TXT

;; ANSWER SECTION:
_acme-challenge.gfrör.li. 60    IN      TXT     "D64dQUjUrgV_3dGzZXruvYFvM_JoVA_aevy4e8HDU3s"

;; Query time: 78 msec
;; SERVER: 94.230.210.84#53(94.230.210.84)
;; WHEN: Sun Dec 30 01:25:22 CET 2018
;; MSG SIZE  rcvd: 116
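On the IDNA question: dig prints the name as gfrör.li in the question and answer sections even though the query used xn--gfrr-7qa.li, because both are the same name in different notations. A quick way to check that the two forms really correspond is Python's built-in `idna` codec; the helper names `to_a_label` and `to_u_label` below are made up for this sketch.

```python
def to_a_label(name: str) -> str:
    """Convert a Unicode domain name to its ASCII (xn--) form."""
    return name.encode("idna").decode("ascii")


def to_u_label(name: str) -> str:
    """Convert an ASCII (xn--) domain name back to its Unicode form."""
    return name.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_a_label("gfrör.li"))
    print(to_u_label("xn--gfrr-7qa.li"))
```

If the round trip matches, the IDNA encoding itself is unlikely to be the culprit.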

I have debugged this for hours now and simply cannot figure out why the ACME servers cannot validate this entry. Note: This process used to work at the beginning of 2018, but stopped working sometime in Q2 or Q3. This could be due to a DNS server update or something like that; maybe there’s an incompatibility?

I’d be happy about any help.

Note: I tried to set up DNSSEC a while ago but did not get it working, so I removed all related config from the DNS server again. It seems that some things survived, though: https://dnssec-debugger.verisignlabs.com/xn--gfrr-7qa.li Could this be the problem? If yes, it would be really, really, really helpful if the error message said so.


#2

Yeah, it’s DNSSEC - https://letsdebug.net/xn--gfrr-7qa.li/13865

You need to disable DNSSEC at your domain registrar, because your authoritative nameservers have long since forgotten that configuration.

Once you’ve done that, you can set DNSSEC up again from scratch, if you wish.

As for the error message saying so: true, but the CA itself currently does not know the reason for the SERVFAIL; it just sees the opaque error message from its resolver.


#3

Couldn’t the resolver be configured to return details about the failure? The DNSSEC validation must happen on Let’s Encrypt servers, right?


#4

Well, you have the DNS protocol to thank for that. The DNS message format does not provide a field to include detailed information about what errors were encountered. The only information that is available is at the granularity of the RCode field (which carries half a byte of data, e.g. SERVFAIL).

So if you imagine:

[CA/Validation Authority] <------ DNS protocol ------> [Resolver]

there’s no way for the Resolver to tell the VA that the failure reason was DNSSEC.
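To illustrate how little room that is, here is a small Python sketch that decodes the fixed 12-byte DNS header and pulls out the 4-bit RCODE. The function name `parse_header` is invented for this example, and the sample header is rebuilt by hand from the dig output in post #1 (id 63977, flags qr aa rd, one question, one answer, one additional record) rather than captured from the wire.

```python
import struct

# Standard RCODE values that fit in the 4-bit header field
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def parse_header(message: bytes) -> dict:
    """Decode the 12-byte DNS header at the start of a message."""
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", message[:12]
    )
    rcode = flags & 0x000F  # lowest 4 bits of the flags word
    return {
        "id": msg_id,
        "qr": bool(flags & 0x8000),  # response bit
        "aa": bool(flags & 0x0400),  # authoritative answer
        "rd": bool(flags & 0x0100),  # recursion desired
        "ra": bool(flags & 0x0080),  # recursion available
        "rcode": RCODE_NAMES.get(rcode, str(rcode)),
        "counts": (qdcount, ancount, nscount, arcount),
    }


if __name__ == "__main__":
    # Header equivalent to the authoritative answer shown by dig in post #1
    print(parse_header(struct.pack("!6H", 63977, 0x8500, 1, 1, 0, 1)))
    # What a validating resolver would send back instead: rcode 2, no answers
    print(parse_header(struct.pack("!6H", 1, 0x8182, 1, 0, 0, 0)))
```

A bogus signature, a missing DNSKEY and an unreachable nameserver all collapse into the same value 2 here.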

To generalize this beyond Let’s Encrypt - if you try to query your domain from a properly configured resolver (like 1.1.1.1 - which runs Knot, like you - or 8.8.8.8) right now, it will result in SERVFAIL without any further detail.

For that reason we have community tooling like unboundtest.com or letsdebug.net. It’d be “nice to have” detailed DNS root cause analysis right in Let’s Encrypt, but it’s kinda hard to do, and at the same time it’s kinda surprising that you didn’t notice much earlier that your domain had stopped resolving.


#5

There’s a draft DNS extension to include more granular error information.

https://tools.ietf.org/html/draft-ietf-dnsop-extended-error-03

If and when it’s standardized and implemented by the DNS software Let’s Encrypt uses, it might be practical to tie it into the CA software.

And of course Let’s Encrypt could totally change their DNS architecture – unboundtest.com exists, after all – but that would probably be a lot of work and increase fragility only to get good error messages.
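As a rough idea of what such an extension would hand to the CA software, here is a Python sketch that decodes an Extended DNS Error option. Big caveat: it assumes the wire format that was, as far as I know, eventually standardized as RFC 8914 (EDNS option code 15, a 16-bit info code followed by optional UTF-8 text), which is not identical to the draft-03 layout linked above. `parse_ede_option` is a hypothetical helper and the table lists only a few of the registered info codes.

```python
import struct
from typing import Tuple

EDE_OPTION_CODE = 15  # EDNS option code assumed from RFC 8914

# A small subset of info codes relevant to DNSSEC failures
EDE_INFO_CODES = {
    0: "Other",
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    9: "DNSKEY Missing",
    10: "RRSIGs Missing",
}


def parse_ede_option(option_data: bytes) -> Tuple[int, str, str]:
    """Decode the payload of one EDE option (bytes after code and length)."""
    if len(option_data) < 2:
        raise ValueError("EDE option needs at least a 2-byte info code")
    (info_code,) = struct.unpack("!H", option_data[:2])
    extra_text = option_data[2:].decode("utf-8", errors="replace")
    return info_code, EDE_INFO_CODES.get(info_code, "unknown"), extra_text


if __name__ == "__main__":
    # Hand-built payload; the extra text is invented for the example.
    payload = struct.pack("!H", 9) + b"no DNSKEY matches DS"
    print(parse_ede_option(payload))
```

With something like this available from the resolver, the SERVFAIL in this thread could have come with a "DNSKEY Missing" style hint instead of nothing.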


#6

Ah, I didn’t realize that Let’s Encrypt uses the DNS protocol to query the resolver, but of course it makes sense to use that over a custom protocol :slightly_smiling_face: (Although a custom protocol or a DNS resolver library would allow for detailed error analysis, as the letsdebug site does.)

I wasn’t aware of those tools, they seem incredibly useful. I wonder if they could be linked from letsencrypt.org? For example, the “Get Help” page could be a subpage with links to this forum, to debugging tools, etc.

Yes, it’s quite sad that I never noticed. It shows that deployment of DNSSEC is almost 0 (neither the resolvers I use nor the browsers I use showed any warning).

True, but the error response (which is structured JSON) could also include an “official debug URL” as an alternative.
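Until something like that exists on the server side, a client or hook could add the hint itself. The following Python sketch is purely illustrative: `debug_hint` is a made-up function, not part of dehydrated or the ACME API, and the only inputs it relies on are the `error.type` and `error.detail` fields visible in the JSON from post #1.

```python
import json
from typing import Optional

LETSDEBUG_URL = "https://letsdebug.net/"


def debug_hint(challenge_result: str) -> Optional[str]:
    """Return a human-readable hint for DNS SERVFAIL errors, else None."""
    result = json.loads(challenge_result)
    error = result.get("error") or {}
    is_dns_error = error.get("type", "").endswith(":dns")
    if is_dns_error and "SERVFAIL" in error.get("detail", ""):
        return (
            "SERVFAIL often points to a DNSSEC or nameserver problem; "
            "try a diagnostic tool such as " + LETSDEBUG_URL
        )
    return None


if __name__ == "__main__":
    sample = """{
      "type": "dns-01",
      "status": "invalid",
      "error": {
        "type": "urn:acme:error:dns",
        "detail": "DNS problem: SERVFAIL looking up TXT for _acme-challenge.xn--gfrr-7qa.li",
        "status": 400
      }
    }"""
    print(debug_hint(sample))
```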


#7

For what it’s worth, APNIC has running DNSSEC validation statistics. It varies from 0% to 90% depending on the country. (I don’t know how representative they are.)

https://stats.labs.apnic.net/dnssec

Yeah. I think there’s been some general discussion about error messages and organizing debugging resources recently but … I totally can’t remember it right now …


#8

Interesting stats, thanks for the link. Switzerland is at 10%… (And our government wants to introduce online-voting, yay! :sparkles:)

Highest percentage in central Europe is the Czech Republic at 62%. Probably because of Knot (which is developed by CZNIC).


#9

It’s on the #help pinned post Third-party-Tools to check your configuration :slightly_smiling_face:

Yes: Hint to letsdebug.net in error message


#10

In the end, disabling DNSSEC did help (after waiting a day for the caches to expire), so thanks for your help :slightly_smiling_face:

Next step would be to re-enable it, but this time set up properly.