DNS problem: SERVFAIL looking up TXT

I’ve attempted to use dns-01 authorization and, I believe, provisioned the record properly.
When I respond to the challenge, I receive an error message that I cannot figure out:

urn:acme:error:connection
DNS problem: SERVFAIL looking up TXT for _acme-challenge.kkv.pl

I’m testing against acme-staging.api.letsencrypt.org and can’t figure out what problem boulder has resolving the record. I have queried it with dig against many DNS servers (the authoritative ones, 8.8.8.8, 8.8.4.4, and my ISP’s DNS), and they all reply correctly:

; <<>> DiG 9.9.5-12.1-Debian <<>> TXT _acme-challenge.kkv.pl @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60423
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;_acme-challenge.kkv.pl.                IN      TXT

;; ANSWER SECTION:
_acme-challenge.kkv.pl. 0       IN      TXT     "enAHY01aoA6gaqiDAdrSeq4o_r7CLBIEzJmBK8O_ugM"

;; Query time: 64 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Jan 31 19:19:30 CET 2016
;; MSG SIZE  rcvd: 107

I’ve experimented with multiple TTLs to make sure LE isn’t getting a cached record: started with 300, then 1, 5 and 10. They all have the same effect.

The record is published now, so you should all be able to query it. Thanks in advance for any help troubleshooting this.

Your DNS servers appear to be behaving very oddly for some requests.

$ dig @goweb1.spigu.net. CAA kkv.pl.
$ dig @goweb2.spigu.net. CAA kkv.pl.

Both result in SERVFAIL.

@hlandau Thanks for pointing that out. I’ve fixed that mistake. The server now replies NXDOMAIN, as we don’t have CAA records.

However, now we get an error message: “DNS problem: NXDOMAIN looking up TXT for _acme-challenge.kkv.pl”. When I dig it, I get NOERROR.

I don’t get NOERROR here, but NXDOMAIN:

osiris@desktop ~ $ dig @goweb1.spigu.net _acme-challenge.kkv.pl

; <<>> DiG 9.9.5 <<>> @goweb1.spigu.net _acme-challenge.kkv.pl
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 21954
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;_acme-challenge.kkv.pl.		IN	A

;; AUTHORITY SECTION:
kkv.pl.			300	IN	SOA	goweb1.spigu.net. bok.chmurka.pl. 1454472000 21600 3600 1814400 300

;; Query time: 13 msec
;; SERVER: 2001:41d0:a:3144::#53(2001:41d0:a:3144::)
;; WHEN: Wed Feb 03 05:50:36 CET 2016
;; MSG SIZE  rcvd: 112

Perhaps your fix has to propagate to the appropriate servers? (goweb1.spigu.net and 2…

@Osiris You are querying for an A record, but as I understand it LE queries for the TXT record. The following should work:

dig @goweb1.spigu.net TXT _acme-challenge.kkv.pl
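For anyone less familiar with dig: when no type is given it asks for an A record, and the type travels inside the question itself, so a server can answer an A query and a TXT query for the same name differently. The sketch below is only an illustration, using Python’s standard library to build the RFC 1035 wire-format question without sending it anywhere; `build_query` and `question_type` are hypothetical helper names and the query ID is arbitrary.

```python
import struct

# Numeric record types (RFC 1035 for A and TXT; RFC 6844 assigns 257 to CAA).
QTYPES = {"A": 1, "TXT": 16, "CAA": 257}
QCLASS_IN = 1


def build_query(qname: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal wire-format DNS query: a 12-byte header plus one question."""
    # Header fields: ID, flags (only RD, "recursion desired", set), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is a length byte followed by its bytes; a zero byte ends the name.
    labels = [label for label in qname.rstrip(".").split(".") if label]
    qname_wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    # QTYPE and QCLASS follow the name; this is where A, TXT and CAA queries differ.
    return header + qname_wire + struct.pack("!HH", QTYPES[qtype], QCLASS_IN)


def question_type(message: bytes) -> int:
    """Read QTYPE back from a single-question query built by build_query."""
    return struct.unpack("!H", message[-4:-2])[0]


# dig without a type argument asks for A; the dns-01 check needs TXT.
print(question_type(build_query("_acme-challenge.kkv.pl", "A")))    # 1
print(question_type(build_query("_acme-challenge.kkv.pl", "TXT")))  # 16
```

The same encoding explains the `TYPE257` spelling: a dig that doesn’t know the name CAA can still send the numeric type.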


Well, that’s stupid of me :dizzy_face:

I think your first and second sentences contradict each other, unless I misunderstand? You should be returning NOERROR, not NXDOMAIN. Are you using a custom DNS server?

It's possible your version of dig doesn't understand CAA. You can also try dig -t TYPE257 ...

To clarify:

  1. When querying for the CAA record (dig CAA kkv.pl @goweb1.spigu.net):
     Previously:
       • DNS server returned SERVFAIL (erroneously)
       • ACME API returned "DNS problem: SERVFAIL looking up TXT for _acme-challenge.kkv.pl" (the error message was wrong: it was looking up CAA records, not TXT)
     Now:
       • DNS server returns NXDOMAIN (since we don't have CAA records in the zone)
       • ACME API returns "DNS problem: NXDOMAIN looking up TXT for _acme-challenge.kkv.pl" <-- no idea why
  2. When querying for the TXT record, we have always returned the correct response, as you can verify with dig at any time.

No, I am not sure. Does LE require CAA records to be set up?
I believe that reporting a SERVFAIL for TXT when it actually got a SERVFAIL for CAA might be a bug in boulder.

LE doesn’t require CAA records, but it does require that your DNS server either return DNS records or a statement that no such records exist (NOERROR or NXDOMAIN). SERVFAIL is neither.
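That distinction lives in the 4-bit RCODE field of the response header. Here is a minimal Python sketch of the check, assuming you already hold the raw response bytes; it is not boulder’s implementation, `rcode_of` and `is_definitive` are hypothetical names, and the example header is made up.

```python
import struct

# RCODE values from RFC 1035, section 4.1.1.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def rcode_of(response: bytes) -> int:
    """Return the 4-bit RCODE from a DNS message header (EDNS extended bits ignored)."""
    if len(response) < 12:
        raise ValueError("DNS header is 12 bytes; message is truncated")
    _query_id, flags = struct.unpack("!HH", response[:4])
    return flags & 0x000F


def is_definitive(rcode: int) -> bool:
    """True if the server has settled whether the records exist.

    NOERROR (records, or an empty answer) and NXDOMAIN (the name does not exist)
    are both acceptable; SERVFAIL, REFUSED and the rest mean the lookup failed.
    """
    return rcode in (0, 3)


# A made-up response header with QR, RD and RA set and RCODE=2 (SERVFAIL).
servfail_header = struct.pack("!HHHHHH", 1, 0x8182, 1, 0, 0, 0)
code = rcode_of(servfail_header)
print(RCODE_NAMES[code], is_definitive(code))  # SERVFAIL False
```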

Out of curiosity, what on earth kind of DNS setup are you using that SERVFAILs so much by default?

As explained above, the CAA query returns NXDOMAIN and the TXT query returns NOERROR (the record exists), yet I still get the error from the ACME API.

@hlandau We use a custom DNS server written in Go which in one case failed recursion and hence returned SERVFAIL for CAA records that didn’t exist instead of NXDOMAIN.

Firstly, it makes no sense for a CAA query to return NXDOMAIN. You only return NXDOMAIN if a) there is no record of any type at that domain, and b) there is no record of any type at any subdomain of that domain either.

So the only circumstance in which a CAA query should return NXDOMAIN is if you literally have no records at or directly or indirectly under that domain, in which case one would ask why you want a certificate in the first place.

Moreover, for kkv.pl in particular, there is no circumstance in which an authoritative nameserver for that domain could ever sanely return NXDOMAIN for a query for kkv.pl, because any such zone must at least have SOA and NS records. So returning NXDOMAIN is definitely wrong here.

I think you need to do a lot more conformance testing on your DNS server.
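The existence rule above can be illustrated for a single zone held in memory. This is a simplified Python sketch under stated assumptions: the dict-based zone layout and the `answer` helper are hypothetical, and wildcards, CNAMEs and delegations are deliberately left out, so it is a conformance aid for this one rule, not a DNS server.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory zone: owner name -> record type -> list of RDATA strings.
Zone = Dict[str, Dict[str, List[str]]]


def normalize(name: str) -> str:
    """Lower-case a name and make it fully qualified (trailing dot)."""
    return name.lower().rstrip(".") + "."


def answer(zone: Zone, qname: str, qtype: str) -> Tuple[str, List[str]]:
    """Return (rcode, records) for an authoritative query inside `zone`.

    Simplified: wildcards, CNAMEs and delegations are not handled.
    """
    owners = {normalize(owner): rrsets for owner, rrsets in zone.items()}
    q = normalize(qname)
    # The name exists if it owns records of any type, or if any name below it
    # does (an "empty non-terminal").
    exists = any(o == q or o.endswith("." + q) for o in owners)
    if not exists:
        return "NXDOMAIN", []
    # The name exists but may have no records of this type: NOERROR with an
    # empty answer section (often called NODATA), never NXDOMAIN.
    return "NOERROR", owners.get(q, {}).get(qtype, [])


# Illustrative zone: SOA/NS at the apex plus the challenge TXT record.
zone: Zone = {
    "kkv.pl.": {
        "SOA": ["goweb1.spigu.net. bok.chmurka.pl. 1454472000 21600 3600 1814400 300"],
        "NS": ["goweb1.spigu.net."],
    },
    "_acme-challenge.kkv.pl.": {"TXT": ["enAHY01aoA6gaqiDAdrSeq4o_r7CLBIEzJmBK8O_ugM"]},
}
print(answer(zone, "kkv.pl", "CAA"))  # ('NOERROR', [])
```

By the same rule, an A query for _acme-challenge.kkv.pl should also get NOERROR with an empty answer rather than NXDOMAIN, because a TXT record exists at that name.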

Secondly, why do you have an authoritative nameserver doing recursive lookups?

Thanks for your elaborate answer, @hlandau. It helped me track down some issues with the DNS servers.

I’m sorry to trouble you with some of the quirks of our non-standard system. To answer your last question, the server is not doing “recursion” in the usual DNS sense. We have a few legacy applications that think they run the authoritative nameservers, so we need a custom, genuinely authoritative nameserver in front of them that merges their results with proper fallbacks. It’s a temporary solution, and it’s so ugly I don’t want to explain it further; everything is possible in legacy maintenance.

What’s on-topic for Let’s Encrypt, and would probably have saved us all this troubleshooting, is a better error message. It looks like the ACME server (boulder) consistently reports the error as if it happened while checking “TXT for _acme-challenge.kkv.pl”, while in fact it is checking “CAA for kkv.pl”. I would consider this a bug, as the error message is misleading.
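One way to picture the requested fix is an error that carries the type and name of the query that actually failed, instead of always naming the TXT record at _acme-challenge.kkv.pl. The Python sketch below is hypothetical throughout (`DNSLookupError`, `check_caa` and the `lookup` callback are invented for illustration) and is not how boulder, which is written in Go, actually structures its errors.

```python
from typing import Callable, List, Tuple

# Hypothetical resolver callback: (qname, qtype) -> (rcode name, records).
Lookup = Callable[[str, str], Tuple[str, List[str]]]


class DNSLookupError(Exception):
    """Lookup failure that records which query actually failed."""

    def __init__(self, qtype: str, qname: str, rcode: str) -> None:
        super().__init__(qtype, qname, rcode)
        self.qtype = qtype
        self.qname = qname
        self.rcode = rcode

    def __str__(self) -> str:
        # The message is built from the failing query, not from a fixed label.
        return f"DNS problem: {self.rcode} looking up {self.qtype} for {self.qname}"


def check_caa(domain: str, lookup: Lookup) -> List[str]:
    """Fetch CAA records, raising DNSLookupError unless the answer is definitive."""
    rcode, records = lookup(domain, "CAA")
    if rcode not in ("NOERROR", "NXDOMAIN"):
        raise DNSLookupError("CAA", domain, rcode)
    return records


try:
    check_caa("kkv.pl", lambda qname, qtype: ("SERVFAIL", []))
except DNSLookupError as err:
    print(err)  # DNS problem: SERVFAIL looking up CAA for kkv.pl
```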

I have updated the bug report with the new information but somebody closed it already.