_acme-challenge cname (as per acme-dns) broken

Hi,
I've been successfully using acme-dns for my letsencrypt dns-01 validation for years. As of today, all renewals are failing with the following error:

[error,type]|urn:ietf:params:acme:error:dns|
[error,detail]|DNS problem: NXDOMAIN looking up TXT for _acme-challenge.doorpi.sembritzki.me - check that a DNS record exists for this domain|

This happens regardless of the client (I've been using acme.sh and dehydrated in production, and both are failing).

dig shows that the txt records do indeed exist.

Example record this is happening for: _acme-challenge.doorpi.sembritzki.me

Something is not quite right with your DNS zone(s):

nslookup -q=ns _acme-challenge.doorpi.sembritzki.me ns1.routing.net
_acme-challenge.doorpi.sembritzki.me    canonical name = 5908a2ba-5891-4cec-a741-1547d0244029.acme-dns.sembritzki.org
acme-dns.sembritzki.org nameserver = sembritzki.org
sembritzki.org  internet address = 5.45.101.249

nslookup -q=ns _acme-challenge.doorpi.sembritzki.me 1.0.0.1
*** one.one.one.one can't find _acme-challenge.doorpi.sembritzki.me: Non-existent domain

Your authoritative DNS servers locate a CNAME, but other Internet DNS servers don't.

Hm, there have been no changes to DNS. I'm also experiencing the same issue with domains hosted elsewhere (= different nameserver). An example domain would be _acme-challenge.3cx.stadt-luetjenburg.de.

I was able to reproduce the NXDOMAIN with 1.0.0.1. However, 8.8.8.8 does return the correct CNAME.
Very weird... Do you have any ideas what could be causing this?

Something isn't quite right with this "setup":

nslookup -q=ns _acme-challenge.3cx.stadt-luetjenburg.de. cns1.alfahosting.info
_acme-challenge.3cx.stadt-luetjenburg.de        canonical name = c8c45643-e12f-45a7-80b4-cd0e16456bd2.acme-dns.sembritzki.org

nslookup -q=ns 3cx.stadt-luetjenburg.de. cns1.alfahosting.info
*** cns1.alfahosting.info can't find 3cx.stadt-luetjenburg.de.: Non-existent domain

nslookup -q=ns stadt-luetjenburg.de. cns1.alfahosting.info
stadt-luetjenburg.de    nameserver = cns1.alfahosting.info
stadt-luetjenburg.de    nameserver = cns2.alfahosting.info
stadt-luetjenburg.de    nameserver = cns3.alfahosting.info

Even the authoritative server fails with the "3cx" subdomain.

That is on purpose. Only the _acme-challenge subdomain is supposed to exist on the public dns. It's a split-dns setup and the domain we need the cert for is only used internally.

Also, why are you querying the NS record? Shouldn't it be the TXT record?

nslookup -q=txt _acme-challenge.doorpi.sembritzki.me 1.0.0.1
Server:		1.0.0.1
Address:	1.0.0.1#53

Non-authoritative answer:
_acme-challenge.doorpi.sembritzki.me	canonical name = 5908a2ba-5891-4cec-a741-1547d0244029.acme-dns.sembritzki.org.
5908a2ba-5891-4cec-a741-1547d0244029.acme-dns.sembritzki.org	text = "jyD_V11790NWjNPiNKLMh-39u7vKhjtnu4aVnusoSa4"
5908a2ba-5891-4cec-a741-1547d0244029.acme-dns.sembritzki.org	text = "QSrZhUink3gkIfSsNM8D1koKU8KQYSB1X9D5cHvWhP8"

That is hit-or-miss.
See:

nslookup -q=txt _acme-challenge.doorpi.sembritzki.me 4.2.2.2
Server:  b.resolvers.Level3.net
Address:  4.2.2.2

*** b.resolvers.Level3.net can't find _acme-challenge.doorpi.sembritzki.me: Non-existent domain

So, yes, there is a DNS issue with your zone.

I think it's this issue with acme-dns: "acme-dns returns NXDOMAIN for A records of existing subdomains rather than NOERROR with empty answer" (Issue #257, joohoi/acme-dns on GitHub).

How can one DNS server (1.0.0.1) return a reply while another DNS server (4.2.2.2) returns NXDOMAIN?
They can't both be right.

I agree. And I have absolutely no idea why they are doing this. The acme-dns nameserver returns consistent results.

Looks like you may have fixed this already?

https://unboundtest.com/m/TXT/_acme-challenge.doorpi.sembritzki.me/WYG3WRFH

No, for an unknown reason some nameservers (e.g. 1.1.1.1) return the txt correctly and some (e.g. 8.8.8.8) return NXDOMAIN. Unfortunately, the letsencrypt dns-01 validation seems to use the latter.
The nameservers returning NXDOMAIN never even hit my acme-dns nameserver.

$ dig @1.1.1.1 txt 050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org

; <<>> DiG 9.11.32-RedHat-9.11.32-1.fc33 <<>> @1.1.1.1 txt 050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45658
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org. IN TXT

;; ANSWER SECTION:
050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org. 1	IN TXT "s_ZEEC5vpa3lXOg6mH5vOiHupS7zX4_qOlGiA27sM44"
050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org. 1	IN TXT "cPg8zeluYF5EEcYQc0qagECbzzaNkNvWpMU4jtHYb4w"

;; Query time: 290 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Mi Jun 09 08:39:44 CEST 2021
;; MSG SIZE  rcvd: 201
$ dig @acme-dns.sembritzki.org txt 050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org

; <<>> DiG 9.11.32-RedHat-9.11.32-1.fc33 <<>> @acme-dns.sembritzki.org txt 050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22800
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org. IN TXT

;; ANSWER SECTION:
050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org. 1	IN TXT "s_ZEEC5vpa3lXOg6mH5vOiHupS7zX4_qOlGiA27sM44"
050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org. 1	IN TXT "cPg8zeluYF5EEcYQc0qagECbzzaNkNvWpMU4jtHYb4w"

;; Query time: 27 msec
;; SERVER: 5.45.101.249#53(5.45.101.249)
;; WHEN: Mi Jun 09 08:43:15 CEST 2021
;; MSG SIZE  rcvd: 321

$ dig @8.8.8.8 txt 050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org

; <<>> DiG 9.11.32-RedHat-9.11.32-1.fc33 <<>> @8.8.8.8 txt 050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 63132
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;050e927c-5e91-4ef3-8357-26c2fed333fe.acme-dns.sembritzki.org. IN TXT

;; Query time: 31 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mi Jun 09 08:39:49 CEST 2021
;; MSG SIZE  rcvd: 89

I have also used the google dns cache flush, but that didn't help.
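
To compare the three responses above without reading each header by eye, the status, flags, and answer count can be pulled out of dig's header lines. This is a minimal sketch in Python, assuming the header layout shown in the outputs above; `parse_dig_header` is a hypothetical helper name, not part of dig or any library:

```python
import re

def parse_dig_header(output: str) -> dict:
    """Extract rcode, flags and answer count from dig's text output.

    Assumes the header format shown in the outputs above.
    """
    status = re.search(r"status: (\w+)", output)
    flags = re.search(r"flags: ([a-z ]*);", output)
    answers = re.search(r"ANSWER: (\d+)", output)
    return {
        "status": status.group(1) if status else None,
        "flags": flags.group(1).split() if flags else [],
        "answers": int(answers.group(1)) if answers else None,
    }

# Header lines copied from the 8.8.8.8 response above.
sample = """\
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 63132
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
"""
header = parse_dig_header(sample)
print(header["status"], header["flags"], header["answers"])
```

Run against all three outputs, this makes the differences easy to see: only the reply from the acme-dns server itself carries the `aa` (authoritative answer) flag, and only 8.8.8.8 reports NXDOMAIN with zero answers.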

Let's Encrypt DNS validation uses Unbound (like the linked test). Caching nameservers are not used for dns-01; only the nameservers you point to are. Your primary nameserver(s) are queried first, then they follow the CNAME to your acme-dns hosted zone and get the TXT record from there. A side effect of this is that your acme-dns service is also a nameserver for its own subdomain zone, so it needs to be behaving.
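
The lookup path described above can be sketched as a toy model. This is an illustration only, not how Unbound or Let's Encrypt is implemented: the `RECORDS` table is an in-memory stand-in for the real zones, `resolve_txt` is a hypothetical function, and the names and TXT value are copied from the nslookup output earlier in the thread.

```python
# Toy record store standing in for the real zones (assumption: simplified,
# one record set per (name, type); a real resolver queries nameservers).
RECORDS = {
    ("_acme-challenge.doorpi.sembritzki.me", "CNAME"):
        ["5908a2ba-5891-4cec-a741-1547d0244029.acme-dns.sembritzki.org"],
    ("5908a2ba-5891-4cec-a741-1547d0244029.acme-dns.sembritzki.org", "TXT"):
        ["jyD_V11790NWjNPiNKLMh-39u7vKhjtnu4aVnusoSa4"],
}

def resolve_txt(name: str, records: dict, max_hops: int = 8) -> list:
    """Follow CNAMEs from `name` until TXT records are found.

    Returns an empty list if the chain ends without a TXT record.
    """
    for _ in range(max_hops):
        txt = records.get((name, "TXT"))
        if txt:
            return txt
        cname = records.get((name, "CNAME"))
        if not cname:
            return []  # chain is broken: nothing to validate against
        name = cname[0]
    raise RuntimeError("too many CNAME hops")

print(resolve_txt("_acme-challenge.doorpi.sembritzki.me", RECORDS))
```

In this model, a failure at any hop means the TXT record is never reached, which would be consistent with the acme-dns server seeing no queries at all.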

So from what I can tell the only thing they need to get right is your CNAME, and the acme-dns response. This works for me:
dig @acme-dns.sembritzki.org -t TXT 5908a2ba-5891-4cec-a741-1547d0244029.acme-dns.sembritzki.org

The dig command does work for me too, but for some reason unclear to me, letsencrypt returns the following error:
"DNS problem: NXDOMAIN looking up TXT for _acme-challenge.doorpi.sembritzki.me - check that a DNS record exists for this domain" (I can reproduce this 100%, just did it again).
I can see on the acme-dns server side, that letsencrypt isn't even hitting my acme-dns server.

I notice that DNSViz gets annoyed that one of the nameservers for sembritzki.me/sembritzki.org isn't resolving UDP queries (maybe TCP only); see the DNSViz report for _acme-challenge.doorpi.sembritzki.me. No idea if that would have any impact.

Has anything at all changed in your acme dns server? Rebuild or config updates?

No, there were no changes at all. The problem occurred completely out of the blue.
While trying to fix this, I have updated acme-dns to the latest release (my docker container was one release behind), but that didn't help.

I have now also implemented a fix for "acme-dns returns NXDOMAIN for A records of existing subdomains rather than NOERROR with empty answer" (Issue #257, joohoi/acme-dns on GitHub) to make sure this issue isn't causing anything, but that didn't help.

I'm also quite confused about the dns viz result, that ns2.routing.net isn't resolving UDP queries, because it does for me: dig -6 @ns2.routing.net sembritzki.org +notcp

Edit: dnsviz is not reporting the ipv6 error anymore. However, my issue still persists.

I have an idea on what could be causing this:

The Google Public DNS tool response for the query c8c45643-e12f-45a7-80b4-cd0e16456bd2.acme-dns.sembritzki.org contains a note "response from ip xx.xx".

I can see that this is the IP of one of my hoster's nameservers.
This brought me to the following idea:

$ dig @ns1.routing.net txt c8c45643-e12f-45a7-80b4-cd0e16456bd2.acme-dns.sembritzki.org.

; <<>> DiG 9.11.32-RedHat-9.11.32-1.fc33 <<>> @ns1.routing.net txt c8c45643-e12f-45a7-80b4-cd0e16456bd2.acme-dns.sembritzki.org.
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 31383
;; flags: qr rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;c8c45643-e12f-45a7-80b4-cd0e16456bd2.acme-dns.sembritzki.org. IN TXT

;; Query time: 27 msec
;; SERVER: 2a03:2900:3:1::2#53(2a03:2900:3:1::2)
;; WHEN: Mi Jun 09 15:54:39 CEST 2021
;; MSG SIZE  rcvd: 89

As you can see, the nameserver returns NXDOMAIN.

For the acme-dns testserver auth.acme-dns.io, a similar query to the acme-dns.io nameserver is answered with NOERROR and the responsible nameserver:

$ dig @pablo.ns.cloudflare.com. e56a5a16-2f0e-47c8-862e-973d9c318cf6.auth.acme-dns.io   

; <<>> DiG 9.11.32-RedHat-9.11.32-1.fc33 <<>> @pablo.ns.cloudflare.com. e56a5a16-2f0e-47c8-862e-973d9c318cf6.auth.acme-dns.io
; (6 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54882
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;e56a5a16-2f0e-47c8-862e-973d9c318cf6.auth.acme-dns.io. IN A

;; AUTHORITY SECTION:
auth.acme-dns.io.	300	IN	NS	ns.auth.acme-dns.io.

;; ADDITIONAL SECTION:
ns.auth.acme-dns.io.	300	IN	A	46.4.128.227

;; Query time: 17 msec
;; SERVER: 2606:4700:58::adf5:3bdc#53(2606:4700:58::adf5:3bdc)
;; WHEN: Mi Jun 09 15:56:00 CEST 2021
;; MSG SIZE  rcvd: 115

Are there any RFCs that mandate the latter behavior? If there were, I'd be able to take this issue up with my provider.
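
The difference between the two responses above can be sketched as a toy model of a parent-zone nameserver. This is a simplified illustration, not the code of any real nameserver: `answer_from_parent` and its arguments are hypothetical, and record types are ignored. The delegation entry is the one ns1.routing.net reported earlier in the thread (acme-dns.sembritzki.org nameserver = sembritzki.org). The model shows the referral-style reply seen from pablo.ns.cloudflare.com (NOERROR with an NS record in the authority section) as opposed to NXDOMAIN:

```python
def answer_from_parent(qname: str, zone_names: set, delegations: dict) -> dict:
    """Toy parent-zone logic: refer queries at or below a delegation point."""
    qname = qname.rstrip(".").lower()
    labels = qname.split(".")
    # Check the name itself and each ancestor for a delegation (NS) entry.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in delegations:
            authority = [(candidate, "NS", ns) for ns in delegations[candidate]]
            return {"status": "NOERROR", "answer": [], "authority": authority}
    if qname in zone_names:
        # Name exists in the zone; the toy model returns no record data.
        return {"status": "NOERROR", "answer": [], "authority": []}
    return {"status": "NXDOMAIN", "answer": [], "authority": []}

# Delegation as reported by ns1.routing.net earlier in the thread.
delegations = {"acme-dns.sembritzki.org": ["sembritzki.org"]}
zone_names = {"sembritzki.org"}

reply = answer_from_parent(
    "c8c45643-e12f-45a7-80b4-cd0e16456bd2.acme-dns.sembritzki.org.",
    zone_names, delegations)
print(reply["status"], reply["authority"])
```

In this model a name below the delegation point gets a referral to the delegated nameserver, while a name that is neither delegated nor present in the zone gets NXDOMAIN. Whether the NXDOMAIN from ns1.routing.net actually violates an RFC is the open question here.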