DNS-01 problem with dehydrated

RuGa · March 15, 2020, 8:07am

Who says that slaves must exist? What does LE do when a domain’s dns has no slaves? It queries the global dnss, who pick up the updates directly from the domain’s dns.

Now, this is no ordinary query, as LE is demanding ad hoc temporary RRs, whose propagation to the global dnss may well take hours. LE can and should take a shortcut by first querying for the domain’s NS RRs, selecting the one with lowest priority, and then querying it directly.

RuGa · March 15, 2020, 8:13am

unbound-host -rvD -tNS $fqdn

pick the dns with lower priority

rg305 · March 15, 2020, 8:39am

No one.
That is a "local" concept that defines how DNS servers, at the same level/zone, interact with each other. [one is defined as the leader and the rest follow as slaves]

In Internet DNS, there is no leader, only zones. Each zone has a predefined authoritative set of DNS servers. Each zone is linked to the zone above by delegation (NS records, plus Glue Records where needed).
That your local servers see one as a leader or add/remove servers means nothing to the zone above.
For instance: to add/remove an authoritative server you have to alter the delegation and Glue Records (in the zone above).
Unlike SMTP (MX records) there is no concept of "cost"; there is no top-down concept of DNS preference (as you implied by "SOA").

Again, your best bet is to force DNS synchronization via:

DNS NOTIFY
DNS Push Notification
DNS Zone Change Notification

[call it whatever you like - on any change, have the "MASTER" immediately tell the "SLAVES" the zone has changed]
In Microsoft DNS it looks like this: [screenshot not preserved]

9peppe · March 15, 2020, 10:44am

(20-minute sleeps sound awfully fragile. Why not renew in two phases with two separate cron lines: the first sets up the challenge, the second checks that the first part ran, maybe even digs the TXT record, and tells boulder the challenge is ready?)
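The second phase suggested above could be sketched roughly as follows. This is only an illustration, not something dehydrated or boulder ships: `challenge_ready`, `wait_until_ready` and the `lookup` callable are hypothetical names, and `lookup(nameserver, name)` is assumed to return the TXT strings that one nameserver serves when asked directly (for instance a thin wrapper around `dig +short @ns name TXT`).

```python
import time
from typing import Callable, Iterable, List

# Assumed signature: lookup(nameserver, record_name) -> TXT strings served there.
Lookup = Callable[[str, str], List[str]]


def challenge_ready(nameservers: Iterable[str], name: str, token: str,
                    lookup: Lookup) -> bool:
    """True only if every listed nameserver already serves the token."""
    servers = list(nameservers)
    # An empty server list must not count as "ready".
    return bool(servers) and all(token in lookup(ns, name) for ns in servers)


def wait_until_ready(nameservers: Iterable[str], name: str, token: str,
                     lookup: Lookup, attempts: int = 30, delay: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll instead of sleeping a fixed 20 minutes; give up after `attempts`."""
    servers = list(nameservers)
    for attempt in range(attempts):
        if challenge_ready(servers, name, token, lookup):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before asking the nameservers again
    return False
```

Only when `wait_until_ready` returns `True` would the script go on to tell the CA that the challenge is in place; on `False` it can exit and let the next cron run retry.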

RuGa · March 15, 2020, 3:08pm

DNS is a geographically distributed database whose servers are divided into recursive (readers) and authoritative (writers). When you add LE's RR TXT to your dns zone on your authoritative server, you are "writing" into the global dns. The actual writing on all servers is indirect and time-consuming, as the servers read and cache in their own time.

There is a hierarchy. First comes your authoritative dns, the only one authorised to write your zone. Then come your caching slaves, the only ones authorised to propagate further. Finally, the rest of the global servers, which can only cache your original zone. Again, this takes time.

When LE uses a recursive dns to read your fresh acme RR, LE will not find it, and thus fails the challenge verification. This is utterly frustrating. To speed up the acme verification, LE can avoid using slow recursive dnss, and query the authoritative (master) server directly. For example, to find the server you can do this:

unbound-host -rvD -tNS $fqdn

If the answer is secure (dnssec), then you select the dns with lowest priority, say ns0.$fqdn.

You query ns0.$fqdn for the acme RR TXT, which is up to date, because you queried the authoritative dns, with no need to waste time waiting for the global dns cache.
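As a rough sketch of the direct query described above, the following asks every nameserver listed for the zone, rather than only one, which TXT values it serves for the challenge name. It assumes the `dig` CLI is installed and that `domain` is the zone apex; the function names are made up for this example. Running it yourself shows which servers are stale, but it does not change what LE's own resolvers do.

```python
import subprocess
from typing import Dict, List


def acme_challenge_name(domain: str) -> str:
    """Record name checked by dns-01: drop a wildcard label, add the prefix."""
    base = domain.strip().rstrip(".").lower()
    if base.startswith("*."):
        base = base[2:]
    return f"_acme-challenge.{base}."


def parse_short(output: str) -> List[str]:
    """Split `dig +short` output into non-empty lines without TXT quotes."""
    return [ln.strip().strip('"') for ln in output.splitlines() if ln.strip()]


def dig_short(*args: str) -> List[str]:
    """Run `dig +short` with the given arguments and return the parsed lines."""
    done = subprocess.run(["dig", "+short", *args], capture_output=True,
                          text=True, timeout=15, check=False)
    return parse_short(done.stdout)


def txt_per_nameserver(domain: str) -> Dict[str, List[str]]:
    """Map each NS of the zone to the challenge TXT values it serves itself."""
    base = domain.strip().rstrip(".").lower()
    if base.startswith("*."):
        base = base[2:]  # NS records live at the zone apex, not at the wildcard
    name = acme_challenge_name(domain)
    return {
        ns: dig_short("+norecurse", f"@{ns}", name, "TXT")
        for ns in dig_short(base, "NS")
    }
```

Calling `txt_per_nameserver("example.org")` would return one entry per listed nameserver; a server whose list lacks the fresh token has not picked up the update yet.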

RuGa · March 15, 2020, 3:10pm

Not all slaves accept soliciting for updates.

9peppe · March 15, 2020, 4:33pm

it doesn't work like that.

dns is not dht.

there are root servers (13) that you can query for the root zone. from there, you get servers for each tld (com., net., ...) and so on for each domain and subdomain. each of those is authoritative.

recursive nameservers do not store zone data, they only cache queries, and they are the "clients" in this system.

LE doesn't. All your nameservers in dig yourdomain ns are authoritative.
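To illustrate the walk from the root servers down to a domain, here is a small sketch that lists the names a resolver may pass through. `delegation_chain` is a hypothetical helper; it only enumerates candidate zone names, because a real zone cut need not exist at every label.

```python
from typing import List


def delegation_chain(name: str) -> List[str]:
    """Candidate zones from the root down to `name`, each with a trailing dot."""
    labels = [label for label in name.strip().rstrip(".").lower().split(".") if label]
    chain = ["."]  # the root zone comes first
    # Add one label at a time, starting from the TLD on the right.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain
```

For `www.example.com` this gives the root, `com.`, `example.com.` and `www.example.com.`. At each step the servers of the parent zone hand out the NS records of the child, and every server reached this way is authoritative for its own zone.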

RuGa · March 15, 2020, 8:29pm

There is only one authoritative dns server for your zone: the master. The slaves help the master, they are no substitute for it: they are caching only, not allowed to change your definition of your zone and its RRs. The global dns servers are caching only: they are not authorised to define your zone and its RRs. This is how my dnss are defined, implementing both dnssec and dane. If you allow third party servers to change your zone, good luck with it.

The bottom line is that LE should not query the global dnss, because they are too slow to pick up the acme challenge from the master. LE should rather query the domain’s NSs, and in case the answers are not identical, preference should be given to the NS with lowest priority, the master.

9peppe · March 15, 2020, 8:43pm

In DNS terms, authoritative and primary/master mean different things. Primary/master and secondary/slave servers are all authoritative; the distinction only describes how a nameserver operator chooses to update their authoritative nameservers.

Slave and recursive are two extremely different concepts.

Your nameservers are not defined in your SOA record, they are defined by the NS records in your tld's zone:

# dig @$(dig +short com. ns | head -n1) google.com ns

; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> @l.gtld-servers.net. google.com ns
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51033
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.                    IN      NS

;; AUTHORITY SECTION:
google.com.             172800  IN      NS      ns2.google.com.
google.com.             172800  IN      NS      ns1.google.com.
google.com.             172800  IN      NS      ns3.google.com.
google.com.             172800  IN      NS      ns4.google.com.

;; ADDITIONAL SECTION:
ns2.google.com.         172800  IN      AAAA    2001:4860:4802:34::a
ns2.google.com.         172800  IN      A       216.239.34.10
ns1.google.com.         172800  IN      AAAA    2001:4860:4802:32::a
ns1.google.com.         172800  IN      A       216.239.32.10
ns3.google.com.         172800  IN      AAAA    2001:4860:4802:36::a
ns3.google.com.         172800  IN      A       216.239.36.10
ns4.google.com.         172800  IN      AAAA    2001:4860:4802:38::a
ns4.google.com.         172800  IN      A       216.239.38.10

;; Query time: 10 msec
;; SERVER: 2001:500:d937::30#53(2001:500:d937::30)
;; WHEN: Sun Mar 15 21:42:00 CET 2020
;; MSG SIZE  rcvd: 287
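A referral like the one above can also be read mechanically. The sketch below is an assumption-laden parser for the plain-text output format shown here, not a general DNS library: it collects the NS records and attaches the glue addresses listed for each nameserver. `Record`, `parse_records` and `delegation` are names invented for this example.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    owner: str
    ttl: int
    rtype: str
    rdata: str


def parse_records(dig_output: str) -> List[Record]:
    """Pick the resource-record lines out of full dig output."""
    records = []
    for line in dig_output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines, comments and section headers
        fields = line.split(None, 4)  # owner, ttl, class, type, rdata
        if len(fields) == 5 and fields[1].isdigit() and fields[2] == "IN":
            records.append(Record(fields[0], int(fields[1]), fields[3], fields[4]))
    return records


def delegation(dig_output: str) -> Dict[str, List[str]]:
    """Map each delegated nameserver to the glue addresses listed for it."""
    records = parse_records(dig_output)
    servers: Dict[str, List[str]] = {r.rdata: [] for r in records if r.rtype == "NS"}
    for r in records:
        if r.rtype in ("A", "AAAA") and r.owner in servers:
            servers[r.owner].append(r.rdata)
    return servers
```

Fed the output above, `delegation` would return four nameservers, each with one IPv6 and one IPv4 address. All four are equally authoritative for the zone; nothing in the referral ranks one above another.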

RuGa · March 15, 2020, 8:48pm

I did not say that the slaves are recursive.

I did not say that the NS RR information is written in the SOA RR.

9peppe · March 15, 2020, 8:49pm

In your case, @RuGa, none of this matters, because your ns0.yourdomain.org nameserver is not responding to any query at all. This is why verification fails.

RuGa · March 15, 2020, 8:51pm

Not true. The log above shows the master is returning the correct answers.

The problem is that the slaves take 20+ minutes to pick up the update from the master. Since LE uses recursive resolvers, as was said above, LE can only read those RR TXT once one of its resolvers has them, which takes even more time.
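One way to see this kind of lag is to compare the SOA serial each nameserver reports. The sketch below assumes you have already collected one `dig +short @ns zone SOA` line per server; `soa_serial`, `serial_lt` and `lagging` are hypothetical helper names, and the comparison follows the serial number arithmetic of RFC 1982.

```python
from typing import Dict, List


def soa_serial(short_soa_line: str) -> int:
    """Extract the serial from a `dig +short ... SOA` line.

    Field order: mname rname serial refresh retry expire minimum.
    """
    return int(short_soa_line.split()[2])


def serial_lt(a: int, b: int) -> bool:
    """RFC 1982 comparison for 32-bit serials: is `a` older than `b`?"""
    return a != b and ((b - a) % 2**32) < 2**31


def lagging(serials: Dict[str, int], primary: str) -> List[str]:
    """Nameservers whose serial is older than the primary's."""
    newest = serials[primary]
    return sorted(ns for ns, serial in serials.items() if serial_lt(serial, newest))
```

An empty result from `lagging` means every secondary has caught up with the primary, which is a more reliable signal than waiting a fixed amount of time.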

9peppe · March 15, 2020, 8:53pm

It's not reachable from the internet, then -- that's a private IP. (or not listening on zz.yy.xx.164:53)

Remove its NS record and only tell its address to your slaves (a "hidden master" infrastructure).

RuGa · March 15, 2020, 9:03pm

Remove the reference to the public IP.

The public IP does respond to direct queries. It refuses axfr, because you are not an authorised slave, but it responds to public queries. There is a firewall that blocks abuse, so if you hit it, it's your fault.

9peppe · March 15, 2020, 9:08pm

It doesn't respond to A.

Yeah... that's why. Relax your firewall. Nameservers might work like that for users, but multi-perspective validation is not happy at all.

RuGa · March 15, 2020, 9:59pm

Cloudflare had no problem picking up the updates from the master’s public IP. If you cannot query the IP, then you did something to upset its firewall.

mnordhoff · March 16, 2020, 3:42am

Can your primary nameserver send notifications to your secondary nameservers?

RuGa · March 16, 2020, 2:37pm

No, unfortunately. The provider does not allow for it.

RuGa · March 16, 2020, 2:39pm

Can you remove the reference to the acme url in my two posts above? The site locked the posts and I cannot do it myself.
