DNS-01 problem with dehydrated

Who says that slaves must exist? What does LE do when a domain’s DNS has no slaves? It queries the global DNS servers, which pick up the updates directly from the domain’s DNS server.

Now, this is no ordinary query, as LE is demanding ad hoc temporary RRs, whose propagation to the global DNS servers may well take hours. LE can and should take a shortcut: first query for the domain’s NS RRs, select the one with the lowest priority, then query it directly.

unbound-host -rvD -tNS $fqdn

Then pick the DNS server with the lowest priority.

No one.
That is a "local" concept that defines how DNS servers, at the same level/zone, interact with each other. [one is defined as the leader and the rest follow as slaves]

In Internet DNS, there is no leader, only zones. Each zone has a predefined authoritative set of DNS servers. Each zone is linked to the zone above by delegation (NS records there, plus glue records where needed).
That your local servers see one as a leader or add/remove servers means nothing to the zone above.
For instance, to add or remove an authoritative server you have to change the delegation (NS and glue records) in the zone above.
Unlike SMTP (MX records) there is no concept of "cost"; there is no top-down concept of DNS preference (as you implied by "SOA").

Again, your best bet is to force DNS synchronization via:

  • DNS NOTIFY
  • DNS Push Notification
  • DNS Zone Change Notification

[call it whatever you like - on any change, have the "MASTER" immediately tell the "SLAVES" the zone has changed]
In Microsoft DNS it looks like this: [screenshot not preserved]

(20-minute sleeps sound awfully fragile. Why not renew in two phases with two separate cron lines: the first sets up the challenge, the second checks that the first part ran, maybe even digs the TXT record, and then tells Boulder the challenge is ready?)
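
A minimal sketch of that second phase, assuming a shell with dig installed: before telling the CA the challenge is ready, poll every nameserver delegated for the zone until each one serves the expected TXT value. The function names, the retry count and the delay are made up for illustration and are not part of dehydrated. The zone argument has to be the zone apex, otherwise the NS lookup comes back empty.

```shell
# List the nameservers delegated for zone $1, one per line.
authoritative_ns() {
    dig +short NS "$1"
}

# Succeed if nameserver $1 serves a TXT record for name $2 with value $3.
ns_has_txt() {
    # dig prints TXT data in double quotes; strip them, then match the
    # whole line literally ("--" because tokens may start with "-").
    dig +short TXT "$2" "@$1" | tr -d '"' | grep -qxF -- "$3"
}

# Poll all nameservers of zone $1 until TXT name $2 carries value $3.
# $4 = max attempts (default 60), $5 = seconds between attempts (default 30).
wait_for_txt() {
    local zone="$1" name="$2" token="$3"
    local tries="${4:-60}" delay="${5:-30}"
    local ns servers pending i=0
    while [ "$i" -lt "$tries" ]; do
        servers=$(authoritative_ns "$zone")
        pending=0
        [ -n "$servers" ] || pending=1   # no NS answer at all: keep waiting
        for ns in $servers; do
            ns_has_txt "$ns" "$name" "$token" || pending=1
        done
        [ "$pending" -eq 0 ] && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                             # gave up: some server still lags
}
```

In a dehydrated hook this could plausibly run at the end of deploy_challenge, after the record has been written on the primary, for example: wait_for_txt example.org _acme-challenge.example.org "$TOKEN_VALUE" || exit 1. It only checks the servers you can see from where the hook runs, so it is a guard against slow secondaries, not a guarantee that validation will pass.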

DNS is a geographically distributed database whose servers are divided into recursive (readers) and authoritative (writers). When you add LE's TXT RR to your zone on your authoritative server, you are "writing" into the global DNS. The actual writing on all servers is indirect and time-consuming, as the servers read and cache in their own time.

There is a hierarchy. First comes your authoritative DNS server, the only one authorised to write your zone. Then come your caching slaves, the only ones authorised to propagate it further. Finally come the rest of the global servers, which can only cache your original zone. Again, this takes time.

When LE uses a recursive DNS server to read your fresh ACME RR, LE will not find it, and so the challenge verification fails. This is utterly frustrating. To speed up the ACME verification, LE can avoid the slow recursive DNS servers and query the authoritative (master) server directly. For example, to find the server you can do this:

unbound-host -rvD -tNS $fqdn

If the answer is secure (DNSSEC), you then select the DNS server with the lowest priority, say ns0.$fqdn.

You query ns0.$fqdn for the ACME TXT RR, which is up to date because you are asking the authoritative DNS server, with no need to wait for the global DNS caches.

Not all slaves accept soliciting for updates.

it doesn't work like that.

dns is not dht.

There are 13 root server names that you can query for the root zone. From there you get the servers for each TLD (com., net., ...), and so on down through each domain and subdomain. Each of those is authoritative for its own zone.

recursive nameservers do not store zone data, they only cache queries, and they are the "clients" in this system.

LE doesn't. All your nameservers in dig yourdomain ns are authoritative.

There is only one authoritative DNS server for your zone: the master. The slaves help the master but are no substitute for it: they are caching only and are not allowed to change your definition of your zone and its RRs. The global DNS servers are caching only too: they are not authorised to define your zone and its RRs. This is how my DNS servers are set up, implementing both DNSSEC and DANE. If you allow third-party servers to change your zone, good luck with that.

The bottom line is that LE should not query the global DNS servers, because they are too slow to pick up the ACME challenge from the master. LE should rather query the domain’s NSs and, if the answers are not identical, give preference to the NS with the lowest priority, the master.

In DNS terms, authoritative and primary/master mean different things. Primary/master and secondary/slave servers are all authoritative; the distinction only describes how a nameserver operator chooses to update their authoritative nameservers.
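
One way to check this yourself, sketched with dig: ask a nameserver listed for the zone directly, without recursion, and look for the aa (authoritative answer) flag in the response header. The function name is hypothetical, and the pattern assumes dig's usual ";; flags: ..." header line.

```shell
# Succeed if nameserver $2 answers authoritatively for zone $1,
# i.e. the "aa" flag is set in the header of its response.
answers_authoritatively() {
    # +norecurse: ask the server for its own data, not a recursive lookup.
    dig +norecurse SOA "$1" "@$2" | grep -Eq '^;; flags:[^;]* aa[ ;]'
}
```

For example, answers_authoritatively example.org ns2.example.org should succeed for a properly configured secondary just as it does for the primary, while a public recursive resolver normally answers without the aa flag.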

Slave and recursive are two extremely different concepts.

Your nameservers are not defined in your SOA record; they are defined by the NS records in your TLD's zone:

# dig @$(dig +short com. ns | head -n1) google.com ns

; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> @l.gtld-servers.net. google.com ns
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51033
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.                    IN      NS

;; AUTHORITY SECTION:
google.com.             172800  IN      NS      ns2.google.com.
google.com.             172800  IN      NS      ns1.google.com.
google.com.             172800  IN      NS      ns3.google.com.
google.com.             172800  IN      NS      ns4.google.com.

;; ADDITIONAL SECTION:
ns2.google.com.         172800  IN      AAAA    2001:4860:4802:34::a
ns2.google.com.         172800  IN      A       216.239.34.10
ns1.google.com.         172800  IN      AAAA    2001:4860:4802:32::a
ns1.google.com.         172800  IN      A       216.239.32.10
ns3.google.com.         172800  IN      AAAA    2001:4860:4802:36::a
ns3.google.com.         172800  IN      A       216.239.36.10
ns4.google.com.         172800  IN      AAAA    2001:4860:4802:38::a
ns4.google.com.         172800  IN      A       216.239.38.10

;; Query time: 10 msec
;; SERVER: 2001:500:d937::30#53(2001:500:d937::30)
;; WHEN: Sun Mar 15 21:42:00 CET 2020
;; MSG SIZE  rcvd: 287

I did not say that the slaves are recursive.

I did not say that the NS RR information is written in the SOA RR.

In your case, @RuGa, none of this matters, because your ns0.yourdomain.org nameserver is not responding to any query at all. This is why verification fails.

Not true. The log above shows the master is returning the correct answers.

The problem is that the slaves take 20+ minutes to pick up the update from the master. Since LE uses recursive resolvers, as was said above, LE can only read those TXT RRs once one of its resolvers has them, which takes even more time.
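
To see how far the secondaries lag behind, one option is to compare the SOA serial that each nameserver reports. This is a rough sketch, assuming dig and awk are available and that the primary bumps the serial on every change; the function names are made up for illustration.

```shell
# Print "<nameserver> <serial>" for every delegated nameserver of zone $1.
soa_serials() {
    local zone="$1" ns serial
    for ns in $(dig +short NS "$zone"); do
        # dig +short SOA prints: mname rname serial refresh retry expire minimum
        serial=$(dig +short SOA "$zone" "@$ns" | awk 'NR == 1 { print $3 }')
        printf '%s %s\n' "$ns" "$serial"
    done
}

# Succeed only if every nameserver answered and all serials are identical.
zone_in_sync() {
    soa_serials "$1" | awk '
        NF < 2 { missing = 1 }           # a server returned no SOA at all
        { seen[$2] = 1 }
        END {
            n = 0
            for (s in seen) n++
            exit ((missing || n != 1) ? 1 : 0)
        }'
}
```

Running soa_serials on the zone right after the hook has written the challenge record shows which servers are still on the old serial; zone_in_sync can then gate the next step in a script.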

It's not reachable from the internet, then -- that's a private IP. (or not listening on zz.yy.xx.164:53)

Remove its NS record and only tell its address to your slaves (a "hidden master" setup).

Remove the reference to the public IP.

The public IP does respond to direct queries. It refuses AXFR, because you are not an authorised slave, but it responds to public queries. There is a firewall that blocks abuse, so if you hit it, it's your fault.

It doesn't respond to A.

Yeah... that's why. Relax your firewall. Nameservers might work like that for users, but multi-perspective validation is not happy with it at all.

Cloudflare had no problem picking up the updates from the master’s public IP. If you cannot query the IP, then you did something to upset its firewall.

Can your primary nameserver send notifications to your secondary nameservers?

No, unfortunately. The provider does not allow for it.

Can you remove the reference to the acme url in my two posts above? The site locked the posts and I cannot do it myself.
