DNS-01 problem with dehydrated

You don’t know what nameserver Let’s Encrypt’s resolver is taking its decision from. For all you know, it is checking all 3 and taking a quorum decision.

Anyway,

This isn’t an option. SOA MNAME is not used as any kind of hint by recursive resolvers - only for dynamic DNS updates.

You need to wait for your slaves to update before responding to the challenge, or pull your slaves.
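A minimal sketch of what "wait for your slaves" can look like in practice: ask every nameserver listed for the zone directly, and only proceed once each one returns the challenge value. It assumes `dig` is installed; the function names and the `example.org` names in the usage note are illustrative, not part of dehydrated or Let's Encrypt.

```shell
#!/bin/sh
# Sketch only: succeeds when every nameserver of a zone already serves
# the expected _acme-challenge TXT value. Function names are hypothetical.

# Print the nameservers listed for zone $1, one per line.
list_nameservers() {
    dig +short NS "$1"
}

# Return 0 if nameserver $1 serves TXT record $2 with value $3.
# dig prints TXT data in double quotes, so strip them before comparing.
ns_has_txt() {
    dig +norecurse +short TXT "$2" "@$1" | tr -d '"' | grep -qxF -- "$3"
}

# Return 0 only if all nameservers of zone $1 serve TXT $2 = $3.
all_ns_have_txt() {
    zone=$1 name=$2 value=$3
    servers=$(list_nameservers "$zone")
    [ -n "$servers" ] || return 1
    for ns in $servers; do
        ns_has_txt "$ns" "$name" "$value" || return 1
    done
    return 0
}
```

For example, `all_ns_have_txt example.org _acme-challenge.example.org "$token_value"` returns success only once no listed server is still lagging. Because each returned TXT value is compared line by line, it also copes with the two values that a base-domain plus wildcard order places on the same name.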

This isn’t an option.

LE should always prefer the master (SOA MNAME), especially when its records are signed (DANE).

On resolving, a simple “dig @$master +dnssec +short -tTXT _acme-challenge.$fqdn” would do, with no need to wait for the dns global databases to pick up LE’s temporary RRs.

Do you use NSD? If so these may be worth a try


12h ; refresh
2h ; retry
2w ; expire
1h ; min TTL

These are my RFC-sane settings:

1200 ; SOA Refresh: slaves must refresh (learn zone changes) after 1200–43200 seconds
7200 ; SOA Retry: slaves must retry contacting master up to 120-7200 seconds
604800 ; SOA Expire: slaves must revalidate after 604800–1209600 seconds
3600 ; SOA Minimum: slaves must flush negative responses after 3600–86400 seconds

I prefer my 20min to your 12h refresh.

I still find it unreasonable for LE to force me to wait SOA Refresh + some, especially because you are doing it twice, for the fqdn and for the wildcard.

If not SOA Expire…

You have Retry > Refresh, is it on purpose?
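To compare the two sets of timers above on equal terms, here is a small sketch that converts zone-file time values such as `12h` or `2w` into seconds and checks whether Retry is shorter than Refresh. The function names are made up for illustration, and the "Retry below Refresh" check reflects common operational advice rather than a hard protocol rule.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Convert a zone-file time value (e.g. 12h, 2w, 1200) to seconds.
to_seconds() {
    n=${1%[smhdwSMHDW]}          # strip a trailing unit letter, if any
    case "$1" in
        *[sS]) echo "$n" ;;
        *[mM]) echo $((n * 60)) ;;
        *[hH]) echo $((n * 3600)) ;;
        *[dD]) echo $((n * 86400)) ;;
        *[wW]) echo $((n * 604800)) ;;
        *)     echo "$n" ;;      # bare number: already in seconds
    esac
}

# Return 0 if Retry ($2) is shorter than Refresh ($1).
retry_below_refresh() {
    [ "$(to_seconds "$2")" -lt "$(to_seconds "$1")" ]
}
```

With the values from the thread, `retry_below_refresh 12h 2h` succeeds, while `retry_below_refresh 1200 7200` fails, which is the mismatch being asked about here.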

I am within the RFC timing boundaries.

According to who? (I genuinely don’t know)

According to anybody who knows what DANE is and knows how to query it.

https://tools.ietf.org/html/rfc6698

https://www.huque.com/bin/danecheck
https://www.huque.com/bin/gen_tlsa

I am still not connecting the dots on wtf DANE has to do with how DNS recursors perform their queries. As far as I can tell, recursors don’t care, and have never cared about SOA MNAME.

To cite https://tools.ietf.org/html/rfc8499 ,

The idea of a primary master is only used in [RFC1996] and
[RFC2136]. A modern interpretation of the term “primary master”
is a server that is both authoritative for a zone and that gets
its updates to the zone from configuration (such as a master file)
or from UPDATE transactions.

RFC1996 and RFC2136 being DNS NOTIFY and DNS UPDATE, neither relevant for recursors.

You can play with unboundtest.com if you like. It’s the same recursor, with a configuration similar to what Let’s Encrypt uses for its VA, and it gives lots of verbose logging.

Why is LE using a dns recursor when a simple “dig @$master +dnssec +short -tTXT _acme-challenge.$fqdn” would do, with no need to wait for the dns global databases to pick up LE’s temporary RRs?

With DANE, both ports 25 and 443 are signed in the DNS using a hash of their respective TLS certificates, which happen to be the ones you are renewing through LE. LE could be smarter in this case, with no need for temporary acme RRs.

You presume that the master is accessible from the Internet - that is NOT a requirement.
It only needs to be accessible to the slaves.

Your plan puts “all (DNS) eggs in one (MASTER) basket”.
And would require the DNS resolver to do a series of “if then else” logic tests/steps.
[You are essentially rewriting DNS]

https://www.cloudflare.com/learning/dns/what-is-dns/ .

How does Let’s Encrypt even discover what your primary server is (in the hypothetical world where that means something)? It does it by descending from the DNS root zone (.), until it eventually finds your SOA record.

That’s why you use a recursor. It does the chain of lookups for you.

You have assigned meaning to SOA MNAME that simply doesn’t exist. Your three nameservers are completely equivalent to each other (in terms of priority and authoritativeness), to every recursor on the internet. You need to deal with the slave lag by sleeping.

This is what people who use e.g. Linode DNS hosting do - they literally put 20 minute sleeps into their renewal scripts, because Linode’s slave lag is so bad.
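As an alternative to a fixed sleep, the following sketch polls the SOA serial on each nameserver and returns as soon as all of them report the serial the master has after the update. It assumes `dig` is available and that the master bumps the serial on every change; the function names, the explicit server list, and the timeout handling are illustrative choices.

```shell
#!/bin/sh
# Sketch only: function names and argument order are hypothetical.

# Print the SOA serial that nameserver $2 reports for zone $1.
# "dig +short SOA" prints: mname rname serial refresh retry expire minimum
soa_serial() {
    dig +norecurse +short SOA "$1" "@$2" | awk '{print $3; exit}'
}

# Return 0 when every nameserver given after zone and serial reports
# exactly that serial.
serials_match() {
    zone=$1 want=$2
    shift 2
    [ $# -gt 0 ] || return 1
    for ns in "$@"; do
        [ "$(soa_serial "$zone" "$ns")" = "$want" ] || return 1
    done
    return 0
}

# Poll until all servers match, or give up after $3 seconds.
# Usage: wait_for_serial ZONE SERIAL TIMEOUT INTERVAL NS...
wait_for_serial() {
    zone=$1 want=$2 timeout=$3 interval=$4
    shift 4
    waited=0
    until serials_match "$zone" "$want" "$@"; do
        [ "$waited" -lt "$timeout" ] || return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

A call such as `wait_for_serial example.org 2024010102 1800 30 ns1.example.org ns2.example.org` waits at most about 30 minutes but usually returns much earlier, so a slow provider costs time only when it is actually slow.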

Or try speeding that up with change notifications.

From a top-down recursive view, all authoritative nameservers are provided by the level above via DNS Glue records [which are all created equally].
There is no “Super Glue” record that points to the Master.

Who says that slaves must exist? What does LE do when a domain’s DNS has no slaves? It queries the global DNS servers, which pick up the updates directly from the domain’s DNS server.

Now, this is no ordinary query, as LE is demanding ad hoc temporary RRs, whose propagation to the global DNS servers may well take hours. LE can and should take a shortcut by first querying for the domain’s NS RRs, selecting the one with the lowest priority, then querying it directly.

unbound-host -rvD -tNS $fqdn

pick the dns with lower priority

No one.
That is a “local” concept that defines how DNS servers, at the same level/zone, interact with each other. [one is defined as the leader and the rest follow as slaves]

In Internet DNS, there is no leader, only zones. Each zone has a predefined authoritative set of DNS servers. Each zone is linked to the zone above (by definition with Glue Records).
That your local servers see one as a leader or add/remove servers means nothing to the zone above.
For instance: To add/remove an authoritative server you have to alter the Glue Records (in the zone above).
Unlike SMTP (MX records) there is no concept of “cost”; there is no top-down concept of DNS preference (as you implied by “SOA”).

Again, your best bet is to force DNS synchronization via:

  • DNS NOTIFY
  • DNS Push Notification
  • DNS Zone Change Notification

[call it whatever you like - on any change, have the “MASTER” immediately tell the “SLAVES” the zone has changed]
In Microsoft DNS it looks like this:
[image: screenshot not preserved]

(20-minute sleeps sound awfully fragile. Why not renew in two phases with two separate cron lines? The first sets the record up; the second checks that the first part ran, maybe even digs the TXT record, and tells boulder the challenge is ready.)
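With dehydrated, a similar effect is possible inside a single run, because the client calls the hook script with `deploy_challenge` and only asks the CA to validate after the hook returns. The sketch below blocks in that step until the record is confirmed. `publish_txt`, `remove_txt` and `txt_is_live` are hypothetical placeholders for your own zone update and visibility check, `MAX_TRIES` and `POLL_INTERVAL` are made-up tuning variables, and the sketch assumes `HOOK_CHAIN` is not enabled.

```shell
#!/bin/sh
# Sketch of a dehydrated DNS-01 hook. The three helpers below are
# hypothetical placeholders; replace them with real implementations.
publish_txt() { echo "publish _acme-challenge.$1 TXT $2"; }
remove_txt()  { echo "remove _acme-challenge.$1 TXT $2"; }
txt_is_live() { return 0; }   # e.g. query every nameserver directly

hook_main() {
    op=${1:-}
    case "$op" in
        deploy_challenge)
            # Arguments: domain ($2), token filename ($3), token value ($4)
            publish_txt "$2" "$4"
            tries=0
            until txt_is_live "$2" "$4"; do
                tries=$((tries + 1))
                [ "$tries" -lt "${MAX_TRIES:-60}" ] || return 1
                sleep "${POLL_INTERVAL:-30}"
            done
            ;;
        clean_challenge)
            remove_txt "$2" "$4"
            ;;
        *)
            : # ignore hook operations this sketch does not handle
            ;;
    esac
}

hook_main "$@"
```

Returning non-zero from `deploy_challenge` when the record never appears makes the run fail early instead of sending a validation request that is bound to fail.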

DNS is a geographically distributed database whose servers are divided into recursive (readers) and authoritative (writers). When you add LE’s TXT RR to your zone on your authoritative server, you are “writing” into the global DNS. The actual writing on all servers is indirect and time-consuming, as the servers read and cache in their own time.

There is a hierarchy. First comes your authoritative DNS server, the only one authorized to write your zone. Then come your caching slaves, the only ones authorized to propagate it further. Finally come the rest of the global servers, which can only cache your original zone. Again, this takes time.

When LE uses a recursive DNS server to read your fresh acme TXT RR, LE will not find it, and thus fails the challenge verification. This is utterly frustrating. To speed up the acme verification, LE can avoid using slow recursive DNS servers and query the authoritative (master) server directly. For example, to find the server you can do this:

unbound-host -rvD -tNS $fqdn

If the answer is secure (dnssec), then you select the dns with lowest priority, say ns0.$fqdn.

You query ns0.$fqdn for the acme RR TXT, which is up to date, because you queried the authoritative dns, with no need to waste time waiting for the global dns cache.

Not all slaves accept soliciting for updates.