Please query the authoritative DNS(SEC) with dns-01

patch-work · February 25, 2021, 9:41pm

Hello,

The following problem keeps recurring, so we need a solution on the server side (you at letsencrypt). Please do not sweep the problem under the carpet, as we depend on it.

We use dns-01 with DNSSEC. When our servers talk, our hook adds your token to our authoritative DNS, then we re-sign the zone to comply with DNSSEC, and wait for the slaves to self-update. There is no way around it. This is how it works. Our slaves, supplied by a third party, update 3 times every hour. Sometimes they fail, for reasons that are sufficient to our third-party supplier (maintenance, you name it). When the slaves are up to date, then the world begins updating their DNSs.

You understand that the process of renewing an SSL certificate with you cannot possibly rely on your DNS cache, either yours or that of your suppliers, because it takes a long time when it succeeds (~30 min), and even more when it fails. If our slaves fail, like today when they stopped updating for an hour exactly when we were renewing our certificate, it is jolly hard to figure out the problem.

Therefore, we propose you update the procedure as follows.

When the client (us) uses DNSSEC, then trust the client's authoritative DNS. Do not wait for DNS cache updates.

Thank you

JuergenAuer · February 25, 2021, 9:51pm

Hi @patch-work

Your interpretation is wrong. Letsencrypt doesn't use DNS caches.

Letsencrypt always queries the authoritative name servers via Unbound.

But this

is a fatal setup. Slave updates should happen in minutes.

If you can't change that, you should create your own client (or use a client with such a feature) with a longer wait before confirming the challenge.

You can use

https://unboundtest.com/

to check that. Unbound always checks more than one of the authoritative name servers.

So if the name servers have different results -> that's fatal, that's expected.

patch-work · February 25, 2021, 9:54pm

Hi Juergen, please note that slaves are authoritative. When you hit "our" DNS with unbound, you are hitting our slaves first. To hit the master DNS, you need to sort the NSs and use dig to query the one with lowest value.

$ unbound-host -r -tNS example.com
example.com has NS record ns1.example.com. <----- master
example.com has NS record ns3.example.com.
example.com has NS record ns2.example.com.
example.com has NS record ns4.example.com.

JuergenAuer · February 25, 2021, 9:55pm

I know. That's the reason your setup is fatal.

PS: My own domains are using DNSSEC, I use INWX as dns provider.

My own client waits 6 minutes (one time 5 minutes were too short), then the dns validation (to create a wildcard) works. So the different name servers have the same result with a zone signing.

patch-work · February 25, 2021, 9:58pm

Our SOA complies with the limits set by RFC 1912, RFC 2308, RFC 4035, and the slave provider (transip.eu):

1200 ; SOA Refresh slaves must refresh (learn zone changes) after 1200--43200 seconds
7200 ; SOA Retry slaves must retry contacting master after 120--7200 seconds
604800 ; SOA Expire slaves must revalidate after 604800--1209600 seconds
3600 ) ; SOA Minimum slaves must flush negative responses after 3600--86400 seconds

Today, the slaves updated at ..., 20:04, 20:25, 20:46, 21:06, ...

felixf · February 25, 2021, 10:07pm

I'm also using DNSSEC, and found from experience that waiting for 10 seconds after the API call for updating the TXT record returned suffices. (That's a specific delay for my provider, and potentially also for the simplicity of my zone, of course.)

Just configuring your client to wait long enough before telling Let's Encrypt to validate the challenges should solve this problem anyway. (Or even wait until all authoritative servers return the correct answer before validating the challenges, if you want to avoid failures because the slaves don't update... I'm sure there are ACME clients out there which already support such waiting out of the box.)
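For anyone writing their own hook, here is a minimal sketch of the "wait until all authoritative servers return the correct answer" idea. The function names (`dig_txt`, `wait_for_all`), the polling interval and the timeout are my own assumptions for illustration, not something a particular ACME client provides. It assumes the `dig` tool is installed and queries each server directly instead of going through a resolver.

```python
import subprocess
import time
from typing import Callable, Iterable, Set


def dig_txt(server: str, name: str) -> Set[str]:
    """Ask one name server directly for the TXT values of `name` (needs dig)."""
    try:
        out = subprocess.run(
            ["dig", "+short", "+norecurse", f"@{server}", name, "TXT"],
            capture_output=True, text=True, timeout=15, check=False,
        ).stdout
    except subprocess.TimeoutExpired:
        return set()  # treat an unresponsive server as "not updated yet"
    # dig prints each TXT value in double quotes, one per line.
    return {line.strip().strip('"') for line in out.splitlines() if line.strip()}


def wait_for_all(
    servers: Iterable[str],
    name: str,
    token: str,
    query: Callable[[str, str], Set[str]] = dig_txt,
    interval: float = 30.0,   # assumed polling interval in seconds
    timeout: float = 3600.0,  # assumed upper bound; tune to your slaves
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once every server serves `token`, False on timeout."""
    servers = list(servers)
    deadline = time.monotonic() + timeout
    while True:
        pending = [s for s in servers if token not in query(s, name)]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A hook would call something like `wait_for_all(["ns1.example.com", "ns2.example.com"], "_acme-challenge.example.com", token)` and only tell Let's Encrypt to validate when it returns True. That replaces a fixed sleep with a check that also catches the case where a slave skipped an update.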

patch-work · February 25, 2021, 10:13pm

Our hook sleeps for 1200 seconds (=soarefresh) + 60 seconds.

Today, the renewal failed, because it could not find the second token in the DNS. We renew FQDN and *FQDN, so we receive two tokens, which means updating the DNSSEC two times. It takes one hour when it succeeds.

I do not see how you can do it in 5 minutes or 10 seconds without violating the RFCs.

patch-work · February 25, 2021, 10:16pm

I think your SOA violates the RFCs.

JuergenAuer · February 25, 2021, 10:22pm

Maybe. That's the difference between theory and reality.

I've never configured these values.

patch-work · February 25, 2021, 10:23pm

I am sorry, but Letsencrypt needs to respect the RFCs. It is not reasonable of you to demand otherwise. We all need to abide by the standards.

Can you just use dig to query the NS with lowest index? I am at pains here. I am renewing 7 certificates, each with a wildcard. This means 14 DNS updates just to serve the tokens, summing to 7 hours of hook time if all is well. It must be faster. If you bypass the slave updates, by querying the master DNS, then it could take seconds!!!

_az · February 25, 2021, 10:35pm

Why? Recursors don't behave this way. I think this discussion has taken place on this forum before, and so far nobody has offered a reference showing that they should behave otherwise.

Like you say, slaves are authoritative.

The (very common) way to deal with delays from slave-initiated zone transfers is to combine NOTIFY + AXFR, and things will only take seconds, once again.

JuergenAuer · February 25, 2021, 10:37pm

You have already the answer.

Different authoritative name servers with different answers -> fatal.

Then you have to use a client with a long-enough wait.

PS:

Why? It's a job, the job needs some time, that's all. It's not relevant if the job needs 70 seconds or 7 hours.

petercooperjr · February 25, 2021, 10:42pm

The point of the authentication that Let's Encrypt does is to confirm that the entity that holds a private key actually controls a name, as seen from everywhere on the Internet.

If some parts of the Internet see one thing, and some parts see another (as would happen if different authoritative servers disagree for the same domain), then you're not really fulfilling that requirement.

rg305 · February 26, 2021, 12:26am

If all DNS servers are authoritative, then the only logical (potentially enforceable) difference is their SOA record numbers.
If you first were to query them all and chose the one(s) with the highest number... [i.e. exclude those with lower numbers (as out-of-sync)] then you might be able to get the results you are looking for.

Until then, you need to just wait.

I don't mean wait for things to change, I mean wait 20 or 40 minutes and all your DNS servers should be in sync by then.
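On the client side, the same SOA comparison can tell a hook which servers are still out of sync. This is only a rough sketch: the function name `lagging_servers` and the serial values are made up for illustration, the serials themselves would have to be collected per server first (for example from the third field of `dig +short SOA example.com @server`), and it uses a plain integer comparison that ignores serial number wrap-around.

```python
from typing import Dict, List


def lagging_servers(serials: Dict[str, int]) -> List[str]:
    """Return the servers whose SOA serial is below the highest one seen."""
    newest = max(serials.values())
    return sorted(server for server, serial in serials.items() if serial < newest)


# Hypothetical snapshot taken right after the master re-signed the zone.
snapshot = {
    "ns1.example.com": 2021022503,
    "ns2.example.com": 2021022502,
    "ns3.example.com": 2021022503,
}
print(lagging_servers(snapshot))  # ['ns2.example.com']
```

As long as the list is not empty, the hook keeps waiting instead of asking for validation.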

Edit: The security risks around allowing for a single authoritative DNS server are too big for that to ever happen. It would allow a single (spoofed) system to control your entire zone; simply by claiming to have the largest SOA record.

schoen · February 26, 2021, 2:57am

I think these timing rules are meant to reduce the frequency and likelihood of downstream users and clients seeing inconsistent or out-of-date results, not to require that downstream DNS users and clients themselves follow a particular strategy in performing queries.

The Let's Encrypt validation behavior is based on Let's Encrypt's own interpretation of other industry rules (coming from the CA/Browser Forum) about how to confirm that an applicant for a certificate really controls the domains that will be listed in that certificate. That's another way of saying more or less what @petercooperjr said:

Let me put that a third way: some implementers (like browsers) may prioritize following the "be liberal in what you accept" part of Jon Postel's dictum, because they place a high premium on maximizing compatibility and interoperability. But certificate authorities performing domain validation are automating a process with very high potential consequences and risk for relying parties, and so can't afford to default to being liberal in what they accept. A more frequently discussed example of this on this forum is the behavior about CAA validation.

In that case, Let's Encrypt also defaults to rejecting something that other applications that use DNS would accept. (And we can probably think of several further examples along these lines, like with the case-randomization thing. Not using that would be much better for compatibility and interoperability, and it's sometimes prevented people from getting certs, but it also serves a security goal of making it harder to misissue certificates.)

On the other hand, @_az's suggestion

is a reminder that Let's Encrypt client implementers are always interested in finding ways to make validation work in practice in more setups. I think you'll just find that this community's bias is always toward finding ways to improve ACME clients in order to work around issues like this more reliably, rather than toward relaxing the CA's validation behavior.

JamesLE · February 26, 2021, 4:20am

Here's a possible workaround: you could use an NS record for your _acme-challenge DNS records, delegating them to only your authoritative DNS server(s) with the quickest updates.

That could be your own primary authoritative server under a different hostname, an acme-dns instance, or whatever else you like.

This way, you could keep the redundancy of multiple authoritative servers for all routine queries, but _acme-challenge would no longer depend on slow replicas.
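To make that concrete, here is a small sketch that prints the kind of NS records I mean. The helper `acme_delegation` and the host name `acme-ns.example.com` are hypothetical. Note that in a DNSSEC-signed zone the delegation also needs its DS side handled (either DS records for a signed child zone or a deliberately unsigned delegation), which this sketch doesn't cover.

```python
from typing import List


def acme_delegation(domain: str, fast_servers: List[str]) -> List[str]:
    """Build zone-file lines delegating _acme-challenge.<domain> to fast_servers."""
    owner = f"_acme-challenge.{domain.rstrip('.')}."
    return [f"{owner} IN NS {ns.rstrip('.')}." for ns in fast_servers]


for line in acme_delegation("example.com", ["acme-ns.example.com"]):
    print(line)
# _acme-challenge.example.com. IN NS acme-ns.example.com.
```

These records are static, so they only have to reach the slow replicas once. After that, challenge tokens are changed only on the delegated server.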

felixf · February 26, 2021, 7:50am

I don't understand why you set the sleep hook to 21 minutes if it actually can take a lot longer until all authoritative DNS servers are updated. You need to wait long enough until all are updated; that should solve your problem.

felixf · February 26, 2021, 7:55am

Also, what stops you from starting the orders for all certificates at once, then update all DNS zones in one step, wait for the updates to reach every authoritative DNS server, and then ask Let's Encrypt (that's how it's spelled, not Letsencrypt or letsencrypt!) to validate all challenges at once? Even if you have to wait an hour for your DNS servers to update, it will only take a bit more than one hour to get all certificates in one step.
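Some back-of-the-envelope arithmetic for the difference, using the numbers from this thread (7 certificates with 2 tokens each, roughly 30 minutes per propagation round). The function name is just for illustration, and it assumes that the waiting time dominates and that all zone edits fit into a single update.

```python
def total_wait_seconds(updates: int, wait_per_round: int, batched: bool) -> int:
    """Waiting time when every update waits on its own vs. one shared wait."""
    rounds = 1 if batched else updates
    return rounds * wait_per_round


UPDATES = 7 * 2           # 7 certificates, each with FQDN and *.FQDN
WAIT_PER_ROUND = 30 * 60  # ~30 minutes until the slaves have caught up

sequential = total_wait_seconds(UPDATES, WAIT_PER_ROUND, batched=False)
batched = total_wait_seconds(UPDATES, WAIT_PER_ROUND, batched=True)
print(sequential / 3600, "hours vs.", batched / 3600, "hours")  # 7.0 hours vs. 0.5 hours
```

Both tokens for a certificate can be published as two TXT values at the same `_acme-challenge` name, so even the two updates per certificate don't need separate rounds.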

patch-work · February 26, 2021, 10:34am

Recursors don't behave this way.

I am asking you to stop using recursors in this procedure, because they are too slow.

NOTIFY + AXFR

No RFC mandates the use of NOTIFY.
Our provider does not support NOTIFY.

patch-work · February 26, 2021, 10:38am

Different authoritative name servers with different answers -> fatal.

You are not acknowledging the difference between master and zone slaves. They are different when there is an update. The zone slaves update when the SOA in the master says so.

Good Internet citizens abide by the RFCs defining DNS.

Are you a good Internet citizen?
