DNS-01 validation: what about a 'race condition'?

I have been wondering. Suppose I have two servers that each require the same wildcard certificate for *.example.com,example.com. That requires DNS-01 validation, for which nonces are written into _acme-challenge.example.com. That name is CNAME'd to <subdomainkey>.<acmednssubdomain>.example.com, where <acmednssubdomain>.example.com has an NS record pointing to the DNS server that handles the TXT query (in my case my own self-hosted acme-dns service).

But what happens when both servers concurrently make a request? The CA delivers a different nonce to both and both certbot-or-whatevers write this in the TXT record for the CA to check, one overwriting the other?

I can't have two different CNAMEs from _acme-challenge.example.com.
I can give the different systems access to the same DNS credentials for writing the TXT records during validation (in my case acme-dns), but that produces a race condition in those validations.
Unless LE CA handles this by noticing concurrent requests and sending identical nonces to both validation requesters. That seems unlikely.

I can of course make sure I renew them about a month apart, which makes the chance of a clash small. So far, I haven't encountered this race condition in the time I have been using DNS-01 validation for a wildcard (via the GoDaddy API until recently, when they shut it down for everyone with fewer than 50 domains), but I am curious what the mechanism is.

I must add that I quite like the cert renewals on the different servers being out of sync. It saved me just this month when GoDaddy pulled that stunt. One cert ran out, but until I had fixed renewal/issuing I could copy the other one over by hand to keep the system alive. So, the chance of running into this will be small.

In your example, there is only one CNAME for _acme-challenge.example.com pointing to the fulldomain field from the first acme-dns registration.

You would need to copy or sync that registration info with both servers. When they each renew, they'll each get a different set of challenge data to publish and they'll publish all 4 records to that single acme-dns registration.
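
To make that concrete, here is a rough sketch of what "publishing to the same registration" could look like if you called the acme-dns update endpoint yourself. The endpoint path, header names and JSON fields are the ones from the acme-dns README; the host name, credentials and TXT values are placeholders I made up, and the requests are only built, not sent. Your ACME client plugin normally does this for you.

```python
import json
import urllib.request


def build_update_request(api_base, registration, txt_value):
    """Build (but do not send) an acme-dns /update request for one TXT value."""
    body = json.dumps({
        "subdomain": registration["subdomain"],
        "txt": txt_value,
    }).encode("utf-8")
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/update",
        data=body,
        method="POST",
        headers={
            "X-Api-User": registration["username"],
            "X-Api-Key": registration["password"],
            "Content-Type": "application/json",
        },
    )


# Hypothetical registration data, copied to both servers.
shared_registration = {
    "username": "placeholder-user",
    "password": "placeholder-key",
    "subdomain": "placeholder-subdomain",
}

# Each server publishes its own two values (wildcard + apex) to the same place.
# acme-dns expects 43-character values; these are dummies of that length.
server_a_values = ["A" * 43, "B" * 43]
server_b_values = ["C" * 43, "D" * 43]

requests_to_send = [
    build_update_request("https://auth.example.org", shared_registration, value)
    for value in server_a_values + server_b_values
]
print(len(requests_to_send), "update requests target", requests_to_send[0].full_url)
```

Both servers use the same credentials and the same subdomain, so the single CNAME on _acme-challenge.example.com keeps working for both.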

Some might argue that since you're already sync'ing the acme-dns registration data, you may also want to sync the ACME account info between the servers as a minor optimization. If you then stagger the renewal schedules, the second server won't be challenged if the first server was successful, due to validation caching on the CA side. (Note: CA validation caching is not officially part of the spec and shouldn't be relied upon to work the same way forever.)

DNS perfectly allows multiple TXT RRs present for the same hostname.

Also, a "nonce" is something different from what goes into the TXT RR. Officially it's the digest of the key authorization (which is derived from the token) that goes into the TXT RR, but often it's just called "token". (Although the latter is technically also incorrect.) The term "nonce" is used in the lower-level protocol communications to protect against replay attacks.
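
For reference, a small sketch of how the TXT value is derived according to RFC 8555 section 8.4: the key authorization is the token joined to the account key thumbprint with a dot, and the TXT record holds the unpadded base64url SHA-256 digest of that string. The token and thumbprint below are made-up placeholders, not real values.

```python
import base64
import hashlib


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Return the DNS-01 TXT record value for a challenge token."""
    # Key authorization = token || "." || base64url(JWK thumbprint of account key)
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without trailing "=" padding, as ACME requires
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Placeholder inputs, for illustration only.
thumbprint = "placeholder-account-thumbprint"
value_server_1 = dns01_txt_value("placeholder-token-1", thumbprint)
value_server_2 = dns01_txt_value("placeholder-token-2", thumbprint)
print(len(value_server_1), value_server_1 != value_server_2)
```

Different orders get different tokens, so two servers validating at the same time end up with different TXT values even if they share an ACME account.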

Aha. Thank you. I was planning to use 'token' but the debug log showed 'nonce' (which I know from e.g. blockchain) so I copied that without looking too deeply. Silly me.

And of course, I should have guessed that the LE CA will check all available TXT RRs.

But acme-dns seems to keep/deliver only the last two. That is of course mostly enough until you have three systems doing exactly the same domain validation concurrently. I have three systems, actually.
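
A toy model of that concern, assuming (as observed above) that only the two most recently written values are served. This is not acme-dns code, just a rolling buffer with made-up value names, where each server writes one value for the wildcard and one for the apex name.

```python
from collections import deque


class RollingTxtStore:
    """Toy stand-in for a TXT store that keeps only the newest N values."""

    def __init__(self, max_values=2):
        self._values = deque(maxlen=max_values)

    def update(self, value):
        # Appending to a full deque silently drops the oldest entry.
        self._values.append(value)

    def answer(self):
        return list(self._values)


store = RollingTxtStore()
published = ["A-wildcard", "A-apex", "B-wildcard", "B-apex"]
for value in published:
    store.update(value)

survivors = store.answer()
lost = [value for value in published if value not in survivors]
print("served:", survivors)
print("evicted before validation:", lost)
```

In this model, server A's values are gone by the time the CA looks them up if server B writes its own pair in between, which is the clash that staggering the renewals avoids.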

You may want to reconsider getting a cert from each one. You could set up a dedicated server to acquire a cert, and have each of your other systems get it from your own (localized) storage.

That wildcard cert needs 2 TXT record values.

Aha. Setting up a dedicated server with distribution is indeed the nicest solution, but I am a bit afraid of what it would take to do that in an actually secure manner. I'll have to make sure I separate the renewals; the easiest way is to make sure they run at different times of the day.

Another issue now that you have 3 servers is there is a current Rate Limit of 5 identical certs / week / account.

So, if all 3 got a fresh cert on one day and, for some reason, they all needed another cert the same week, one of the requests would be denied. That may be unlikely, but it could happen after a serious outage or restoration. I don't know your hardware setup well enough to evaluate that, but thought I'd mention it.
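
Spelled out as arithmetic, treating the limit as a plain sliding window of 5 identical certs per 7 days. That is a simplification on my part; how the CA actually counts may differ. The dates are arbitrary.

```python
from datetime import datetime, timedelta

LIMIT = 5                     # identical certs allowed per window (figure quoted above)
WINDOW = timedelta(days=7)


def request_allowed(previous_issuances, now, limit=LIMIT, window=WINDOW):
    """Return True if another identical cert fits in the sliding window."""
    recent = [t for t in previous_issuances if timedelta(0) <= now - t < window]
    return len(recent) < limit


day0 = datetime(2024, 1, 1, 3, 0)
# Three servers issue on day 0, then all three need a reissue two days later.
attempts = [day0] * 3 + [day0 + timedelta(days=2)] * 3

issued = []
outcomes = []
for when in attempts:
    ok = request_allowed(issued, when)
    outcomes.append(ok)
    if ok:
        issued.append(when)

print(outcomes)
```

Under that assumption the first five requests go through and the sixth is refused until the oldest issuance falls out of the window.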

If the 3 different servers have their own specific hostname included in the SAN, it wouldn't count as identical certs :slight_smile:

Agreed. But they said they get the same identical cert so ...

I think a better solution overall is for them to have a dedicated service for the certs.

Tangent: Boulder had (still has?) a limit of 4096 bytes for a TXT record response and will ignore the rest. People (like @rmbolger) have tested this to fit 60-70 challenge responses (see Raise the cap for TXT records per subdomain · Issue #76 · joohoi/acme-dns · GitHub)

See also an explanation in this thread: Limitation of TXT record response SIZE

And also @Osiris comments on the boulder code here:

(I was searching the forum because I recalled one of the ISRG staff posting about tweaking some of that response logic a few years ago -- maybe Matthew McPherrin -- but couldn't find it)

Sure, there are limits, but that limit is not 1 :wink:

Unless scale is needed, I think that might be overkill. Caddy and Certmagic both support cloud storage (and I believe coordination for race conditions); several other clients do as well - though I don't recall which ones. Everything in Go built on top of Certmagic (including Caddy) does, but that has been implemented in other programming frameworks and clients as well.

:grinning: Indeed. That's a good tip. Outage of one system is not a big issue. One is a main server, the second is a backup/failover, the third is the router itself, so just a nuisance. I might add the VM host for the backup server (self-signed now), so that would become four.

Some years back I developed a MacOS X Server deployhook and in support for that I got my limit lifted then (as this had to be tested in a production setting), that might still stand even... But frankly, I won't really hit that limit if it has been reinstated (I suspect it has been).

If the 3 different servers have their own specific hostname included in the SAN, it wouldn't count as identical certs :slight_smile:

Ah well, but if the wildcard is part of the set it won't work. I cannot add foo.example.com to *.example.com to turn it unique:

Error creating new order :: Domain name "foo.example.com" is redundant with a wildcard domain in the same request. Remove one or the other from the certificate request.

But if you add another label, e.g. server1.static.example.com it'll work again. I just made up static in 1 second. Any label is possible. As long as there are two, as a * will only catch a single label.
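
A quick sketch of that single-label rule, using a hypothetical helper (not taken from any ACME client) that checks whether a name would be redundant next to a wildcard in the same request:

```python
def covered_by_wildcard(name: str, wildcard: str) -> bool:
    """Return True if `name` is matched by `wildcard` (e.g. "*.example.com")."""
    if not wildcard.startswith("*."):
        raise ValueError("wildcard must start with '*.'")
    parent = wildcard[2:].lower().rstrip(".")
    labels = name.lower().rstrip(".").split(".")
    # The "*" stands for exactly one label, so after dropping the first label
    # of `name` the remainder must equal the wildcard's parent domain.
    return len(labels) > 1 and ".".join(labels[1:]) == parent


for candidate in ["foo.example.com", "server1.static.example.com", "example.com"]:
    print(candidate, covered_by_wildcard(candidate, "*.example.com"))
```

foo.example.com is covered, so the CA rejects it as redundant, while server1.static.example.com has two labels in front of example.com and is not covered.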

I see. It doesn't work for me yet, but the first problem is the plugin I am using (certbot-dns-acmedns): it complains that it doesn't have an entry for server1.static.example.com in its config. Maybe I'll clone that plugin; I suspect it's not really actively maintained. This approach also requires _acme-challenge.server1.static.example.com in my public DNS, I think. Will investigate later. Thanks.

Validation is allowed to fail, so clashes between multiple simultaneous renewals are fine. Your entire certificate request/renewal process must tolerate regular failures. So your ACME tool will attempt renewal when it thinks it needs to and if that fails it will try again later.
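
As a rough illustration of "tolerate failures and try again later", here is a generic retry loop. The exception class and the flaky renewal function are invented for the example; real ACME clients usually just retry on their next scheduled run rather than in a tight loop like this.

```python
import time


class ValidationFailed(Exception):
    """Hypothetical error raised when the CA could not validate a challenge."""


def renew_with_retries(attempt_renewal, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call attempt_renewal until it succeeds, backing off between failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_renewal()
        except ValidationFailed:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated renewal that clashes twice before succeeding.
calls = {"count": 0}


def flaky_renewal():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValidationFailed("TXT value was overwritten by another server")
    return "certificate issued"


delays = []
result = renew_with_retries(flaky_renewal, sleep=delays.append)  # record delays instead of sleeping
print(result, "after", calls["count"], "attempts; delays:", delays)
```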

What will happen with Let's Encrypt is that some parts of your validation will succeed, and they remember that for at least a few days (30 days currently, I believe), so your other validations will eventually pass and the certificate order will complete.

Some ACME clients and DNS APIs allow for managing a round-robin batch (e.g. first in, first out) of TXT record values, some don't. Those that don't may fail on first attempts and succeed on next or future attempts. acme-dns allows 2 TXT records but you could customize that to allow more.

Ideally primary domains and wildcards wouldn't have used the same record name - I'd have preferred something like __acmechallenge specifically for wildcard challenge responses rather than having the name clash, but that's all settled long ago.

There is some proposed work in progress to allow different ACME accounts to perform different DNS challenges without clashes: draft-ietf-acme-scoped-dns-challenges-00 - Automated Certificate Management Environment (ACME) Scoped DNS Challenges. An example of a real-world clash is Cloudflare attempting to use invisible _acme-challenge records for DNS validation, which you can't delete; I'm not sure if they still do that, though.
