Concurrent issuances with DNS-01 challenge

Hi! I was hoping to get some guidance on how to ensure that an issuance using the DNS-01 challenge succeeds in circumstances where more than one account requests a certificate for the same DNS name within a short time window.

We have a setup where, in some cases, a few clients with separate accounts request the same certificate using the DNS-01 challenge, and do so concurrently or within a short time period (this happens on Kubernetes clusters where an app is deployed redundantly across two clusters and a user applies the same config from their CI to both clusters at the same-ish time).
In this case, each client sends a request to our centralized control plane, which creates a new TXT record with our DNS provider and waits for the record to propagate to the provider's nameservers before accepting the challenge. As a result, our DNS provider will have multiple TXT records for the same name with different values (cleanup happens asynchronously at some later point).
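To illustrate, here is a minimal sketch of the kind of propagation check we run, assuming the github.com/miekg/dns library; the nameserver addresses and record values are placeholders. Note that with Anycast this only proves that *some* instance of each nameserver has the record.

```go
package main

import (
	"fmt"

	"github.com/miekg/dns"
)

// txtPresent queries one authoritative nameserver directly and reports
// whether the expected TXT value is present for the challenge name.
func txtPresent(name, want, nameserver string) (bool, error) {
	m := new(dns.Msg)
	m.SetQuestion(dns.Fqdn(name), dns.TypeTXT)
	c := new(dns.Client)
	r, _, err := c.Exchange(m, nameserver+":53")
	if err != nil {
		return false, err
	}
	for _, rr := range r.Answer {
		if txt, ok := rr.(*dns.TXT); ok {
			for _, v := range txt.Txt {
				if v == want {
					return true, nil
				}
			}
		}
	}
	return false, nil
}

func main() {
	// Placeholder nameservers; with Anycast, each query may land on a
	// different instance than the one Let's Encrypt hits.
	for _, ns := range []string{"ns1.dnsprovider.example", "ns2.dnsprovider.example"} {
		ok, err := txtPresent("_acme-challenge.app.example.com", "expected-token", ns)
		fmt.Println(ns, ok, err)
	}
}
```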

In this setup we frequently see that one of the authorizations for one of the requests fails with "During secondary validation: Incorrect TXT record", and it appears that Let's Encrypt did not have the required challenge record value even though we had verified propagation to our DNS provider's nameservers. A subsequent retry after a short backoff period would succeed. Note that we primarily test this against the staging environment, but have also seen this issue in prod.

Our TXT records currently have a 5-minute TTL. If I understand correctly, Let's Encrypt uses the Unbound recursive resolver, which can also cache results. Is it possible that old TXT record values are cached for a period of time because of our TTL? I was trying to understand whether Let's Encrypt configures Unbound to cache results, and I found a few answers that seemed to conflict on whether there is caching or whether authoritative nameservers are always queried.

Our DNS provider also uses Anycast, and I suspected that when we check propagation we may simply hit a different server instance than Let's Encrypt does. However, when I queried a challenge record using unboundtest.com, I saw cases where unboundtest had the correct TXT record (after our propagation check) but Let's Encrypt validation still failed, which made me think there may be some caching on Let's Encrypt's side.
Additionally, it seems that we only ever experience this issue in cases where there are concurrent issuances.

The most common issue is that changes have not replicated to all nameservers; Let's Encrypt will check more than one, and conflicting answers are inevitable until replication completes. The solution there is to wait long enough between updating the TXT record and submitting the challenge for validation.

The second problem is simultaneous validation with the same record, so the option there is to append to the TXT recordset instead of replacing it: allow (for instance) up to 10 values instead of just one, rotating the oldest values out as the latest are appended.
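A minimal sketch of that rotation (rotateTXTValues and maxValues are names made up for illustration; the actual recordset write would go through your DNS provider's API):

```go
package main

import "fmt"

// rotateTXTValues appends the newest challenge token to the recordset and
// drops the oldest values once the cap is reached, so concurrent issuances
// can coexist instead of overwriting each other.
func rotateTXTValues(current []string, newValue string, maxValues int) []string {
	values := append(current, newValue)
	if len(values) > maxValues {
		values = values[len(values)-maxValues:] // drop the oldest entries
	}
	return values
}

func main() {
	rs := []string{"token-1", "token-2"}
	rs = rotateTXTValues(rs, "token-3", 10)
	fmt.Println(rs)
}
```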

You should probably also have a singleton service performing the updates, processing requests from a queue, because the alternative would potentially be an unpredictable free-for-all.
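For instance, a sketch of such a singleton updater (updateRequest and applyUpdate are hypothetical names; applyUpdate stands in for the DNS provider API call):

```go
package main

import "fmt"

// updateRequest carries one record change plus a channel to report back.
type updateRequest struct {
	Name  string     // e.g. _acme-challenge.app.example.com
	Value string     // challenge token
	Done  chan error // completion signal back to the caller
}

// applyUpdate is a stand-in for the DNS provider API call, which would
// also wait for the change to settle before reporting back.
func applyUpdate(name, value string) error {
	fmt.Printf("updating %s -> %s\n", name, value)
	return nil
}

// runUpdater drains the queue, applying one change at a time so that
// concurrent clients cannot clobber each other's records.
func runUpdater(queue <-chan updateRequest) {
	for req := range queue {
		req.Done <- applyUpdate(req.Name, req.Value)
	}
}

func main() {
	queue := make(chan updateRequest)
	go runUpdater(queue)

	done := make(chan error)
	queue <- updateRequest{Name: "_acme-challenge.app.example.com", Value: "token", Done: done}
	fmt.Println("result:", <-done)
}
```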

If you share an ACME account across your clients, they will benefit from cached validations. But also consider whether a central process should be in charge of renewals, with individual containers fetching the latest certificate from a secrets store rather than performing an actual ACME renewal, if that's not what you're currently doing.

4 Likes

Thank you!

I mostly would like to find out whether Let's Encrypt caches the results and whether the TTL on our records matters.

We do wait for propagation to our provider's nameservers.
Sharing the account keys is something we'd like to strive for, but this won't always be possible (we build for external users who may or may not have a secret-sharing mechanism available, etc.).

TTL doesn't matter (that's used for non-authoritative cached responses, e.g. your local network's DNS caching), and Let's Encrypt doesn't cache previous DNS query results as far as I'm aware.

Check that your updates can't clobber previous updates. I assume these requests are all to get the same domains on a cert (just across different instances), and that you're not reusing one _acme-challenge record for many domains somehow.

2 Likes

So, my assumption was that TXT query results are cached by Let's Encrypt's Unbound recursive resolvers, from looking at this: Caching of DNS-01 for domain and its wildcard · Issue #5820 · letsencrypt/boulder · GitHub

I could be wrong

AFAIK only for a few seconds or minutes, not very long.

3 Likes

My last info on this is that LE does have a small cache in Unbound to avoid flooding servers with DNS queries. It should cap all TTLs at no more than 1 minute, i.e. longer TTLs are ignored.

This sounds like all you need to do is add a small delay - like 1 minute - before attempting validation after provisioning the TXT record. It would help with any kind of propagation delay, wherever it occurs.
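Something like this, as a sketch (settleDelay and attemptIssuance are assumptions, not part of any ACME client library; since a failed DNS-01 challenge generally means starting a fresh order, the retry here wraps the whole issuance attempt):

```go
package main

import (
	"fmt"
	"time"
)

// settleDelay is the suggested fixed wait between provisioning the TXT
// record (and passing your own propagation check) and asking the CA to
// validate, to let the provider's Anycast network converge.
const settleDelay = 1 * time.Minute

func issueWithSettle(attemptIssuance func() error, retries int) error {
	time.Sleep(settleDelay)
	var err error
	for i := 0; i < retries; i++ {
		if err = attemptIssuance(); err == nil {
			return nil
		}
		// Short backoff before retrying the whole issuance, matching
		// the behavior observed in the original post.
		time.Sleep(time.Duration(i+1) * 30 * time.Second)
	}
	return fmt.Errorf("issuance failed after %d attempts: %w", retries, err)
}

func main() {
	err := issueWithSettle(func() error { return nil /* run the ACME order here */ }, 3)
	fmt.Println(err)
}
```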

"During secondary validation" is always interesting and means that the primary lookup from LE succeeded, but a (parallel) lookup from another server in another datacenter - sometimes located on the other side of the world - did not succeed. This is usually due to propagation delays within the provider's anycast network.

8 Likes

Thank you very much.

We'll experiment with shorter TTLs and, if that does not help, with a static wait period to help with Anycast propagation, and I'll post back once we see which solution worked.

You cannot practically check propagation yourself, because that would require a validator at each Anycast instance. Instead, I suggest relying on an API from the DNS service that reports when the change has propagated to every Anycast instance.

For example, in the case of Route 53, you can run route53 get-change in a loop, or block on route53 wait resource-record-sets-changed.
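As a sketch of the blocking variant using the AWS SDK for Go (v1), where the change ID placeholder would come from the ChangeInfo returned by a prior ChangeResourceRecordSets call:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/route53"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := route53.New(sess)

	// Placeholder: the ID returned by your ChangeResourceRecordSets call.
	changeID := "/change/C0123456789EXAMPLE"

	// Blocks until Route 53 reports the change as INSYNC on all of its
	// authoritative nameservers; the SDK equivalent of
	// `aws route53 wait resource-record-sets-changed`.
	err := svc.WaitUntilResourceRecordSetsChanged(&route53.GetChangeInput{
		Id: aws.String(changeID),
	})
	fmt.Println("propagated:", err == nil, err)
}
```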

2 Likes

That would be great, but our DNS provider currently does not have an API endpoint for this.

Ask them to upgrade their software to provide that API.

2 Likes

IMHO, a better solution would be using cloud storage or having the deployment tool centrally obtain and deploy the certificate. There shouldn't be a need for each cluster to obtain its own certificate, and deployments like that often break and wedge an account or domain into a rate-limited state.

First, I highly recommend using acme-dns (GitHub - joohoi/acme-dns: Limited DNS server with RESTful HTTP API to handle ACME DNS challenges easily and securely.) instead of commercial vendors or other in-house solutions.

If you're using a commercial vendor or an in-house solution, drop the TTL to the lowest amount possible; 60s or under is best.

What I've experienced in the past is that commercial providers often have an internally tiered DNS system used for replication. The external-facing DNS servers query an internal DNS server, which cascades inward to other DNS servers, caches, and the database:

  • The API/web layer edits the database record, and might update DNS servers or the application cache
  • Internal DNS servers pull from the database or a read-through application cache
  • External-facing DNS servers pull from the internal DNS servers

The problems this creates:

  • Providers have a read-through application cache, not write-through: you have to wait until the internally cached value expires before the internal DNS systems pull the new record out. The application cache is not configured to expire on TTL, and is hardcoded to 60s or 300s.
  • The internal DNS servers respect TTL, so the application cache and first internal DNS server can be updated while the data stays cached between the first internal DNS server and the external-facing server. You need to wait until they all time out.

Layering Anycast on top of this just sounds like a headache.

I remember that with one specific vendor, even with a 60s TTL on the DNS change, you had to wait 301 seconds for a DNS entry to definitely change. Polling before then would just wedge stale data into the cache for even longer.

IMHO, I would set up a dedicated nameserver for your authorizations, install acme-dns on it, and have clients delegate/CNAME their _acme-challenge records onto that instance.

3 Likes

Just to clarify the above:

1- Your clients would only create an initial CNAME onto your DNS system. You could use anything; I like acme-dns. acme-dns creates a random subdomain for each credential set, but I wrote a script that lets you update the sqlite3 datastore to use a custom domain. This allows you to predict/assign what a customer should CNAME onto, i.e. if your authorization NS is auth.irbe.example.com, then _acme-challenge.example.com CNAMEs to example.com.auth.irbe.example.com. (A sketch of the corresponding update call follows below this list.)

2- You only need a single DNS entry for your authorization zone: no round-robin, global cluster coordination, or need for redundancy. If it's temporarily down or there are networking issues, just try a few hours later. This is why ACME clients are supposed to renew at 2/3 of the lifetime - to take into account these potential ephemeral downtimes. Avoiding a "single point of failure" for this is not necessary. If you have any issues, there are days to weeks for triage.
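Here is a sketch of the update call item 1 refers to, following the HTTP API described in the acme-dns README (the server URL, credentials, and subdomain are placeholders that would come from the per-customer registration):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// updateChallenge POSTs a new challenge token to acme-dns for the given
// registered subdomain, authenticated with the registration credentials.
func updateChallenge(server, apiUser, apiKey, subdomain, txt string) error {
	body, _ := json.Marshal(map[string]string{
		"subdomain": subdomain, // the (customized) acme-dns subdomain
		"txt":       txt,       // the 43-character challenge token
	})
	req, err := http.NewRequest("POST", server+"/update", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("X-Api-User", apiUser)
	req.Header.Set("X-Api-Key", apiKey)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("acme-dns update failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// All values below are placeholders.
	err := updateChallenge("https://auth.irbe.example.com", "user-uuid", "api-key",
		"example.com.auth.irbe.example.com", "placeholder-validation-token-from-the-ca--")
	fmt.Println(err)
}
```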

2 Likes

Thank you, that's a really neat approach. I wasn't aware of joohoi/acme-dns. I think it won't work in our particular setup, though: our clients each get a subdomain under our TLD, and they can then choose to provision certs for DNS name(s) within this subdomain (subject to rate limits and some other validation steps). We don't have a clear indicator of which DNS names they will want a cert for before a cert is actually requested, so the CNAME provisioning might not be feasible for this setup; we'd essentially need to alias _acme-challenge.*.customerN.ourdomain. That said, being able to provision more fine-grained credentials and to have a simpler DNS server dedicated to this particular task would be awesome.

1 Like