Unable to issue ECDSA+RSA in ACMEv2 staging environment

Hi @jvgutierrez,

Thanks for the detailed follow-ups. I think I understand what's happening here and can explain.

Both our staging and production environments use recursive resolvers with a max cache TTL of 60s.

What's happening here is a combination of:

  1. issuing two otherwise identical certificates back-to-back (they differ only in public key/alg)
  2. authorization reuse being disabled
  3. the cache max TTL.

Because the certificate subjects are the same between the two orders, the DNS-01 challenge lookups will be for the same DNS records. That means if the second issuance happens within 60s, we'll be answering from our cache instead of querying your authoritative server, and we'll see the key authorization from the first order rather than the new one.
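To make the "same DNS records" point concrete, here's a rough Python sketch of how a DNS-01 record name and TXT value are derived per RFC 8555 section 8.4. The function names, tokens, and thumbprint are made up for illustration; this isn't taken from any particular client.

```python
import base64
import hashlib


def dns01_record_name(domain: str) -> str:
    # The validation record lives at _acme-challenge.<domain>. It depends
    # only on the identifier, never on the order or the token.
    return "_acme-challenge." + domain.rstrip(".")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    # Key authorization = token || "." || base64url(JWK thumbprint).
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # The TXT value is the unpadded base64url SHA-256 digest of it.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Hypothetical tokens and thumbprint, for illustration only.
name_order_1 = dns01_record_name("example.com")
name_order_2 = dns01_record_name("example.com")
value_order_1 = dns01_txt_value("token-from-order-1", "example-thumbprint")
value_order_2 = dns01_txt_value("token-from-order-2", "example-thumbprint")
```

Both orders resolve the same name, but each expects a different TXT value, which is why a cached answer from the first order fails the second.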

So end-to-end it looks something like this:

When authorization reuse is enabled, everything works fine:

  • the first order is created, and unique dns-01 challenge tokens provisioned
  • the key authorizations get provisioned into the DNS zone. Self-checks see them.
  • the challenges are initiated, we do TXT lookups and get the correct key authorizations
  • the order is fully authorized and a certificate is issued based on the CSR
  • a new order is created, with the same names
  • the valid authorizations (and their DNS-01 challenges) from the first order are reused, so there are no new tokens/challenges.
  • no challenges are initiated. No TXT lookups are performed.
  • the order is fully authorized and a certificate is issued based on the CSR.

When authorization reuse is disabled, the combination of identical names & the max TTL breaks the second order:

  • the first order is created, and unique dns-01 challenge tokens provisioned
  • the key authorizations get provisioned into the DNS zone. Self-checks see them.
  • the challenges are initiated, we do TXT lookups and get the correct key authorizations
  • the order is fully authorized and a certificate is issued based on the CSR
  • a new order is created, with the same names
  • with no reuse, fresh pending authorizations for the names are created and fresh dns-01 challenge tokens are provisioned.
  • the challenges are initiated, and TXT lookups performed. Because the identifiers match between order 1 and 2, the lookups will be for the same DNS records as the initial order. If this happens within 60s (the max cache TTL on our end), the key authorizations from the first order are seen, not the new ones.
  • the order fails to be authorized.
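The failing sequence above can be reproduced with a toy model of a caching resolver. This is only a sketch under simplifying assumptions (one TXT value per name, a dict standing in for your authoritative server, hypothetical class and value names); it is not how our resolvers are actually implemented.

```python
from dataclasses import dataclass

MAX_CACHE_TTL = 60  # seconds, the max cache TTL described above


@dataclass
class CachedAnswer:
    value: str
    expires_at: float


class CachingResolver:
    """Toy recursive resolver that caches answers up to max_ttl seconds."""

    def __init__(self, authoritative: dict, max_ttl: int = MAX_CACHE_TTL):
        self.authoritative = authoritative  # name -> (txt_value, ttl)
        self.max_ttl = max_ttl
        self.cache = {}

    def lookup_txt(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached.expires_at:
            return cached.value  # served from cache, zone is not queried
        value, ttl = self.authoritative[name]
        # Cache for the record's own TTL, capped at the resolver maximum.
        self.cache[name] = CachedAnswer(value, now + min(ttl, self.max_ttl))
        return value


def demo(record_ttl: int) -> list:
    name = "_acme-challenge.example.com"
    zone = {name: ("keyauth-order-1", record_ttl)}
    resolver = CachingResolver(zone)
    seen = [resolver.lookup_txt(name, now=0)]     # order 1 validates
    zone[name] = ("keyauth-order-2", record_ttl)  # client provisions order 2
    seen.append(resolver.lookup_txt(name, now=30))  # order 2, 30s later
    seen.append(resolver.lookup_txt(name, now=61))  # retry after the cap
    return seen
```

With a 300s record TTL, `demo(300)` yields the stale value at 30s and the fresh one at 61s. With `demo(0)` every lookup goes back to the zone, so the second order sees its own value right away.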

Not ideal! :-/

You could "solve" the problem by adding an artificial sleep longer than our max TTL and it should work fairly reliably.
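If you go the sleep route, something along these lines would do. It's a minimal sketch: the 5s safety margin and the function names are my own assumptions, and you'd call it between finishing the first order and triggering the second order's challenges.

```python
import time

MAX_CACHE_TTL = 60   # seconds, the resolver max cache TTL
SAFETY_MARGIN = 5    # seconds, an arbitrary cushion (assumption)


def seconds_to_wait(first_validated_at: float, now: float,
                    max_ttl: float = MAX_CACHE_TTL,
                    margin: float = SAFETY_MARGIN) -> float:
    # Time still needed before the first order's answer must have
    # expired from the resolver cache.
    elapsed = now - first_validated_at
    return max(0.0, max_ttl + margin - elapsed)


def wait_out_resolver_cache(first_validated_at: float) -> None:
    # first_validated_at should be a time.monotonic() reading taken when
    # the first order's challenges were validated.
    time.sleep(seconds_to_wait(first_validated_at, time.monotonic()))
```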

A better idea might be to check whether you can explicitly set a very low TTL on the TXT records your ACME client provisions in the zone. We should respect the TTL you send if it's lower than 60s. We allow a min TTL of 0s, and I think setting your TXT records with that TTL will solve the problem as well.
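In other words, the time we'd hold on to your record should work out to roughly the following. This is a sketch of the clamping rule as described here, with a hypothetical function name, not our resolver configuration.

```python
def effective_cache_ttl(record_ttl: int, max_ttl: int = 60,
                        min_ttl: int = 0) -> int:
    # The record's own TTL is honored, but clamped into [min_ttl, max_ttl].
    return max(min_ttl, min(record_ttl, max_ttl))


ttl_default_record = effective_cache_ttl(300)  # capped at the 60s maximum
ttl_short_record = effective_cache_ttl(30)     # below the cap, used as-is
ttl_zero_record = effective_cache_ttl(0)       # not cached at all
```

So a TXT record published with a TTL of 0 should never be served stale to the second order.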

I was out of the office yesterday but today I'll follow up about the authorization reuse in the staging environment and what our plan is.
