FYI: an edge case regarding race conditions between ACME Orders and Authorizations to test/defend against.

This surfaced while running some tests in parallel against staging; it does not happen in Pebble, as Pebble does not recycle authorizations. It's easiest to show in pseudocode:

  1. Create AcmeOrder#1 for example.com and dev.example.com
     • Receive AcmeAuthz#1 for example.com
     • Receive AcmeAuthz#2 for dev.example.com
  2. Create AcmeOrder#2 for example.com
     • Receive AcmeAuthz#1 for example.com [recycled]
  3. The process handling AcmeOrder#2 deactivates AcmeAuthz#1
     • ACME Server deactivates AcmeAuthz#1
     • ACME Server invalidates AcmeOrder#1
     • ACME Server invalidates AcmeOrder#2

This leaves AcmeAuthz#2 pending. The same situation should happen with failed challenges.
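To make the failure mode concrete, here is a minimal Python simulation of that sequence. The `Order`/`Authz` classes are hypothetical in-memory stand-ins for the server-side records, not a real ACME client:

```python
from dataclasses import dataclass, field

# Toy model of the server-side objects above; the sharing relationship
# (one authz referenced by many orders) is the point being demonstrated.

@dataclass
class Authz:
    domain: str
    status: str = "pending"
    orders: list = field(default_factory=list)  # every order sharing this authz

@dataclass
class Order:
    authzs: list
    status: str = "pending"

def new_order(domains, pool):
    """Create an order, recycling any existing pending authz per domain."""
    authzs = [pool.setdefault(d, Authz(d)) for d in domains]
    order = Order(authzs)
    for authz in authzs:
        authz.orders.append(order)
    return order

def deactivate(authz):
    """Deactivating an authz invalidates every order referencing it."""
    authz.status = "deactivated"
    for order in authz.orders:
        order.status = "invalid"

pool = {}
order1 = new_order(["example.com", "dev.example.com"], pool)  # Authz#1 + Authz#2
order2 = new_order(["example.com"], pool)                     # recycles Authz#1

deactivate(order2.authzs[0])        # the process handling Order#2 gives up
assert order1.status == "invalid"   # Order#1 is collateral damage
assert pool["dev.example.com"].status == "pending"  # Authz#2 left dangling
```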

I usually defend against this by refusing to create an order if there are pending challenges (allowing that would break the ability to correctly respond), but I had a race condition where AcmeOrder#2 was created before AcmeOrder#1 downloaded all of the Authorizations. I'm going to need to use placeholders or similar to defend against that.
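A sketch of the placeholder idea, assuming a SQL-backed client (the table and helper here are hypothetical): reserve every identifier in a single transaction before creating the order, so a concurrent process fails fast on the uniqueness constraint instead of racing the authorization downloads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE acme_order_placeholder ("
    " domain TEXT PRIMARY KEY,"  # at most one in-flight order per identifier
    " order_id INTEGER NOT NULL)"
)

def reserve_identifiers(conn, order_id, domains):
    """Insert placeholders for all identifiers, or refuse the order."""
    try:
        with conn:  # one transaction: all placeholders land, or none do
            conn.executemany(
                "INSERT INTO acme_order_placeholder (domain, order_id)"
                " VALUES (?, ?)",
                [(d, order_id) for d in domains],
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another order already holds one of these domains

assert reserve_identifiers(conn, 1, ["example.com", "dev.example.com"])
assert not reserve_identifiers(conn, 2, ["example.com"])  # refused up front
```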

While ACME clients should eventually handle this inherently via Authz cleanups, in my use case the code was iterating the Authz to manually trigger them. I'm not yet sure how I'll handle this; I can do nothing for now, as the 30-day caching of successful challenges leaves more than enough time to retry the order -- but if Let's Encrypt drops the Authz caching to a small enough interval, I'll have to sync the order against the ACME server before/after each Authz, otherwise we'd be wasting time completing challenges that are likely to be discarded.
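A sketch of what that sync could look like; `fetch_order`, `fetch_authz`, and `complete_challenge` are hypothetical callables standing in for whatever the client library actually provides:

```python
def process_authorizations(order_url, authz_urls,
                           fetch_order, fetch_authz, complete_challenge):
    """Re-sync the order around each authz so we stop early once the
    server has invalidated it (e.g. via the race described above)."""
    for authz_url in authz_urls:
        order = fetch_order(order_url)       # sync against the ACME server
        if order["status"] == "invalid":
            raise RuntimeError("order invalidated; abandoning remaining authz")
        authz = fetch_authz(authz_url)
        if authz["status"] == "valid":
            continue                         # cached/reused authz, nothing to do
        complete_challenge(authz)            # trigger and poll the challenge
```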

3 Likes

Does the ACME server invalidate AcmeAuthz#2 at the end of step 3, because the number of orders associated with AcmeAuthz#2 has dropped to zero?

1 Like

Clarification: Pebble does reuse authorizations, but only valid ones. You appear to be referring to pending authorizations, which indeed do not appear to be reused.

Valid Authorization Reuse

Pebble will reuse valid authorizations in new orders, if they exist, 50% of the time.

The percentage may be controlled with the environment variable PEBBLE_AUTHZREUSE, e.g. to always reuse authorizations:

PEBBLE_AUTHZREUSE=100 pebble

Pending Authorization Reuse

Pebble does not currently reuse Pending Authorizations across Orders; however, other ACME servers, notably Boulder, will reuse Pending Authorizations.

3 Likes

No. It is left as a pending authz, which is expected (at least in Boulder). This is just a variant of the need to clear pending authorizations on a normal order failure, but it surfaces across two different orders due to a race condition on the client. Authz, pending and validated alike, are keyed to an Account+FQDN pair and have a one-to-many relation to Orders.

Yes. I was hoping my use of the term "recycle" would make that distinction clear.

Internally, Pebble and Boulder will "reuse" valid Authz within the cache period.

Externally (i.e. from the client perspective), Boulder will "recycle" unused Authz across orders. If you submit a new ACME order with a mixture of valid and pending authz, the authorizations in the AcmeOrder returned to the client will only be the "recycled" pending authz. The valid authz are omitted from the order object, so a "reissue" of a certificate will often have no authz/challenges from the client perspective.
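Illustratively, the client-side difference looks something like the payloads below. These are invented for the example, not captured Boulder responses:

```python
# A first-time order for two names: both authz are pending and need work.
fresh_order = {
    "status": "pending",
    "authorizations": [
        "https://acme.example/authz/1",  # example.com, status "pending"
        "https://acme.example/authz/2",  # dev.example.com, status "pending"
    ],
}

# A "reissue" inside the cache window: the valid authz never show up,
# so there are no challenges for the client to respond to at all.
reissue_order = {
    "status": "ready",
    "authorizations": [],
}
```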

4 Likes

There's no inherent need to clean up pending authorizations. In fact, doing so prevents us from reusing AcmeAuthz#2 when your client next creates AcmeOrder#3 containing the same identifier. What value does it bring your client to refuse to create an order if there are any pending authorizations that you haven't cleaned up?

3 Likes

I think the worry is that you can run into a "too many pending authorizations" rate limit if you have many orders and some of them fail, but not all. The client will give up on the order after the first failed authz, leaving other authzs pending, which can then accumulate?

(Though if the CA reliably reuses pending authzs they will probably be reused for the next order, but there appears to be some way to accumulate pending authzs)

4 Likes

Note that the "too many pending authorizations" limit no longer exists.

5 Likes

That might have been worthy of an API Announcement post, too.

Though other CAs, like Buypass, can and do still have such limits.

4 Likes

My client implements this as a configurable option. This is a legacy behavior from when multi-SAN certs were recommended and accounts would quickly get wedged through pending authz. This only happened to me twice, but it was very common among early users of multi-SAN certs. If you fail 10 domains into a 100-domain order, you're left with 90 pending authz. These add up very quickly, and an account can get wedged with just a few orders.

In our usage, and in the pattern of many automated systems, the failed order will not be immediately audited and retried; the system will jump to the next designated order(s). My own use of my client is for a white-label cloud system that clients CNAME onto; a failure on an order triggers an alert for checks and an audit, but that order isn't retried for a while.

The reuse is also an implementation detail of Let's Encrypt, and may not happen on other CAs. I have a ticket on my stack to grab backup certs from another CA, like Cloudflare does.

For HTTP-01 it's not an issue, but for DNS-01 there are often limited records. With DNS-01 I also try to do everything through a delegation to acme-dns, so I've only got 2 entries at a time.

I fail orders on pending authz, because the next order could be against another CA, or on another account with the same CA (for example, when a second dedicated account was necessary with LE for ECDSA certs).

Is this necessary? No, but it's the easiest way to avoid problems. The tickets to "do this right" are pretty heavy in points and offer very little benefit, so they get pushed back.

Great! I can easily disable automatic cleanups by CA.

5 Likes

As an aside, I see that Boulder is going to implement rate-limit info as metadata headers, which sounds very useful.
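For example, a client could consume such headers roughly like this. The field names follow the IETF draft on RateLimit header fields; Boulder's eventual header names may well differ, so treat this as a hypothetical sketch:

```python
def parse_ratelimit(headers):
    """Pull rate-limit metadata out of a response's headers, if present."""
    try:
        return {
            "limit": int(headers["RateLimit-Limit"]),
            "remaining": int(headers["RateLimit-Remaining"]),
            "reset_seconds": int(headers["RateLimit-Reset"]),
        }
    except (KeyError, ValueError):
        return None  # metadata absent or malformed; fall back to Retry-After

# e.g. a client could back off proactively when `remaining` gets low:
info = parse_ratelimit({"RateLimit-Limit": "300",
                        "RateLimit-Remaining": "12",
                        "RateLimit-Reset": "3600"})
assert info and info["remaining"] == 12
```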

4 Likes