Can ARI-conforming clients be granted exemptions to relevant rate limits?

mholt · March 31, 2023, 6:22pm

As I'm implementing ARI, I have another concern/question I was hoping to get some reassurance on.

Some Caddy/CertMagic instances manage tens of thousands of certificates, similar to Chris' situation with Certify the Web.

With our current logic we're able to manage most of these certificates within rate limits, but it involves spreading renewals out over a longer period so they all fit comfortably within the 90-day window (60 days if possible).

My concern with ARI is that certificates could fail to renew before expiration because we wait for the ARI window before starting to try renewal. Of course we back off and retry when there are errors, but this can sometimes last for days or weeks as rate limits are hit. Hence we start renewals sooner, to spread them out more. But if we conform to ARI, we cannot start sooner.

Also, retries will often end up going outside the ARI window anyway, as they sometimes take several days or weeks to succeed. I know the spec says clients should follow their normal backoff and retry logic, but then what's the point of the "end" timestamp? If we're going to go outside the window anyway, we might as well have started renewals earlier, with a higher chance of getting the cert before expiration.

I can think of a couple possibilities so far:

Ignore ARI and start renewing certificates as early as needed to be able to spread them out enough.
Get a guarantee from the CA that the first successful cert for a name within the ARI window will be allowed regardless of relevant rate limits (new orders for example).

As a client dev, I'd prefer the latter. Since the point of ARI is to make sure the CA isn't overwhelmed in the first place, I don't see the value in the New Orders rate limit for the first (successful) issuance of a cert within the ARI window. Basically, clients should be rewarded for conforming to ARI, not punished, especially when operating at scale, where load smoothing really matters and we are doing the CA a favor.

Thanks for considering!

Osiris · March 31, 2023, 6:47pm

What rate limit for renewals are you afraid of exactly?

rg305 · March 31, 2023, 6:55pm

ARI can mean: I'm busy right now, please come back "later".
Well, then "later" could just become an even busier time to get all the certs done in time.
Things can only be put off for tomorrow so much - eventually "tomorrow" comes.

Osiris · March 31, 2023, 6:58pm

I know what ARI entails. But I'm preeeetty sure the Let's Encrypt validation servers don't want to amass all certificate renewals till a later time: Boulder wants all the certs renewed too, in an orderly fashion and as soon as possible I assume.

So the interests of the ACME client are in the interests of the ACME server too, it's just some load displacement when loads are high.

Personally I don't really see the fuss, but I might be blind to it.

rg305 · March 31, 2023, 6:58pm

^^ devil's advocate ^^

Ideally ARI wouldn't defer an entire account's requested certs for any significant amount of time.
I would hope no more than minutes [not even hours] of gaps.

Osiris · March 31, 2023, 7:00pm

And even then: renewals are exempt from the "certs per registered domain per week" rate limit. The only rate limit relevant for renewals is the duplicate certificate limit of 5 per week, which isn't really an issue with regard to ARI.

rg305 · March 31, 2023, 7:01pm

hmm...
much ado about nothing then.

Osiris · March 31, 2023, 7:02pm

I also don't understand this part. "First successful cert for a name"? What name? How come "first"? How many certs would you want for a "name"?

mholt · March 31, 2023, 7:03pm

Like I said in the topic, the main rate limit that gets in the way of large deployments is New Orders. 300 per 3 hours (~100/hour) only allows 2400 per day, and that assumes no backoff (which there is, to be nice). This is problematic because new certificates aren't always spaced out so perfectly. Hence the start-early-and-backoff logic.
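The arithmetic behind the 2400-per-day figure can be sketched as below. This is a back-of-the-envelope model only: it assumes the 300-per-3-hours New Orders limit quoted above can be packed perfectly into eight back-to-back windows per day, with no failures or backoff. The function names and the 30,000-certificate count are illustrative assumptions.

```python
import math

# New Orders limit as quoted above: 300 orders per 3-hour window.
ORDERS_PER_WINDOW = 300
WINDOW_HOURS = 3


def max_orders_per_day() -> int:
    """Best case: every 3-hour window in a day is used up to the full limit."""
    return ORDERS_PER_WINDOW * (24 // WINDOW_HOURS)


def min_days_to_renew(cert_count: int) -> int:
    """Fewest whole days needed to place one renewal order per certificate."""
    return math.ceil(cert_count / max_orders_per_day())


print(max_orders_per_day())       # 2400
print(min_days_to_renew(30_000))  # 13
```

Even under these ideal assumptions, 30,000 renewals take about 13 days of continuous ordering, which leaves little slack once failures and backoff are added.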

mholt · March 31, 2023, 7:04pm

A domain name. And by "first" I mean the only renewal you ought to need within the given window. Repeated renewals can still be subject to rate limits to prevent abuse or bugs.

Osiris · March 31, 2023, 7:12pm

Why would you want multiple certs issued for the same (set of) domain name(s) anyway? That's just wrong in my opinion.

Anyway, my gut tells me this is more of a theoretical issue than a practical one, as ARI is used to smooth things out on the ACME server side, and I don't really see how that would interfere much with the ACME client side. If you look at the example in draft-ietf-acme-ari, you see a window of 4 days, with a recommendation (so not a MUST or SHOULD) to renew at a random time within that window. But as stated, there's nothing wrong with developing your own algorithm to accommodate appropriate renewal within the suggested ARI time window. That said, those 4 days are just an example; Let's Encrypt could recommend a single hour as the window, for example.
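Renewing at a random time within the suggested window, as in the draft's example, could look roughly like this. It assumes a renewalInfo response shaped like the draft's example, with `suggestedWindow.start` and `suggestedWindow.end` given as RFC 3339 timestamps; `parse_rfc3339` and `pick_renewal_time` are hypothetical helpers, not part of any ACME client.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


def parse_rfc3339(ts: str) -> datetime:
    """Parse a simple RFC 3339 timestamp such as "2023-04-01T00:00:00Z"."""
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pick_renewal_time(renewal_info: dict,
                      rng: Optional[random.Random] = None) -> datetime:
    """Pick a uniformly random instant inside the ARI suggested window."""
    window = renewal_info["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    rng = rng or random.Random()
    offset = rng.uniform(0, (end - start).total_seconds())
    return start + timedelta(seconds=offset)


# Hypothetical 4-day window, the same size as the draft's example.
info = {"suggestedWindow": {"start": "2023-04-01T00:00:00Z",
                            "end": "2023-04-05T00:00:00Z"}}
print(pick_renewal_time(info))
```

Passing a seeded `random.Random` makes the choice reproducible, which is handy in tests.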

mholt · March 31, 2023, 7:22pm

You don't. I'm not sure where you got that idea?

Osiris · March 31, 2023, 7:38pm

If you're talking about "first", I assume there's gonna be a second? I just don't really understand what you meant earlier with the whole "first successful cert for a name", since it implies a second cert. And we were talking about renewals. So that would be a duplicate cert?

mholt · March 31, 2023, 7:46pm

Only 1 is needed in that window, since there is only 1 cert that window applies to. If there's a second, that's either a bug or the beginning of abuse. Hence the "first" should be exempted from rate limits.
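The proposed exemption could be modelled, very roughly, as CA-side bookkeeping keyed by name and ARI window. This is a hypothetical sketch of the idea as described in this thread, not Boulder's actual rate-limiting code; the class and method names are made up for illustration.

```python
from datetime import datetime


class FirstRenewalExemption:
    """Hypothetical tracker for the proposed rule: the first successful
    issuance for a name inside its ARI window bypasses the New Orders limit;
    any later issuance in the same window is rate-limited as usual."""

    def __init__(self) -> None:
        # (name, window_start) pairs that have already used their exemption.
        self._used = set()

    def is_exempt(self, name: str, window_start: datetime,
                  window_end: datetime, now: datetime) -> bool:
        """True if an order for `name` at `now` would skip the New Orders limit."""
        in_window = window_start <= now <= window_end
        return in_window and (name, window_start) not in self._used

    def record_success(self, name: str, window_start: datetime,
                       window_end: datetime, now: datetime) -> None:
        """Mark the exemption as used once an in-window issuance succeeds."""
        if window_start <= now <= window_end:
            self._used.add((name, window_start))
```

Keying on the window start means the next certificate's window gets a fresh exemption, while a second issuance in the same window (a bug or abuse, per the post above) falls back to the normal limits.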

Osiris · March 31, 2023, 7:48pm

Yeah, no, never mind. I still don't get the actual issue, or which rate limit that "first" (i.e. regular) renewal should be exempted from. I'll leave it to other users to discuss.

Nummer378 · March 31, 2023, 8:02pm

During past mass-revocation events, Let's Encrypt has temporarily adjusted or removed rate limits.

For example, during the TLS-ALPN revocations in January 2022, the New Orders limit was raised to 1000 orders per 3 hours.

Large integrators (who are most likely to be affected by rate limits during revocation events) were advised to contact Let's Encrypt.

If you regularly exceed the rate limits outside of mass-revocation events, you need to submit a rate limit override request anyway. This is how it was handled before ARI, and I personally don't see that changing with ARI.

mholt · March 31, 2023, 8:18pm

Well, these users don't normally exceed the rate limits, because they can simply start early.

We now need to make a decision: whether to ignore ARI, or exceed rate limits.

I've said "New Orders" 4 times now, but I guess it doesn't matter at this point. Clearly you haven't dealt with cert deployments at scale.

Nummer378 · March 31, 2023, 8:23pm

ARI is a suggestion. The renewal time it gives is not a requirement in any way. Rather, it's what the CA recommends for uninterrupted service (e.g. advance notice of an impending revocation).

Before ARI, Let's Encrypt suggested that subscribers renew once 2/3 of the cert's lifetime has elapsed. If you were already spreading out your renewals over a much larger interval, you were already ignoring that recommendation; in this case, ignoring ARI is the logical continuation of this approach.

You can still use ARI to be notified of impending revocations, and perhaps for load avoidance, but in general your setup sounds incompatible with the window ARI suggests.
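For comparison, the pre-ARI 2/3-of-lifetime recommendation works out as follows. This sketch treats a Let's Encrypt certificate's lifetime as exactly 90 days, and `two_thirds_point` is just an illustrative name.

```python
from datetime import datetime, timedelta


def two_thirds_point(not_before: datetime, not_after: datetime) -> datetime:
    """Instant at which 2/3 of the certificate's validity period has elapsed."""
    lifetime = not_after - not_before
    return not_before + lifetime * 2 / 3


issued = datetime(2023, 1, 1)
expires = issued + timedelta(days=90)
print(two_thirds_point(issued, expires))  # 2023-03-02 00:00:00
```

That is day 60 of 90, the same "60 if possible" target from the opening post; spreading renewals out means starting some of them well before this point.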

Osiris · March 31, 2023, 8:23pm

I have not, but I've also stated a few times that one of the goals of ARI is to smooth out renewals on the ACME server side, so I'm wondering why it would give you more trouble on the ACME client side than you currently already have.

And I've also mentioned the fairly generous 4-day example window in the RFC draft, so I'm also wondering why that would be an issue for you. What kind of windows does the Let's Encrypt production environment currently suggest? Let's start with that. IMO it doesn't make much sense to discuss a problem that doesn't actually exist.

mholt · March 31, 2023, 8:31pm

I think this conversation has told me enough. My plan now is to check ARI and use it as a hint to start renewing right away if the window has been changed. (Since the window doesn't actually matter after all, but is at least a signal for potential problems.)
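The plan above (keep the existing early, spread-out schedule, and treat a changed ARI window as a signal to renew immediately) might be expressed like this. It is a hypothetical sketch, not CertMagic's code: the function name, remembering the previously seen window, and representing a window as a (start, end) tuple are all assumptions.

```python
from datetime import datetime
from typing import Optional, Tuple

Window = Tuple[datetime, datetime]  # (start, end) of an ARI suggested window


def should_renew_now(now: datetime, own_renewal_time: datetime,
                     current_window: Window,
                     previous_window: Optional[Window]) -> bool:
    """Decide whether to start renewing, using ARI only as a hint."""
    # The client's own schedule, which starts early to spread out load, still applies.
    if now >= own_renewal_time:
        return True
    # A window that differs from the one seen before is treated as a sign of
    # potential trouble (e.g. an impending revocation): renew right away.
    if previous_window is not None and current_window != previous_window:
        return True
    return False
```

Remembering the window from the previous check is what makes a change detectable; on the first check there is nothing to compare against, so only the client's own schedule applies.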
