Client Burden of Preemptive ARI Timing Selection

ACME Renewal Information (ARI) is a new mechanism by which a client asks an ACME[2] server for a window within which to renew a given valid certificate.

The following source identifies a protocol transition, in late April 2024, from draft-ietf-acme-ari-01 to draft-ietf-acme-ari-03: LE user ‘beautifulentropy’, “Discontinuing support for ACME clients using draft-ietf-acme-ari-01”, 19 Mar 2024.

The ARI-03 protocol is described in: Samantha Frank, webpage “An Engineer’s Guide to Integrating ARI into Existing ACME Clients”, Let's Encrypt, 25 April 2024.

RFC 9773, “ACME Renewal Information (ARI) Extension” (June 2025), appears to be ARI version 3.

The ARI-03 protocol is concerned with spreading the load on a Certificate Authority's (CA's) ACME[2] server uniformly over time. I don't object to that idea in isolation, but in the real world of conflicting social objectives, that goal seems to me unreasonably burdensome for client software design and implementation. It reminds me of the common pasture problem (the tragedy of the commons): neither ARI nor communal cooperation will make CA rate limits unnecessary in an actual ecosystem.

Perhaps you have read some version of the online essay or ebook: Eric S. Raymond, “The Cathedral and the Bazaar”. (A source: The Cathedral and the Bazaar)

I am trying to understand how my personal ACME[2] client could use ARI. I have two design issues with the algorithm for time selection. I know I am not a very knowledgeable programmer, and I would appreciate thoughtful philosophical or practical opinions. I have not (yet?) read all of RFC 9773. I have read all of Frank, “An Engineer’s Guide to Integrating ARI into Existing ACME Clients”, and my only design concerns were with the section “Step 5: Selecting a specific renewal time”.

Section “Step 5: Selecting a specific renewal time”:
draft-ietf-acme-ari provides a suggested algorithm for determining when to renew a certificate. This algorithm is not mandatory, but it is recommended.

  1. Select a uniform random time within the suggested window.
  2. If the selected time is in the past, attempt renewal immediately.
  3. Otherwise, if the client can schedule itself to attempt renewal at exactly the selected time, do so.
  4. Otherwise, if the selected time is before the next time that the client would wake up normally, attempt renewal immediately.
  5. Otherwise, sleep until the next normal wake time, re-check ARI, and return to “1.”

For Lego, we implemented the above logic in the following function:

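```go
package ari // package name is illustrative

import (
	"math/rand"
	"time"
)

// Simplified stand-ins for lego's own types, included only so the snippet
// compiles on its own (assumption: field names follow the ARI JSON object).
type Window struct {
	Start time.Time
	End   time.Time
}

type RenewalInfoResponse struct {
	SuggestedWindow Window
}

// ShouldRenewAt returns the time at which to renew, or nil if the caller
// should wait for its next normal wake time and re-check ARI.
func (r *RenewalInfoResponse) ShouldRenewAt(now time.Time, willingToSleep time.Duration) *time.Time {
	// Explicitly convert all times to UTC.
	now = now.UTC()
	start := r.SuggestedWindow.Start.UTC()
	end := r.SuggestedWindow.End.UTC()

	// Select a uniform random time within the suggested window.
	// rand.Int63n panics unless its argument is positive, so an empty or
	// inverted window falls back to its start.
	rt := start
	if window := end.Sub(start); window > 0 {
		rt = start.Add(time.Duration(rand.Int63n(int64(window))))
	}

	// If the selected time is in the past, attempt renewal immediately.
	if rt.Before(now) {
		return &now
	}

	// Otherwise, if the client can schedule itself to attempt renewal at exactly the selected time, do so.
	willingToSleepUntil := now.Add(willingToSleep)
	if !willingToSleepUntil.Before(rt) {
		return &rt
	}

	// TODO: Otherwise, if the selected time is before the next time that the client would wake up normally, attempt renewal immediately.

	// Otherwise, sleep until the next normal wake time.
	return nil
}
```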

I notice the phrase ‘not mandatory’. RFC 9773 has this:

Clients MUST attempt renewal at a time of their choosing based on the suggested renewal window. The following algorithm is RECOMMENDED for choosing a renewal time:

Notice two details of the example implementation from Frank, Step 5:

(1) Step 3 of the algorithm in Frank's section “Step 5” is ‘if the client can schedule itself to attempt renewal at exactly the selected time, do so’. Without a master process or thread controlling worker processes or threads, the only ways to schedule for the exact time are to ‘tell the scheduler how to schedule’ or to sleep the current process. The example returns the selected time and seems to leave the caller to sleep the current process until then.

I don't want to integrate a scheduler into my ACME[2] client, for what I think are obvious reasons, unless you know something I don't, in which case I would like to hear it. I don't want to sleep my client process either, because it performs all the renewals in its purview in sequence. ARI is designed to address security breaches: while the client sleeps on one certificate, another certificate still waiting to be considered may have a security issue more dire than that of the current one.

(2) Step 4 of the algorithm in Frank's section “Step 5” is ‘if the selected time is before the next time that the client would wake up normally, attempt renewal immediately’. How does the code know the time of the next scheduled run? Do I query systemd? Something else? Or keep a second source of truth that is easily outdated whenever the schedule is changed in the scheduler of choice?

Step 5 of the algorithm in Frank's section “Step 5” is ‘sleep until the next normal wake time, re-check ARI, and return to “1.”’ That I can do. However, the corresponding step in RFC 9773 is ‘sleep until the time indicated by the Retry-After header and return to Step 1’, and I have the issue with sleeping the client program noted above.

All of my concerns have to do with computations about the future and with changes to the schedule. My design intention is a scheduling-ignorant client. If I omit both preemptive immediate renewal and scheduled renewal for a particular certificate, which is what my client would do, then I may end up renewing late relative to the CA's renewal window, a window that may not have begun yet and has certainly not ended.

Counteracting that tendency to renew late relative to the ARI window is the renewal criterion of my (the client's) own choosing, which may select a certificate for renewal during the current invocation anyway. Late renewal is further mitigated by invoking the ACME[2] client more often. The impending transition from 90-day to 6-day end-entity certificates requires client invocations at intervals of no more than 6 days, and daily invocation seems likely. What problem would it cause to renew at most one (or six) days after the close of a window that was announced in advance?

I don't think the anticipatory renewal windows given by CAs are so precise that integrating a scheduler or managed worker threads is justified. I find it hard to believe that I see these issues as onerous and no one else does, but I have not found any online indication that anyone else shares my concerns. Should I just use the big client software made by somebody else? Are ACME[2] clients to be like browsers?

If you program ACME[2] clients or servers, what are your software design intentions regarding ARI windows that lie in the future? I am trying to discern, by survey, the direction in which conventional ACME[2] client design is developing. I wonder if my client software is just a dinosaur facing elimination in a mass extinction event now barely visible on the horizon.

The most naive solution is to check ARI, and if you’re in the current window or it is in the past, renew.

There are two risks with this approach:

  1. If the certificate is about to expire or be revoked, you may risk continuing to use a certificate which is about to be revoked.

  2. If your client application is run by cron at a fixed time like UTC midnight, you risk being part of a thundering herd trying to renew when other similar clients do.

Many ACME clients do have a scheduler in them, because often the best way for a client to run is inside an application that needs certificates. If that doesn’t describe your client, then of course that advice isn’t useful to you. However, your client is going to operate in a larger system, and perhaps the advice is more relevant to the users of your client.

Ultimately the algorithms suggested are simply suggestions, and you have to decide what makes sense for your use case.


This post is very long (and feels a little AI-generated?), but basically: don't use ARI if you don't want to. Implement whichever scheduling for checks suits you, and ignore the suggested window if you want to.

ARI is renewal window guidance and there are legitimate reasons to not use it if you don't want to.


Those two comments could use some further explanation.

Let's Encrypt is now testing certs of 160-hour duration, to be generally available later this year. 90-day certs will still be available and will remain the default. See: Profiles - Let's Encrypt

By 2029, all Certificate Authorities (CAs) will have reduced the maximum cert lifetime to 47 days.

A cron-based ACME Client should be running at least twice/day even now. As noted earlier, the simplest approach is to check ARI and renew the cert if inside the ARI window or if the ARI window has passed.

One benefit of ARI is that you will be given a window in the past when the CA has revoked or is planning to revoke your certificate. Sometimes this happens due to faults by the CA.

ARI allows you to become less renewal timing aware. Otherwise you would check the remaining life on your previously acquired cert and renew it with only 1/3 of life left (LE's recommendation). Just checking cert life remaining doesn't help you renew before a CA revocation event though.

A good starting point before designing / developing your own ACME Clients is to review all the docs here: Documentation - Let's Encrypt


I am not just thinking about today. Coding is a lot of work, but I like having control and feel invested. I want to code with correct anticipation, or at least not foolishly. There is a time to let go. I struggle with that now.

Mike, that's great information. More than I can possibly assess any time soon. There are likely some useful indications of future intentions in there.

The way my client works: it is invoked manually to set up an initial collection of renewal file artifacts (mainly the certificate key file and the downloaded certificate chain file) in a directory within a certain directory subtree. A scheduler that I did not write (e.g. systemd, cron) then invokes the client automatically through a different entry point, the renewal command. The renewal process walks the directory tree, selects certificates, and attempts their renewal by invoking what the manual installation invokes.

My ACME client can get the current time, but it does not know when it will next be invoked. Sleeping the client for one certificate's renewal attempt would delay the rest of the tree walk. I'm not saying I need a lot of certificates, but by design I want to simply walk the directory tree and finish execution until the next automatic invocation.

I wonder what fundamental designs/architectures others are using now and expect to be relevant (or inadequate) in the coming year or two.

Thanks for the thoughtful reply.

Not a sentence I ever expected to read in my life, but... surprisingly relatable.


When I started to support ARI in my client, I descoped ARI checks from renewals. A first task runs hourly and just updates ARI data that has expired. A second task runs every 4 hours and renews certificates based on ARI or expiry info.

I split out the ARI checks, because they are very lightweight, and the execution time is consistent and predictable for a given number of certificates. Renewals can take time to process, vary based on the number of domains and challenge types, and might encounter errors. By splitting the two apart, any issues or slowness on renewals will not affect my ARI checks.


I suspect there will basically be two kinds of clients that use ARI data:

  1. Clients that manage many certificates, with their own sophisticated schedulers, which use ARI along with their own planning to figure out when to send renewal requests for what.
  2. Clients that just wake up occasionally, and renew if the cert is close to expiration or if the ARI suggestedWindow.start is before the current time.

And either one will work just fine for their respective use cases. The algorithm given in the ARI spec is really intended more for the first kind, where there's some persistent storage of "time to try based on current window" and where the "next wake up time" can be known easily.


I appreciate the direct answer, @jvanasco. I don't understand how or why you descoped ARI functionality, and yet seemingly didn't. How complicated is your client architecture? Are you running a single process (and a single thread)? What do you do if your random time within a given ARI window is in the future? I mostly assume you are not caching ARI renewal information for later. I understand ARI to be a second criterion by which to trigger certificate renewal.

Thank you for the assessment, @petercooperjr. I was tentatively planning to use the simple second way you gave, except that I would pick a uniformly random time within the window and, if it is in the future, not renew, unless my client's own criterion called for renewal.

If the ARI window is in effect and my client's renewal criterion says to renew, I was planning to renew with the ARI indicator in my JWS message, whether or not my selected time is in the future (I forget what the indicator is exactly, but it's specified).

Your way is simpler: no uniform random time selection within the window, just renew. Does anyone think that is too aggressive for the ACME server load? I figure that if my client runs daily, I would be late for an ARI window by no more than a day. Obviously, I am not taking a mission-critical perspective here, but I can see that just renewing has its benefits. I could add a futuristic reach period to approximate the time until the next run, but I don't want to have to keep a static lookahead setting synchronized with what is actually scheduled next, which need not be the same or nearly the same interval every time. If a client runs on Thursdays and Sundays at 2 am plus 0-90 random seconds, for example, the interval alternates between 3 days and 4 days. I don't particularly need varied intervals, but I don't like the idea of my configuration having to account for them just to follow the suggested ARI algorithm as closely as possible.

I suspect that clients are unlikely to distribute their ARI-driven (or total) renewal traffic uniformly within the windows given by CA servers unless some approved, heavyweight client software is made ubiquitous by enforcement. Heavyweight clients are not a direction I want to go. What OSs do people here prefer? I use Linux. I am not smart enough to use BSD (grokking software compilation options is too much for me).

tl;dr To anyone: do you think always renewing while the ARI window is open and not yet closed (just as one always renews once the ARI window has passed) is a reasonable ARI client policy?

I think renewing anytime in the ARI window is fine. In theory, that is what the signal from the CA means: it would prefer any time within the window to a time outside of it. If you're planning on waking up multiple times a day, I suppose checking whether now is later than the midpoint between the suggested window's start and end (or later than a random point within the window, if you really want some randomness) might work at least as well as using only the start. If you're just waking up once or twice a day, it probably doesn't really matter one way or the other. Don't overcomplicate it.


Yes, already answered. Renewing when the suggestedWindow start time is in the past is perfectly reasonable for a simple client. Note this was comment from an LE staff member even :slight_smile:

Perhaps you have good reason, but existing ACME clients already do what you are describing for your client. Clients like lego or Certbot are commonly used.

Twice per day is better. There is a pending change to the LE docs about that even: https://github.com/letsencrypt/website/pull/1936
Oh, 15 minutes after I posted that it was "pending", it got pushed to the public site :slight_smile: See: Integration Guide - Let's Encrypt

Twice/day has been a common practice for some existing clients. Do stay away from especially congested times. See: FAQ - Let's Encrypt

Certbot has a cron job example for randomizing times in this section: User Guide — Certbot 5.0.0.dev0 documentation

If using systemd timers, you could just choose a couple random times when setting up the timer service. Or, of course, do any number of other methods.


Systemd even supports doing a random time for each invocation. I have my server set up to run lego at a random time between 7 hours and 9 hours from the last time it ran, leading to it checking ARI roughly 3 times a day but at random times.

```
# more /etc/systemd/system/renew-certs.timer
[Unit]
Description=Schedule for renew-certs service

[Timer]
OnActiveSec=1min
OnUnitInactiveSec=7hours
RandomizedDelaySec=2hour
AccuracySec=5sec

[Install]
WantedBy=timers.target
```

Thank you.


Great points. Thanks for reminding me of the initial reply. I think I did not read it carefully enough. My approach is the wrong one, got it. :slight_smile:

I was wondering about the recommended renewal criteria for a certificate of very short duration, so I'm glad to see, for the first time, the recommendation for certificates with a duration under 10 days.

This is what I currently have for systemd, though not currently active whilst I try to fix my client code.

```
[root@ur02 /var/www/certs]# cat certs.timer
[Unit]
Description=Automated Certificate Renewal Timer

[Install]
WantedBy=timers.target

[Timer]
# A hardcoded value could be set with Unit=, but it is usually
# better practice to accept the default of the same basename
# appended with the '.service' extension.
#Unit=certs.service

RandomizedDelaySec=900
OnCalendar=Thu,Sun *-*-* 03:55 America/New_York
[root@ur02 /var/www/certs]#
```

For more frequent invocations, I like your approach. I may need it if I get ARI to work. I don't understand the first and last directives of your Timer section. The first line seems like a superfluous timer in addition to the timer on the second line (I am probably interpreting it wrong). Also, the 5-second granularity seems extremely precise, and I wonder why.

Hmm. Good questions; let's see if I can remember why I set things up this way. I think the first one is for the initial run after the server is rebooted, or maybe after the timer was first set up. The two-hour random delay still applies, so the first run could be any time in the first 2 hours that the system is up and the timer exists, and the subsequent runs are every 7 hours after that, plus a random time within the 2-hour delay. I think the 5-second granularity was because it always seemed to run at 0 seconds past the minute and I wanted it to be an actually-random second as well. But it's highly likely that I missed something and there was a better way to do it; I just haven't looked at it since I set it up, until now.

It was my first (and only) time making a systemd timer definition, and done entirely through reading man pages and experimentation. It would not be surprising at all if there were a better way of accomplishing this sort of thing.


That makes sense to me. I'm also trying to grok the systemd.timer manpage. Empiricism is an important principle in the world of software. I can't identify a better way, so I think I'll use that if and when I get that far, now that I understand it.

The ARI payload contains a window for renewal and an expiry for the payload. IIRC, Let's Encrypt uses a 6- or 8-hour expiry. I download and cache each payload.

Every hour, my client runs a task to refresh the ARI data, downloading and caching the payloads for everything that has passed the cache expiry timestamp.

Every 4 hours, my client runs a task that just renews certificates that are now due for renewal, or that will need to be renewed before the next task run. That timeliness is primarily based on the ARI window if available (choosing to renew at the start of the window), but falls back to 2/3 of the certificate lifetime in hours.


Thanks for the clarification. Interesting approach.