I'm automating certificate creation, and I'm wondering whether there is any advantage to keeping the client credentials stored, or whether it's fine to just create a new client for each certificate renewal.
Are there any best practices or guidance for this?
Do you mean the ACME account that the client creates before ordering a certificate? The normal practice is to re-use the existing account but there may be clients out there that create a new account every time.
I'm wondering if there is any advantage to keeping the client credentials stored
IMHO, best practice is to persist credentials and keep a copy of the client's account key offline for emergency use. The account key that issued a certificate is the fastest way to revoke that certificate after a compromise.
I once had a server compromised through a vulnerability in one of its services. It was a ransomware-style scam: the attackers claimed to have encrypted the system and offered to restore it for bitcoin. They deleted everything. If I had been using ACME certificates at the time, I would have lost the account key and would have had to go through additional steps to revoke the existing certs. That may not sound like much when you're not dealing with a breach, but during a breach it means doing that work on top of all the post-breach server migration, mitigation, and security work. Even with a detailed plan in place, it is a high-stress situation with a lot of steps, and adding any work to it is something to be avoided.
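For anyone automating this, "persist the account key and keep an offline copy" can be as small as the following Python sketch. It assumes a file-based layout; `generate_key` is a hypothetical hook standing in for whatever your ACME client or crypto library uses to create the PEM-encoded key, and the paths are placeholders.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Optional


def load_or_create_account_key(
    key_path: Path,
    generate_key: Callable[[], bytes],  # hypothetical hook: returns a PEM-encoded private key
    backup_dir: Optional[Path] = None,
) -> bytes:
    """Return the persisted ACME account key, creating it only on the first run."""
    if key_path.exists():
        # Normal case: reuse the same key, and therefore the same ACME account.
        return key_path.read_bytes()

    pem = generate_key()
    key_path.parent.mkdir(parents=True, exist_ok=True)
    # O_EXCL fails if another process created the file in the meantime;
    # mode 0o600 keeps the key readable by the owner only (subject to umask).
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(pem)

    if backup_dir is not None:
        # Staging copy that still has to be moved somewhere offline.
        backup_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(key_path, backup_dir / key_path.name)
    return pem
```

After the first run the same key is simply loaded again, so every renewal goes through the same account. The backup directory is only a staging point. If it lives on the same server, a breach like the one described above takes it out too.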
We recommend persisting the ACME account key and reusing the same account across all requests (both renewals and additions of new domains, if the host manages multiple sites) from that host.
There are a few reasons for this recommendation:
As mentioned above, using the ACME account key is the easiest way to revoke a compromised certificate.
We have a rate limit on "new accounts per IP address". If you create new accounts too frequently on the same host, you'll eventually start getting HTTP 429 (Too Many Requests) errors.
Many of our other rate limits are "per account", so if you always create a new account we can't properly enforce those. This might make it seem like you should always create new accounts to avoid running into rate limits, but in fact many of the rate limits are for your own protection. For example, we have a rate limit on identical certificates, to prevent a runaway client from successfully issuing new certs but then failing to persist them and immediately retrying. You don't want your client to be wasting those resources any more than we do, and you probably want to be made aware of that misconfiguration.
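A client can also catch the runaway-retry case locally by keeping a little state of its own. The Python sketch below assumes a JSON file on disk. It records each issuance for an exact set of names as soon as the CA returns the certificate, so a later failure to save the cert still counts, and it refuses to place another order once a threshold is hit. `IssuanceLog` and its `limit`/`window_s` defaults are hypothetical and are not Let's Encrypt's actual limit values. It complements the server-side limits and does not replace them.

```python
import json
import time
from pathlib import Path
from typing import Iterable, Optional


class IssuanceLog:
    """Local record of successful issuances, keyed by the exact set of names."""

    def __init__(self, path: Path, limit: int = 3, window_s: float = 7 * 24 * 3600):
        self.path = path
        self.limit = limit          # hypothetical: max issuances per window
        self.window_s = window_s    # hypothetical: window length in seconds

    @staticmethod
    def _key(names: Iterable[str]) -> str:
        # Order and case don't matter: {"b.example", "A.example"} == {"a.example", "b.example"}.
        return ",".join(sorted({n.lower() for n in names}))

    def _load(self) -> dict:
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def may_order(self, names: Iterable[str], now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        stamps = self._load().get(self._key(names), [])
        recent = [t for t in stamps if now - t < self.window_s]
        return len(recent) < self.limit

    def record(self, names: Iterable[str], now: Optional[float] = None) -> None:
        # Call as soon as the CA returns a certificate, *before* trying to save it,
        # so a client that keeps failing to persist certs still hits this guard.
        now = time.time() if now is None else now
        data = self._load()
        data.setdefault(self._key(names), []).append(now)
        self.path.write_text(json.dumps(data))
```

If `may_order` returns False, the useful response is to stop and alert someone, because it usually means the client is misconfigured.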
It's true that, especially if the client is only managing a single domain and only makes one request a month or so, you can get away with not persisting the account key. But we don't recommend that as a best practice.
Oh, one other thing: your initial post implies that storing an ACME account key for the long term is a form of state, and that a stateless system might want to avoid that. True! But there are other forms of state within the ACME protocol.
For example, in order to issue a single certificate, the system has to 1) create a new order and keep track of its URL, 2) fetch each authorization and keep track of their challenge URLs, 3) fulfill each challenge using those URLs, and 4) finalize the order using its URL.
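Here is a minimal Python sketch of what a client has to remember between those four steps. `OrderState` and `next_step` are hypothetical names. The URLs they hold correspond to RFC 8555 objects: the newOrder response's Location header, the order's `authorizations` and `finalize` fields, each authorization's challenge `url`, and the `certificate` URL that appears once the order is valid. No network calls are made, and polling is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OrderState:
    """What must survive between the steps of one issuance (field names are hypothetical)."""
    order_url: Optional[str] = None                               # 1) Location of the new order
    finalize_url: Optional[str] = None                            # 1) order's "finalize" URL
    authz_urls: List[str] = field(default_factory=list)           # 1) order's "authorizations"
    challenge_urls: Dict[str, str] = field(default_factory=dict)  # 2) authz URL -> chosen challenge URL
    responded: List[str] = field(default_factory=list)            # 3) challenges already answered
    certificate_url: Optional[str] = None                         # 4) set once the order is valid


def next_step(s: OrderState) -> str:
    """Work out where to resume purely from the saved state."""
    if s.order_url is None:
        return "create order"
    for authz in s.authz_urls:
        if authz not in s.challenge_urls:
            return f"fetch authorization {authz}"
    for chall in s.challenge_urls.values():
        if chall not in s.responded:
            return f"respond to challenge {chall}"
    if s.certificate_url is None:
        return "finalize order"
    return "done"
```

Held only in memory, this state disappears on a reboot, which is the problem the next paragraph describes.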
What happens if the system reboots in the middle of that process? You've lost the in-memory state necessary to complete it, and have to start over. That's generally fine, but a "stateless system" might want to avoid requiring that every request to the ACME server come from the same system component in sequence. If all of the ACME data is stored in an external persistent database, then any instance of the distributed system can pick up where another left off, if necessary.
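One way to externalize that state is sketched below, using Python's built-in `sqlite3` as a stand-in for an "external persistent database". The table layout and function names are hypothetical. A real multi-instance deployment would use a shared database server plus some form of locking or leasing so two instances don't work the same order at once. That part is not shown.

```python
import json
import sqlite3
from typing import Iterable, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS acme_orders (
    names TEXT PRIMARY KEY,  -- sorted, comma-joined identifiers
    state TEXT NOT NULL      -- JSON: order URL, authz/challenge URLs, progress
)
"""


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def _key(names: Iterable[str]) -> str:
    return ",".join(sorted(names))


def save_state(conn: sqlite3.Connection, names: Iterable[str], state: dict) -> None:
    # Save after every step so any instance can resume from the last completed one.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO acme_orders (names, state) VALUES (?, ?)",
            (_key(names), json.dumps(state)),
        )


def load_state(conn: sqlite3.Connection, names: Iterable[str]) -> Optional[dict]:
    row = conn.execute(
        "SELECT state FROM acme_orders WHERE names = ?", (_key(names),)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Any instance that opens the same store can call `load_state` for a set of names and continue from the saved progress instead of starting a new order.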
Do I recommend building this sort of database-backed, stateless, distributed ACME client? Only if you really truly need it. But it is another point on the spectrum to keep in mind when thinking about a "stateless" client.
As ever, context matters, and scale in particular matters in this case. If you happened to be talking about managing thousands of certs across hundreds of service instances, you'd have the right people in the chat!
Appreciate all the advice. Our use case is a glorified cron task that runs every ~10 days to renew ~5 certs. I think keeping the client keys is the right thing to do here in any case.
That is poor practice. Your cron task should run daily, if not multiple times per day. Your ACME client should also use ARI (ACME Renewal Information) to check whether the CA is asking for the certificate to be renewed early, for example because it is going to be revoked. Some ARI background: Improving Resiliency and Reliability for Let’s Encrypt with ARI - Let's Encrypt
Let's Encrypt, or any CA, can revoke your cert if it discovers problems with its issuance. These events are rare but important to account for, and handling them is a key reason ARI was developed. With a 10-day cycle, your client could go up to 10 days without noticing that a cert is about to be revoked.
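The check interval matters because of how a client acts on an ARI response. The CA signals "renew now" by moving the `suggestedWindow`, and the client only sees that on its next check. Here is a minimal Python sketch of the decision, following the approach RFC 9773 describes: pick a random time inside the window, and if that time has already passed, renew immediately. `plan_renewal` and its `next_check` parameter are hypothetical names. Fetching the response and honoring its Retry-After header are left out.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def _parse(ts: str) -> datetime:
    # ARI timestamps are RFC 3339, e.g. "2025-01-02T04:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def plan_renewal(renewal_info: dict, now: datetime, next_check: datetime,
                 rng: Optional[random.Random] = None) -> Optional[datetime]:
    """Return when to renew, or None to wait for the next scheduled check."""
    rng = rng or random.Random()
    window = renewal_info["suggestedWindow"]
    start, end = _parse(window["start"]), _parse(window["end"])
    # A uniformly random point in the window spreads renewals out across clients.
    chosen = start + (end - start) * rng.random()
    if chosen <= now:
        return now       # window already open (e.g. revocation pending): renew now
    if chosen < next_check:
        return chosen    # renew before the next check would have run
    return None          # nothing to do yet
```

With a daily check, a window moved into the past is acted on within about a day. With a 10-day check, the same signal can sit unnoticed for up to 10 days.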
If your ACME client does not support ARI, the recommended renewal time is when the cert has only 1/3 of its lifetime left. For LE's 90-day certs, that is with 30 days left. The industry maximum cert lifetime will drop to 47 days in 2029 under the CA/Browser Forum schedule, and LE itself will offer 6-day certs later this year (currently in limited rollout). The 1/3 rule is a good, flexible algorithm because it scales with whatever lifetime the cert has.
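The 1/3 fallback is simple enough to sketch directly. This minimal Python version assumes you already have the certificate's notBefore and notAfter as timezone-aware datetimes. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def renew_after(not_before: datetime, not_after: datetime) -> datetime:
    """Fallback without ARI: renew once only 1/3 of the lifetime remains."""
    lifetime = not_after - not_before
    return not_after - lifetime / 3


def should_renew(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    return now >= renew_after(not_before, not_after)
```

For a 90-day cert this gives 60 days after issuance (30 days left). For a 6-day cert it gives 4 days after issuance (2 days left). That second case also shows why a 10-day cron is too coarse: a 6-day cert would expire before the job ran again.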
Sorry if this was already known to you, but since you said your renewal runs every 10 days, I thought it was worth reviewing best practices. You might wish to review this as well: https://letsencrypt.org/docs/integration-guide/
If you use Certbot, certbot renew should be run at least daily, and twice a day is even better. A cron job or systemd timer is usually set up during the Certbot install. This page of the docs shows how to check that: User Guide — Certbot 4.1.0.dev0 documentation