I have a little over 6,000 certificates on a single server, issued with certbot's webroot feature. I know the suggestion is to automate renewals. However, renewing that many certificates takes a full day or more. It took 4+ days just to get that many certs issued. With such a long-running process, automating it makes me nervous.
More info here:
I tried making a pull request to update certbot so that I could do the renewals in batches and automate the process. However, that pull request was closed without merging in the changes:
Does anyone have any other ideas for better ways to handle such a large number of renewals?
I am sure you will get much better and more comprehensive replies than this.
But you could make your own limiter, like the one you proposed on GitHub, using the --cert-name X option with the renew command.
All your cert profile names can be derived from the file names in /etc/letsencrypt/renewal, or by running sudo certbot certificates
Then, create your own batches using
sudo certbot renew --cert-name X1
sudo certbot renew --cert-name X2
and so on
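If it helps, here is a rough sketch of how that batching could be scripted. It only builds the command lists; it does not run anything. The function names and the batch size are my own choices for illustration, and it assumes every certificate has a NAME.conf file in /etc/letsencrypt/renewal.

```python
from pathlib import Path


def cert_names(renewal_dir="/etc/letsencrypt/renewal"):
    """Return certificate names, taken from the *.conf file names."""
    # "example.com.conf" -> "example.com" (only the last suffix is stripped)
    return sorted(p.stem for p in Path(renewal_dir).glob("*.conf"))


def make_batches(names, batch_size):
    """Split names into consecutive batches of at most batch_size entries."""
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]


def renew_command(name):
    """Build the argument list that renews one named certificate."""
    return ["certbot", "renew", "--cert-name", name]


def commands_for_batch(batch_index, batch_size=250, renewal_dir="/etc/letsencrypt/renewal"):
    """Return the renew commands for one batch (batch_size is an arbitrary example)."""
    batches = make_batches(cert_names(renewal_dir), batch_size)
    return [renew_command(name) for name in batches[batch_index]]
```

Each cron job could then take a different batch index and pass the resulting lists to subprocess.run, so that no single run has to walk through all 6,000 certificates.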
Another way is to stagger your renewals by adjusting each certificate's expiration days value in its renewal profile. That starts to get a little funky and somewhat non-standard. Still, it may be a quick and dirty fix.
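A sketch of what that staggering might look like, assuming the value in question is the renew_before_expiry line in each renewal .conf file. The base of 30 days and the 14-day spread are made-up numbers, and the script only computes the lines; it does not edit any files.

```python
def staggered_settings(names, base_days=30, spread_days=14):
    """Map each certificate name to a renew_before_expiry line.

    Names are sorted, then cycled through base_days .. base_days + spread_days - 1
    so that roughly equal numbers of certificates become due on each day.
    """
    return {
        name: f"renew_before_expiry = {base_days + i % spread_days} days"
        for i, name in enumerate(sorted(names))
    }
```

With 6,000 certificates and a 14-day spread, that works out to about 430 certificates becoming eligible per day instead of all of them at once.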
Certbot is not the client for you. It does not perform well past 100 certificates; I am surprised you made it this far.
I am in the process of a major update to my client, which is primarily designed for stuff like this, but I'm at least a week away from a stable release. It was supposed to be done by now, but I'm in the middle of redesigning some things after realizing ISRG's ARI implementation will be incompatible with short-lived certs on large installations. The ARI window for forthcoming short-lived certs will be between 2 and 4 hours, so a 10s issuance cycle can only handle 720-1440 renewals in that period. To deal with that, I need to better track and monitor expected renewals so multiple task-runners can be launched.
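The arithmetic behind those numbers, as a small sketch. The 6,000 figure comes from the original question, and treating every certificate as due in the same window is a worst-case assumption on my part, not something ARI guarantees.

```python
import math


def renewals_per_window(window_hours, seconds_per_issuance):
    """How many sequential issuances fit in one renewal window."""
    return int(window_hours * 3600 // seconds_per_issuance)


def runners_needed(total_certs, window_hours, seconds_per_issuance):
    """Parallel task-runners needed if every cert falls in the same window."""
    per_runner = renewals_per_window(window_hours, seconds_per_issuance)
    return math.ceil(total_certs / per_runner)


print(renewals_per_window(2, 10))   # 720
print(renewals_per_window(4, 10))   # 1440
print(runners_needed(6000, 2, 10))  # 9
print(runners_needed(6000, 4, 10))  # 5
```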
The underlying datastore is sqlite3/postgresql, but current certs can be maintained on disk.
It offers a web frontend, because it was designed to be managed by a JSON API - but most things can be done on the command line and I plan to support everything there.
We've been using a version of this for ~100 certs for a few years; but it was designed to scale to thousands.
There are other clients focused in this space as well.
If one wants to stay in Certbot... though I don't think they should...
1- Partition Certbot into multiple configuration directories. Try to keep something like 500 certificates on each installation. Partition certs on a hashed value of the domain name to get an even distribution. You can use --config-dir to run multiple installations with one binary.
2- Use the webroot or standalone challenge mode. Each installation has a different webroot or HTTP-01 port that is proxied onto. Use the hashed domain name to route requests from port 80 to the right partition. You can calculate an md5 for the rewrite/proxy in Nginx if you're running OpenResty; Apache can do it with a few of the different mods. I am not sure if any vanilla webservers can, but IIRC some of the gateway servers can.
3- Each Certbot installation can have its own cron entry, so they can run operations in parallel.
Partitioning should make Certbot significantly faster to run (a big problem is that it parses the directory as its database on each invocation) and will enable parallel runs (the installations are isolated, so they won't step all over each other).
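A minimal sketch of the md5 partitioning, to show the idea. The directory layout under /srv/certbot is hypothetical, and 12 partitions is just 6,000 certs divided by roughly 500 per installation. I also pass separate --work-dir and --logs-dir per partition, on the assumption that parallel runs would otherwise contend for Certbot's lock files in those directories.

```python
import hashlib


def partition_for(domain, partitions=12):
    """Map a domain to a partition number using the md5 of its lowercase name."""
    digest = hashlib.md5(domain.lower().encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def partition_dirs(partition, base="/srv/certbot"):
    """Return hypothetical config/work/logs/webroot paths for one partition."""
    root = f"{base}/{partition:02d}"
    return {
        "config": f"{root}/config",
        "work": f"{root}/work",
        "logs": f"{root}/logs",
        "webroot": f"{root}/webroot",
    }


def partition_renew_command(partition, base="/srv/certbot"):
    """Build the renew command for one isolated Certbot installation."""
    dirs = partition_dirs(partition, base)
    return [
        "certbot", "renew",
        "--config-dir", dirs["config"],
        "--work-dir", dirs["work"],
        "--logs-dir", dirs["logs"],
    ]
```

The web server side has to compute the same hash of the requested host name so that the challenge request lands on the matching webroot or port.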
Partitioning on md5 gives a fairly decent distribution. If you wanted to stay on a vanilla system, you could probably write a script to partition the domains based on the first 2 letters into equal-ish buckets and generate the nginx/apache configs to handle that traffic. I would rather just do the md5.
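For the vanilla route, the bucketing script could look something like this. It is only a sketch of one way to get equal-ish buckets (largest prefix group first, always into the emptiest bucket); the function name is made up, and generating the nginx/apache configs from the result is left out.

```python
from collections import Counter


def prefix_buckets(domains, bucket_count):
    """Assign each two-letter prefix to a bucket, balancing domain counts.

    Returns (assignment, loads): assignment maps prefix -> bucket index,
    loads lists how many domains ended up in each bucket.
    """
    counts = Counter(d.lower()[:2] for d in domains)
    loads = [0] * bucket_count
    assignment = {}
    # Place the biggest prefix groups first; ties are broken alphabetically.
    for prefix, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))
        assignment[prefix] = target
        loads[target] += n
    return assignment, loads
```

Unlike md5, this needs to be rerun (and the web server config regenerated) as domains are added, since a prefix that becomes popular can unbalance the buckets.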