The conclusion in the thread "Delete Expired Certificates" was to leave /etc/letsencrypt untouched, but this doesn't match my requirements, so I am restarting the topic:
after 3 years of running a proxy gateway with SSL termination I see a /etc/letsencrypt directory with
~1.4 GB in size
~330,000 files
~2900 currently active certificates
Most of the files are expired keys or certificates of active domains (so I cannot clean up by removing old domain names).
As /etc/letsencrypt is synchronized to hot standby servers as part of an interactive process, I am trying to reduce the size of the directory to shorten the sync duration.
I intend to write a cleanup script that walks /etc/letsencrypt/archive/{domain}, checks every cert{ID}.pem for certificate expiry and, if it has expired, removes cert|chain|fullchain|privkey{ID}.*.
Is this a valid approach?
Is there an existing cleanup-script available for this task?
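A minimal sketch of the cleanup described above, in Python, offered as an illustration rather than a tested tool. The function names (openssl_is_expired, expired_archive_files, cleanup) are hypothetical. It assumes the openssl command-line tool is installed, that archive files are named cert{ID}.pem, chain{ID}.pem, fullchain{ID}.pem and privkey{ID}.pem, and it adds one safeguard beyond the description: anything that a symlink in live/ still points to is never touched, even if it has expired.

```python
import re
import subprocess
from datetime import datetime, timezone
from pathlib import Path

KINDS = ("cert", "chain", "fullchain", "privkey")
CERT_FILE = re.compile(r"^cert(\d+)\.pem$")


def openssl_is_expired(cert_path):
    """Ask the openssl CLI for notAfter and compare it with the current time."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", str(cert_path)],
        capture_output=True, text=True, check=True,  # raise if the file is unreadable
    )
    # Output looks like: notAfter=Jun  1 12:00:00 2024 GMT
    text = result.stdout.strip().split("=", 1)[1]
    not_after = datetime.strptime(text, "%b %d %H:%M:%S %Y %Z")
    return not_after.replace(tzinfo=timezone.utc) < datetime.now(timezone.utc)


def expired_archive_files(archive_dir, live_dir, is_expired=openssl_is_expired):
    """Yield the files of every expired version that live/ does not reference."""
    in_use = {p.resolve() for p in Path(live_dir).glob("*/*.pem")}
    for cert in sorted(Path(archive_dir).glob("*/cert*.pem")):
        match = CERT_FILE.match(cert.name)
        if not match or not is_expired(cert):
            continue
        version = match.group(1)
        for kind in KINDS:
            candidate = cert.with_name(f"{kind}{version}.pem")
            if candidate.exists() and candidate.resolve() not in in_use:
                yield candidate


def cleanup(archive_dir, live_dir, delete=False, is_expired=openssl_is_expired):
    """Return the files selected for removal; only unlink them if delete=True."""
    selected = []
    for path in expired_archive_files(archive_dir, live_dir, is_expired):
        if delete:
            path.unlink()
        selected.append(path)
    return selected
```

Calling cleanup("/etc/letsencrypt/archive", "/etc/letsencrypt/live") with the default delete=False only returns the list of candidates, which makes it usable as a dry run before anything is removed.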
Besides that, I read that removing
/etc/letsencrypt/csr/*
/etc/letsencrypt/keys/*
can be done without interfering with the certbot process. Is this correct?
You might investigate what you can do with "find DIRNAME -type f -ctime ... | xargs rm", properly applied to the correct directory. Archived certs other than the current ones are not really useful, at least as far as I know. Just cleaning up all but the most current ones should suffice. I doubt you need anything as complicated as checking certificate expiry. I would be astonished if you needed the many thousands of files you have. If "-ctime" is not a find option on your system, look at "-mtime".
At any rate, "man find".
I do counsel being careful, using some backup method, and lots of dry runs.
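For the dry runs, here is a rough Python counterpart to the age-based "find" idea, as a sketch only. The function name stale_archive_files and its parameters are made up for illustration. It selects files by modification time (similar in spirit to "-mtime") and, as an added assumption about what is safe, skips anything a symlink under live/ still points to. It only lists files and deletes nothing.

```python
import time
from pathlib import Path


def stale_archive_files(archive_dir, live_dir, min_age_days=90, now=None):
    """List archive .pem files older than min_age_days that live/ does not use."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    # Resolve the live symlinks to the archive files they currently point to.
    in_use = {p.resolve() for p in Path(live_dir).glob("*/*.pem")}
    stale = []
    for path in Path(archive_dir).glob("*/*.pem"):
        if not path.is_file():
            continue
        if path.resolve() in in_use:
            continue
        if path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Dry run: print the candidates, remove nothing.
    for candidate in stale_archive_files("/etc/letsencrypt/archive",
                                         "/etc/letsencrypt/live"):
        print(candidate)
```

Reviewing the printed list against a backup before deleting anything keeps the step reversible.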
If you really want to go through the effort, you can remove all but the most recent set of files, rename the remaining set of files to "1", then reestablish all the symbolic links in live to point to the "new 1's". I think you can use certbot's update_symlinks subcommand to reestablish them.
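The renumbering idea could look roughly like this in Python. This is a sketch under several assumptions: the function name renumber_to_one is hypothetical, the set to keep is taken to be the one live/<name>/cert.pem currently points to, live/ and archive/ are assumed to be sibling directories, and the symlinks are recreated as relative links of the form ../../archive/<name>/cert1.pem. Between the rename and the relink the live symlink is briefly broken, so this should not run while a server is reloading.

```python
import os
import re
from pathlib import Path

KINDS = ("cert", "chain", "fullchain", "privkey")
VERSIONED = re.compile(r"^(cert|chain|fullchain|privkey)(\d+)\.pem$")


def renumber_to_one(archive_lineage, live_lineage):
    """Keep only the set that live/ points to, rename it to 1, and relink live/."""
    archive_lineage = Path(archive_lineage)
    live_lineage = Path(live_lineage)

    # Version currently referenced by live/<name>/cert.pem.
    target = Path(os.readlink(live_lineage / "cert.pem")).name
    match = VERSIONED.match(target)
    if not match:
        raise ValueError(f"unexpected symlink target: {target}")
    current = int(match.group(2))

    # Remove every other version (this also frees the name "1" if needed).
    for path in archive_lineage.iterdir():
        m = VERSIONED.match(path.name)
        if m and int(m.group(2)) != current:
            path.unlink()

    # Rename the surviving set to version 1 and point the live symlinks at it.
    for kind in KINDS:
        src = archive_lineage / f"{kind}{current}.pem"
        dst = archive_lineage / f"{kind}1.pem"
        if src != dst:
            src.rename(dst)
        link = live_lineage / f"{kind}.pem"
        link.unlink()
        os.symlink(
            os.path.join("..", "..", "archive", archive_lineage.name, f"{kind}1.pem"),
            link,
        )
```

Trying it on a copy of one lineage first, and checking the result with "certbot certificates", seems prudent before running it across the whole tree.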
Over the years there have been several incarnations of "find", with various options especially revolving around the "-ctime" and "-mtime" switches. BSD behaves one way, System V a little differently, and no telling what exists for the OP. I would second the notion that you should check out "man find" before you do anything else.
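You may also find "man xargs" useful in your environment, since the "-exec ... \;" form of "find" starts one process per file, which gets slow if you have a lot of files, and expanding a long file list on a single command line can overflow the argument limit. Piping to "xargs" batches the arguments and avoids both, i.e.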
"find ... -print | xargs rm"
And ... can "read the manual first" be somehow appended automatically to all posts ? In the "good old days" we used to say "RTFM".
I believe the keys and csr directories are not relevant for production purposes: the private keys are also stored in archive, and the CSRs are used just once when issuing a certificate. If I remember correctly, the Certbot team is thinking about removing them altogether. There might be some discussion about that on the GitHub repo.
Do you think renaming to "1" is really required?
I would have expected certbot to continue with n+1 if it finds a live configuration pointing to version n. As I understand it, the existence of file versions < n should be irrelevant to certbot. Removing only old versions would not require running 'certbot update_symlinks'.
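Removing only the versions below the one live points to might be sketched like this in Python. The names live_version and older_versions, and the keep_prior parameter for retaining a few earlier versions, are hypothetical. It assumes live/<name>/cert.pem is a symlink to cert{n}.pem in the matching archive directory, and it only returns the candidates without deleting them.

```python
import os
import re
from pathlib import Path

VERSIONED = re.compile(r"^(cert|chain|fullchain|privkey)(\d+)\.pem$")


def live_version(live_lineage):
    """Return n for a live/<name>/cert.pem symlink that points to cert{n}.pem."""
    target = Path(os.readlink(Path(live_lineage) / "cert.pem")).name
    match = VERSIONED.match(target)
    if not match:
        raise ValueError(f"unexpected symlink target: {target}")
    return int(match.group(2))


def older_versions(archive_lineage, current, keep_prior=0):
    """List archive files whose version is below current - keep_prior."""
    threshold = current - keep_prior
    selected = []
    for path in Path(archive_lineage).iterdir():
        match = VERSIONED.match(path.name)
        if match and int(match.group(2)) < threshold:
            selected.append(path)
    return sorted(selected)
```

For one lineage, older_versions(archive_dir, live_version(live_dir), keep_prior=1) would list everything except the current version and the one before it.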
That's a good question. The method Certbot uses to determine the next number in the sequence is not documented. Some experimentation is needed to see how it's done, and then you have to hope it doesn't change. Or review its code on GitHub.
I can imagine two other methods than the one you describe:
Use first available number from beginning of /archive/ sequence
Add 1 to largest existing number in /archive/
Normally of course /live/ points to the largest used number but you could manually change the symlinks to a previous one for various reasons. I don't know if you have ever done this or if this would mess up Certbot. I did it once when I created a staging cert and pointed the symlink to the prior production cert. I don't remember what happened on that test system later.
My guess is you could remove any older /archive/ files but I would retain anything created in the last 90 days as they are still usable (in odd cases).
BUT, testing carefully is required.
EDIT:
In this thread _az, a Certbot dev, said csr and keys could be deleted. The link in a later post in that thread addresses the purge cycle you ask about.
Also @schoen was asking for input on this from an insider group earlier this year. Maybe he can add clarity.
If I read https://github.com/certbot/certbot/blob/master/certbot/certbot/_internal/storage.py correctly, Certbot extracts the current numbers from the filenames using a regex and will use the highest number to find the next one. So missing numbers shouldn't be an issue. That said, I don't know if the team would consider changes to this logic a "breaking change".
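To illustrate the "highest number plus one" idea, here is a small Python sketch. It is not Certbot's code, only an approximation of the behavior described above. The function name next_version is made up, and it looks only at cert{N}.pem files, which is an assumption.

```python
import re
from pathlib import Path

CERT_VERSION = re.compile(r"^cert(\d+)\.pem$")


def next_version(archive_lineage):
    """Return highest existing cert{N}.pem number plus one (1 if none exist)."""
    versions = [
        int(match.group(1))
        for path in Path(archive_lineage).iterdir()
        if (match := CERT_VERSION.match(path.name))
    ]
    return max(versions, default=0) + 1
```

With this logic, a directory that holds only cert3.pem and cert7.pem yields 8, so gaps left by deleting older versions do not affect the result.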
That was purely to provide an everlasting (and idempotent) solution. You could simply purge 1...n-1 and be fine. There are very, very few reasons (e.g. concern for key compromise combined with inability to timely reissue) for keeping old, duplicate certificates regardless of whether they are expired. Frankly, barring a well-justified reason, I advocate deleting all old, duplicate certificates every time a new, duplicate certificate is acquired. That has always been the behavior of CertSage, the ACME client that I authored and maintain. If certbot were to follow this pattern, there would be no need for the symbolic links at all and anything using the certificates certbot acquires could simply reference the actual files since they would no longer have changing numbers appended that cause issues for static reference. I believe that you could simulate this behavior by using a certbot deployment hook that does exactly what I've described and modifying your webserver (or whatever) configuration to point to the "1" set in the archive directories.
I now think a slightly more cautious version of this behavior would have been more appropriate in Certbot.
As I've written before, when we first wrote Certbot, we imagined at least some people using it in a considerably more manual and hands-on way than they actually do, where they would commonly examine certificates to decide if they were satisfied with them, or something. Certbot's logic about saving everything indefinitely partly predates the design, or at least the completion, of the installer code, which normally deploys all new certificates without having anyone inspect them in any way (based on the idea that, if the certificate issuance succeeded, the certificate contents will be the same as what is expected, except in very unusual cases such as the dangerous option --allow-subset-of-names).
If I were doing it over again, I would have automated renewals save one prior version of everything, and manual renewals save two prior versions. And I would probably not save keys or csr outside of the version management at all.