This was inspired by a recent discussion in an issue on Certbot's GitHub page.
I was there when the Certbot team decided to have Certbot keep an extensive record of prior certificate versions and issuance history (which it does in several ways), and I remember much of the motivation behind it. I helped design the mechanism that Certbot uses to track these versions.
My recollection of the motivation for keeping old certificates, keys, etc.
When we first started working on Certbot, we thought sysadmins would often not want to use what we now call an installer plugin, and would often want to configure their sites partially manually. We also thought they would commonly want to manually inspect newly-issued certificates before starting to use them.
All of these intuitions came from our understanding of how people had worked with earlier certificate authorities, and also from feedback from sysadmins who preferred to take a more hands-on approach. (Indeed, a small minority of users have continued to vocally complain about how much Certbot attempts to automate for them, up to the present day.)
For some reason, we thought it was possible that sysadmins inspecting their new certificates would decide that the new certificates were not correct or not what was intended, and would then want to delay "deployment" of the new certificates, or even roll back a deployment to a previous version.
(In fact, we even originally expected a two-phase "obtain certificate" and "deploy certificate" process, where what we now think of as authenticators and installers might be used separately in separate invocations of certbot! And, with automated renewal flows, their timing might be separated by a significant amount of time, measured in multiple days. The new certificate would then be present on the user's disk for the entire period between when it was obtained and when it was deployed, remaining deliberately unused during that interval.)
In light of how few elements of the certificate Let's Encrypt actually allows users to control, and how reliably the system as a whole has worked, this now seems like a vanishingly rare situation. The only case in which it seems to occur in practice is when people accidentally drop a domain name from a certificate's coverage. That case has been partly mitigated in other ways and may be mitigated further in the future.
We literally thought at the outset that there might be a common use case for people to say "I don't actually like version 7 of my certificate; let's roll back to version 5". But, roughly speaking, nobody ever asks how to do this.
Some problems caused by the current system
The current system (which, again, I helped design and bear quite a bit of responsibility for) uses a fair amount of disk space. It also keeps old private keys around indefinitely. That is a rapidly decreasing security threat because of the huge rise in use of PFS (perfect forward secrecy) ciphersuites, but it does make individual Certbot installations a target for someone who wants to compromise historical TLS traffic that was encrypted with a non-PFS ciphersuite. I don't know what percentage of sessions today end up negotiating such a ciphersuite.
The biggest challenges with the current versioning mechanism, though, are:
- It's kind of brittle with regard to referential integrity. Many users don't understand that they shouldn't rename any of the files under /etc/letsencrypt (even though there is a README file warning them not to). Renaming these files often makes Certbot refuse to run at all, and it has also caused a family of bugs (much less common nowadays) where Certbot attempts a renewal every time it's run because it doesn't save renewed certificates in the place where it expects to find them afterward.
- Users often don't understand what it's for.
- People seem to have a hard time making working backups of a symlink-based layout, because they often use backup methods that don't preserve symlinks.
- The symlinks have also been a problem to some extent for the Windows port, where people are even less familiar with symlinks.
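To make the referential-integrity and backup problems concrete, here is a minimal Python sketch (not Certbot code; the name find_live_link_problems is hypothetical) that checks each lineage under live/ for entries that are missing, have been replaced by regular files, or are symlinks whose archive target no longer exists. It assumes the current layout, in which live/<name>/cert.pem and its siblings are relative symlinks into ../../archive/<name>/.

```python
import os
from pathlib import Path

# The four per-lineage files exposed under live/<name>/.
LIVE_FILES = ("cert.pem", "chain.pem", "fullchain.pem", "privkey.pem")


def find_live_link_problems(config_dir):
    """Return (path, problem) pairs for live/ entries that are not healthy symlinks.

    Hypothetical helper for illustration only; it assumes the current layout in
    which live/<name>/<file>.pem is a relative symlink into ../../archive/<name>/.
    """
    problems = []
    live_root = Path(config_dir) / "live"
    if not live_root.is_dir():
        return problems
    # Only directories are lineages; this skips files such as live/README.
    for lineage in sorted(p for p in live_root.iterdir() if p.is_dir()):
        for name in LIVE_FILES:
            path = lineage / name
            if not path.is_symlink():
                # Either gone entirely, or replaced by a regular file
                # (for example by a backup tool that dereferenced the link).
                problem = "missing" if not path.exists() else "not a symlink"
                problems.append((str(path), problem))
            elif not path.exists():
                # exists() follows the link, so False means the target is gone.
                problems.append((str(path), "dangling -> " + os.readlink(path)))
    return problems
```

Renaming an archive/ directory shows up here as four dangling links, while restoring from a backup that copied file contents instead of links shows up as "not a symlink".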
A possible alternative mechanism
Maybe there could be a new directory called /etc/letsencrypt/old or /etc/letsencrypt/backups which contains (only) the three most recent versions of each privkey.pem, chain.pem, fullchain.pem, and cert.pem for each certificate lineage, not as symlinks but as regular files, kind of on the model of logrotate keeping backups of recent old log files in /var/log. For example, there might be
/etc/letsencrypt/old/example.com/privkey.pem.1
/etc/letsencrypt/old/example.com/chain.pem.1
/etc/letsencrypt/old/example.com/fullchain.pem.1
/etc/letsencrypt/old/example.com/cert.pem.1
and also .2 and .3, but no more. The corresponding /etc/letsencrypt/live/example.com/privkey.pem and so on would still exist at their existing names and locations (especially to make existing web server configurations and documentation continue to be correct), but would now be regular files instead of symlinks into ../../archive/. The /etc/letsencrypt/archive directory would be deprecated and would contain a README file stating that it is no longer used, and that older versions of certificates could be found in /etc/letsencrypt/old (or /etc/letsencrypt/backups).
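A minimal sketch of the proposed rotation in Python, assuming the hypothetical old/<lineage>/ layout above, where <name>.1 is the most recently replaced copy. rotate_backups is a hypothetical helper, not an existing Certbot function; in this design it would be called just before the renewed files overwrite the ones in live/.

```python
import os
import shutil
from pathlib import Path

LIVE_FILES = ("cert.pem", "chain.pem", "fullchain.pem", "privkey.pem")


def rotate_backups(live_dir, old_dir, keep=3):
    """Copy the current live files into old_dir as <name>.1, shifting older copies.

    Hypothetical sketch of the proposed layout: nothing beyond <name>.<keep> is
    retained, similar to logrotate's numbered files.
    """
    live_dir, old_dir = Path(live_dir), Path(old_dir)
    old_dir.mkdir(parents=True, exist_ok=True)
    for name in LIVE_FILES:
        current = live_dir / name
        if not current.exists():
            continue
        # Shift .(keep-1) -> .keep, ..., .1 -> .2. os.replace overwrites the
        # existing .keep file, so the oldest copy is dropped.
        for i in range(keep - 1, 0, -1):
            src = old_dir / f"{name}.{i}"
            if src.exists():
                os.replace(src, old_dir / f"{name}.{i + 1}")
        # copy2 also copies permission bits, which matters for privkey.pem.
        shutil.copy2(current, old_dir / f"{name}.1")
```

Because the live files would be plain files, copying them is enough: there are no links to keep consistent, and at most keep (here three) old private keys per lineage stay on disk.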
There would still be a referential integrity issue about what happens if someone edited or renamed the renewal configuration file for a lineage without also changing the corresponding live directory name, but there would no longer be any issues at all about broken symlinks or symlinks pointing to the wrong archive directory.
The storage.py logic would become significantly simpler overall, although there is still a question about atomicity and consistency of updates during a renewal.
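One common way to get per-file atomicity with regular files is to write each new file to a temporary file in the same directory and rename it over the old one. The sketch below (hypothetical helper atomic_write, assuming a POSIX filesystem) does that with os.replace. It only makes each individual file update atomic: if Certbot were interrupted between replacing cert.pem and privkey.pem, live/ could still hold a mismatched pair, so consistency across the four files would remain a separate design question. A fully durable version would also fsync the containing directory after the rename.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path, data, mode=0o644):
    """Replace path with data so readers see either the old or the new contents.

    Hypothetical sketch: write to a temporary file in the same directory, fsync
    it, then rename it over the target with os.replace (atomic on POSIX when
    source and target are on the same filesystem).
    """
    path = Path(path)
    # mkstemp creates the file with mode 0600, so a private key is never
    # world-readable while it is being written.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix="." + path.name + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, mode)
        os.replace(tmp, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

For example, atomic_write("/etc/letsencrypt/live/example.com/privkey.pem", new_key_bytes, 0o600) would swap in a new key without ever exposing a half-written file, where new_key_bytes is a placeholder for the renewed key.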
Cc @certbot-devs. (I'm not trying to saddle you with work that's not part of your roadmap or anything; I might also make an experimental PR to demonstrate this approach if anyone is interested.)