I have stumbled on a certbot error (possibly a bug) that has been reported as far back as 2019 but hasn't been fixed. I managed to temporarily fix my setup, but it's not a permanent fix; it feels like certbot needs a bit of debugging.
The issue:
When I run certbot renew (or when it is run via cron), it fails with:
Renewal configuration file /etc/letsencrypt/renewal/www.example.com.conf is broken. The error was: renewal config file {} is missing a required file reference Skipping.
Note: I have changed my domain name to www.example.com in the above output.
What I tried:
1. I had a look at posts from 2019 about this error and tried the steps from those, with no success.
2. I can run certbot run and this works. However, I noticed that in /etc/letsencrypt/renewal/ all .conf files for the domain apart from the last one are empty. I wonder if this is what is causing the issue.
3. I tried deleting all folders and files (live, renewal, archive) and then running certbot run to get new certificates and config. This appears to work (i.e. no errors reported); however, when I run certbot renew after this, it still throws the same error about parsing an empty config file.
4. I then had a few iterations of deleting the contents of the renewal folder and running certbot -d www.example.com (for three different hosts), and this produced patchy results. Sometimes only one .conf file would be saved in renewal, but sometimes there would be 2-3 versions, of which the first two would be empty and the last one would have the config data (cert paths, account info, etc.).
Whilst step 4 appears to have worked for now, I worry that this certbot bug will reappear.
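For anyone who wants to check for the empty files described in the steps above without opening each one by hand, here is a rough Python sketch. It assumes the renewal directory is /etc/letsencrypt/renewal and that a healthy .conf file carries the four file references cert, privkey, chain and fullchain before its first [section]; that list is my assumption based on what a working file contains, and the function names are made up for this example. It only reads files and does not repair anything.

```python
from pathlib import Path

# File references a renewal config is expected to carry at its top level.
# Assumption: this list is taken from a working .conf file, not from Certbot docs.
REQUIRED_KEYS = ("cert", "privkey", "chain", "fullchain")


def missing_references(conf_text: str) -> list[str]:
    """Return the required keys that are absent from the top-level part of a config."""
    found = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            # Sections such as [renewalparams] come after the file references.
            break
        key, sep, _value = line.partition("=")
        if sep:
            found.add(key.strip())
    return [key for key in REQUIRED_KEYS if key not in found]


def scan_renewal_dir(directory: str = "/etc/letsencrypt/renewal") -> dict[str, list[str]]:
    """Map each incomplete .conf file name to the keys it is missing."""
    broken = {}
    for path in sorted(Path(directory).glob("*.conf")):
        missing = missing_references(path.read_text())
        if missing:
            broken[path.name] = missing
    return broken


if __name__ == "__main__":
    for name, missing in scan_renewal_dir().items():
        print(f"{name}: missing {', '.join(missing)}")
```

An empty file shows up with all four keys listed as missing. Depending on permissions, the script may need to be run with sudo.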
Does anyone know why this bug is (still) happening, and can we have a fix please? What is the permanent solution?
Environment:
certbot 1.21.0
Server is AWS Nano instance, Ubuntu Linux 6.5.0-1014-aws #14~22.04.1-Ubuntu SMP
Server version: Apache/2.4.52 (Ubuntu)
Server built: 2023-10-26T13:44:44
I have 200 MB+ of free memory at all times and 4 GB+ of disk space (I know it's not a lot, but certbot doesn't strike me as a resource-intensive script (?)).
I use the certbot version that comes with the 'mainstream' Ubuntu apt channel for simplicity and ease of updates. I will try manually upgrading to 2.9 and see if this improves things. The problem is that I have certbot working on my setup now, so I don't believe I will be able to reproduce the error. I wonder why versions 2+ are not distributed through the official Ubuntu apt channel; hopefully they won't break something else. I will report back with the results.
I'm not sure if it's an actual bug. Most of these problems with broken renewal configuration files come from users manually tampering with the directory structure or the files in /etc/letsencrypt/, leaving Certbot broken.
It's a rather big Python application with many libraries, so it could end up using more memory than you think. If I simply run a random certbot certonly and pause it halfway through, it uses 139 MB of "non-swapped physical memory" (the RES column of htop).
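To compare a figure like that against what the machine has free without htop, the same numbers can be read from /proc on Linux. The sketch below is only an illustration: read_kb and fits_in_memory are names invented for this example, and it assumes the usual "Field:   123 kB" layout of /proc/<pid>/status and /proc/meminfo.

```python
import os
import sys
from typing import Optional


def read_kb(text: str, field: str) -> Optional[int]:
    """Return the kB value of a 'Field:   123 kB' line, or None if the field is absent."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == field:
            parts = rest.split()
            if parts and parts[0].isdigit():
                return int(parts[0])
    return None


def fits_in_memory(rss_kb: int, available_kb: int, headroom_kb: int = 0) -> bool:
    """True if a process of this resident size, plus headroom, fits in available memory."""
    return rss_kb + headroom_kb <= available_kb


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Pass the PID of a running certbot as the first argument; defaults to this script.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/status") as status_file:
            rss = read_kb(status_file.read(), "VmRSS")
        with open("/proc/meminfo") as meminfo_file:
            available = read_kb(meminfo_file.read(), "MemAvailable")
        print(f"VmRSS: {rss} kB, MemAvailable: {available} kB")
    except OSError as exc:
        print(f"could not read /proc entries: {exc}")
```

VmRSS corresponds to the RES column in htop. On an instance with roughly 200 MB free, a process around 139 MB leaves little headroom for anything else running at the same time.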
Interesting about the memory usage - perhaps this was the culprit.
Still, if the script runs out of memory, surely it should fail gracefully with an error message to that effect. More importantly, if the script fails for whatever reason, it should not corrupt the .conf file (by the looks of it, the .conf file is left empty, which messes up subsequent runs). Alternatively, there should be a 'recovery' or 'restore' command to fix a corrupted install.
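To illustrate what "should not corrupt the file" could look like in practice, here is a minimal sketch of the common write-to-a-temporary-file-then-rename pattern. I am not claiming this is how certbot writes its renewal configs, nor that its absence is the cause here; write_atomically is a name made up for this example. The idea is that the target path either keeps its old contents or gets the complete new contents, never an empty or half-written file.

```python
import os
import tempfile


def write_atomically(path: str, text: str) -> None:
    """Replace the file at `path` with `text`, leaving the old file intact on failure."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk before renaming
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Something went wrong: drop the temp file and leave the original alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before os.replace runs, the worst case is a stray temporary file next to an untouched original, which is much easier to recover from than an empty config.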
In my case, I didn't touch the folders or files.
My guess would be that most people have other stuff to do rather than tinker with config files from what is supposed to be an automated (and 'black-box') solution. I certainly wish I had the time to tinker with random packages.
For resource-constrained systems it's often better to use an ACME client that uses far fewer resources. I'm not a fan of the client (it defaults to ZeroSSL, for example), but acme.sh could be better suited. Or any other Bash ACME client. See https://letsencrypt.org/docs/client-options/ for a non-exhaustive list.