Certbot's inconsistent renewal behaviour leads to lockout: improvement needed

I set up a website on a Linux server last year.
The Certbot cert install was successful.
In Jan 2020, auto-renewal was not working, but manual renewal #1 was OK.

Today, 11 Apr 2020, I tried to do manual renewal #2.
I hit unexpected errors, with directory and file names and locations that were inconsistent with last time. So I was in problem-solving mode, trying and re-trying, and it kept hitting me with “FileExistsError” and “FileNotFoundError”, even for the same file.

Then came the bad experience: “too many certificates already issued for exact set of domains:” and a link to the docs explaining “You’ll need to wait until the rate limit expires after a week.” The cert has only 3 days left. The website is for a community-service charity that is even more of a lifeline during the current coronavirus epidemic. Going insecure for 4 days (if that is accurate) is unacceptable in the age of HTTPS-everywhere. Therefore the only solution I can see is to buy in a commercial certificate.

I admit that I was slow to remember the “lockout” and “dry-run” elements that I may have read about 6 months ago, but I needed better reminders of them.

Here, therefore, is feedback for improvement, so that others, including my future self, can have a better experience:

GIVE FAIR WARNING ON SCREEN
I recommend an on-screen warning message at 5, with a lockout limit at 8.
However, if LE continues to insist on lockout at 5, then start the warning messages at 3.

CERTBOT NEEDS TO REPORT FAILS BETTER
This was a fail: “FileExistsError” and “FileNotFoundError”. Therefore it should have been a 1-hour lockout rather than a 7-day lockout.

“EXPIRATION NOTICE” REMINDER EMAILS NEED TO GIVE FAIR WARNING OF THESE ISSUES
Especially the “5 strikes and you are out” rule, plus advice that if a renewal throws an error, you should start diagnosing with “dry-run”. The wording could also include: “Renewal may have issues on some servers, therefore we recommend checking automatic renewal and, if necessary, running manual renewal at least 2 weeks before expiry”.

More details of my observed certbot behaviour:
I do not want to advertise details of a server with SSL troubles on a public forum, so I will call it xxxxx.yyy.

On first obtaining certificates, they were stored in:
/etc/letsencrypt/archive/xxxxx.yyy

Then on renewal #1, they were stored in
/etc/letsencrypt/archive/xxxxx.yyy-0001
That added -0001 directory was strange. There were also “symlinks” in the directory “/etc/letsencrypt/live/xxxxx.yyy” that I needed to edit, following advice found on “StackOverflow”, but I got it to work.

I think it was reasonable of me to expect renewal #2 to create a directory like “xxxxx.yyy-0002”, but oh no: certbot returns to its first love, “xxxxx.yyy”, and gets confused along with me. If I had had more attempts (but the detective in me found out about dry-run too late), I could have tried renaming directories so that “xxxxx.yyy-0001” became “xxxxx.yyy”, which would have given certbot the complete file set to work with.

For future Let's Encrypt activity, I intend to avoid renew and always generate fresh new certs.

Or spend even a few minutes searching here, which would tell you that this rate limit applies to identical certificates. Adding or removing a name makes the set of names different and avoids this rate limit.
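To make "identical" concrete: the limit hit here is keyed on the exact set of names in the certificate. Below is a rough local model of it, assuming the values documented at the time (5 duplicate certificates per rolling 7 days) and assuming names are compared case-insensitively with order ignored. `duplicates_remaining` and the `history` records are hypothetical, not part of certbot or the Let's Encrypt API.

```python
from datetime import datetime, timedelta

# Assumed values: a Duplicate Certificate limit of 5 certs per week
# for exactly the same set of names.
DUPLICATE_LIMIT = 5
WINDOW = timedelta(days=7)


def name_set(names):
    """Normalise hostnames into the set the limit is assumed to be keyed on."""
    return frozenset(n.strip().lower() for n in names)


def duplicates_remaining(history, names, now):
    """Return how many more certs for exactly `names` fit in the window.

    `history` is a hypothetical list of (names, issued_at) tuples for
    certificates already issued; build it from your own records.
    """
    wanted = name_set(names)
    recent = sum(
        1
        for issued_names, issued_at in history
        if name_set(issued_names) == wanted and now - issued_at < WINDOW
    )
    return max(DUPLICATE_LIMIT - recent, 0)


if __name__ == "__main__":
    now = datetime(2020, 4, 11, 12, 0)
    # Five issuances for the same single name within the last few hours.
    history = [(["xxxxx.yyy"], now - timedelta(hours=h)) for h in range(1, 6)]
    print(duplicates_remaining(history, ["xxxxx.yyy"], now))
    # A different set of names (www.xxxxx.yyy is hypothetical) is not a duplicate.
    print(duplicates_remaining(history, ["xxxxx.yyy", "www.xxxxx.yyy"], now))
```

With five recent issuances for `xxxxx.yyy` alone, the first call reports no room left, while the set that adds one more name is counted separately.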

Or use one of the five certs you've already created.

The certs are stored in /etc/letsencrypt/archive/, but you should never use those; you should use the symlinks in /etc/letsencrypt/live/.

That is not generally the expected behavior.

You should never have symlinks in /etc/letsencrypt/archive/. Those would only be expected in /etc/letsencrypt/live/.

No, it was not.

I think there's a basic misunderstanding here of how certbot stores the certificate files, and how they're intended to be used. The /etc/letsencrypt/archive/hostname/ directory contains a series of files: certN.pem, chainN.pem, fullchainN.pem, privkeyN.pem. In each case, N is a sequential number: it's 1 for the first cert issued, 2 for the second, and so forth. These are always normal files (never symlinks). In normal usage, you should never need to do anything with these files, and you would never point any of your server config settings to any of them.
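One quick way to spot broken state in the archive directory (for example a lone privkey3.pem with no matching cert3.pem) is to group the file names by N and look for incomplete sets. This is a hypothetical helper, not something certbot ships; in practice you would feed it `os.listdir()` of your own archive directory.

```python
import re
from collections import defaultdict

# The four files certbot writes into archive/<name>/ for each issuance.
KINDS = ("cert", "chain", "fullchain", "privkey")
PATTERN = re.compile(r"^(cert|chain|fullchain|privkey)(\d+)\.pem$")


def archive_versions(filenames):
    """Group archive file names by sequence number N -> set of kinds.

    Names that don't match the certN.pem style (e.g. README) are ignored.
    """
    versions = defaultdict(set)
    for name in filenames:
        m = PATTERN.match(name)
        if m:
            versions[int(m.group(2))].add(m.group(1))
    return dict(versions)


def incomplete_versions(filenames):
    """Return the sequence numbers missing one or more of the four files."""
    return sorted(
        n for n, kinds in archive_versions(filenames).items()
        if set(KINDS) - kinds
    )
```

For an archive holding complete sets 1 and 2 plus only privkey3.pem, `incomplete_versions` flags 3.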

The /etc/letsencrypt/live/hostname directory contains a set of symlinks, like this:

[root@neth-test neth-test.familybrown.org]# ll
total 4
lrwxrwxrwx 1 root root  49 Apr 11 11:39 cert.pem -> ../../archive/neth-test.familybrown.org/cert2.pem
lrwxrwxrwx 1 root root  50 Apr 11 11:39 chain.pem -> ../../archive/neth-test.familybrown.org/chain2.pem
lrwxrwxrwx 1 root root  54 Apr 11 11:39 fullchain.pem -> ../../archive/neth-test.familybrown.org/fullchain2.pem
lrwxrwxrwx 1 root root  52 Apr 11 11:39 privkey.pem -> ../../archive/neth-test.familybrown.org/privkey2.pem
-rw-r--r-- 1 root root 692 Apr  2 14:38 README
[root@neth-test neth-test.familybrown.org]# 

The names of these symlinks remain constant, so you can use them in your server config files, never needing to update them. Unless you have a renew-hook or some other process set to copy these somewhere else on issuance/renewal, you should be pointing your server config settings to the files in /etc/letsencrypt/live/.
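To check what live/ actually points at, you can read the symlinks directly. Here's a minimal sketch, assuming the four standard link names shown in the listing above; `live_versions` and `live_is_consistent` are illustrative names, not certbot functions. If the four links don't all point at the same N, something outside certbot has probably been editing them.

```python
import os
import re

LINKS = ("cert.pem", "chain.pem", "fullchain.pem", "privkey.pem")
VERSION = re.compile(r"(\d+)\.pem$")


def live_versions(live_dir):
    """Map each live/ symlink to the archive sequence number it points at.

    Raises ValueError if an entry is not a symlink, which the normal
    certbot layout never has in live/.
    """
    result = {}
    for link in LINKS:
        path = os.path.join(live_dir, link)
        if not os.path.islink(path):
            raise ValueError(f"{path} is not a symlink")
        target = os.readlink(path)  # e.g. ../../archive/<name>/cert2.pem
        m = VERSION.search(target)
        result[link] = int(m.group(1)) if m else None
    return result


def live_is_consistent(live_dir):
    """True if all four symlinks point at the same sequence number."""
    versions = set(live_versions(live_dir).values())
    return len(versions) == 1 and None not in versions
```

Run against the neth-test.familybrown.org listing above, this would report 2 for all four links.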

The better course of action would be to figure out why certbot is misbehaving and correct it. It's highly likely there's something wrong with your configuration that's causing the behavior you're seeing.

For one hour. Not like out, out.

...or one week, in the case of five issued certs.

If you get rate limited for one week, you have actually had five certificates successfully issued.

...which is what OP said happened:

And apparently OP is doing manual renewals and waiting way too long to do them (the default automatic renewal would have happened 3+ weeks ago), so there's all kinds of fail going on here.
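The "3+ weeks" figure follows from certbot's default renewal threshold: `certbot renew` only renews a cert once fewer than 30 days remain (the `renew_before_expiry` setting). A small sketch of that arithmetic, assuming an expiry of 14 Apr 2020 (3 days after 11 Apr, for illustration only):

```python
from datetime import datetime, timedelta

# certbot's default: renew once fewer than 30 days remain
# (renew_before_expiry in the renewal config).
RENEW_BEFORE_EXPIRY = timedelta(days=30)


def renewal_due_at(not_after, renew_before=RENEW_BEFORE_EXPIRY):
    """When a regularly scheduled `certbot renew` would first attempt renewal."""
    return not_after - renew_before


def days_overdue(not_after, now, renew_before=RENEW_BEFORE_EXPIRY):
    """Whole days since the renewal window opened (negative if not yet open)."""
    return (now - renewal_due_at(not_after, renew_before)).days


if __name__ == "__main__":
    expiry = datetime(2020, 4, 14)  # assumed: 3 days after 11 Apr 2020
    now = datetime(2020, 4, 11)
    print(renewal_due_at(expiry).date())  # 2020-03-15
    print(days_overdue(expiry, now))      # 27
```

So a daily renew job would have started trying around 15 Mar, 27 days before the manual attempt.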

@9peppe writes

if you get rate limited for one week, you actually have had 5 certificates successfully issued

Running sudo certbot certonly results in only one new file created:
/etc/letsencrypt/archive/xxxxx.yyy/privkey3.pem
It appears there after certbot displays this error message to the contrary:
FileNotFoundError: [Errno 2] No such file or directory: '/etc/letsencrypt/archive/xxxxx.yyy/privkey3.pem'
In the previous 2 runs, certbot created 4 files, therefore as Yoda would say: "5 certificates successfully issued this is not".

@danb35 writes:

Yes. In this situation I am at risk of lockout, and reading this forum I see that I am not alone. Therefore, to carry out the kind of testing and correction suggested, we need fair warning of this risk, including reminders that a test system exists and advice to switch to it promptly at the first renewal problem. This is the main point of this post, as a "feature request" asking for on-screen and in-email warnings.

Your certbot configuration is messed up. It should not behave that way.

Did you mess with /etc/letsencrypt or the files therein? Did you run certbot as different users?

If Yoda would say that, he would be wrong. The certificate has been issued. The fact that your client configuration is messed up such that it isn't saved properly doesn't change that fact. But you've shared an important piece of information here, and that's that you have the private key. With that, you can download the corresponding cert from crt.sh, use the chain file you already have, and have a valid and usable cert while you figure out what you've broken (because something in /etc/letsencrypt/ is badly broken). Here's what I'd suggest:

  • Search https://crt.sh for your domain, and download the cert matching the new privkey3.pem file.
  • Save that cert, privkey3.pem, and any of the chainN.pem files somewhere outside of /etc/letsencrypt/.
  • Adjust your web (and other relevant) server configuration to use those files.
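If your server wants a single combined file, as it would with fullchain.pem, you can build one yourself from the downloaded cert and a chainN.pem. certbot's fullchain.pem is the leaf certificate followed by the chain, so a sketch along those lines is below. The file paths are yours to choose (anything outside /etc/letsencrypt/ per the steps above), and `build_fullchain`/`write_fullchain` are hypothetical helpers.

```python
from pathlib import Path

BEGIN = "-----BEGIN CERTIFICATE-----"


def build_fullchain(cert_pem: str, chain_pem: str) -> str:
    """Concatenate a leaf certificate and its chain, leaf first,
    the same ordering certbot uses for fullchain.pem."""
    for label, text in (("cert", cert_pem), ("chain", chain_pem)):
        if BEGIN not in text:
            raise ValueError(f"{label} does not look like a PEM certificate")
    # Exactly one newline between the two PEM blocks.
    return cert_pem.rstrip("\n") + "\n" + chain_pem.lstrip("\n")


def write_fullchain(cert_path, chain_path, out_path):
    """Read the two PEM files and write the combined file to out_path."""
    out = Path(out_path)
    out.write_text(build_fullchain(Path(cert_path).read_text(),
                                   Path(chain_path).read_text()))
    return out
```

The private key (privkey3.pem here) stays a separate file. This only concatenates; it doesn't check that the cert actually matches the key.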

At this point, you have a working cert that's good for another 90 days. Then, once a week has passed:

  • Make a backup of the /etc/letsencrypt/ directory, then destroy it. Nuke it from orbit, it's the only way to be sure.
  • Issue a new cert covering all the domain names you need.
  • Edit your web (and other relevant) server configuration to point to the new files in /etc/letsencrypt/live/yourdomain/
  • Set up a cron or systemd task to run certbot renew daily.

This should be all you need to do. To test, a week or two later, run certbot renew --force-renewal and confirm that the new cert is issued, that it's stored in /etc/letsencrypt/archive/yourdomain/ with a "2" number, and that the symlinks in /etc/letsencrypt/live/yourdomain/ are updated to point to the new files.

Thank you for this advice. I am reading it after buying and successfully installing a commercial certificate. I have however learned a lot from this conversation that will help me work better with other systems.

Your post demonstrates classic PEBCAK.

Nowhere in your initial post do you state what you actually did, even though the post template asks you to describe what you did. Instead you just complained about the outcomes you got. From your description, it's pretty obvious you weren't actually renewing anything as you should have (via "certbot renew"), and you were instead creating new certificates.

Instead of asking for help on what you did wrong, you post an essay about all the ways certbot and Let's Encrypt are bad. Sure, "the" certificate has only three days left before expiry, but what about the unknown multitude of other certificates you created? (You know, the certificates stored in directories ending in "-0001", "-0002", etc.)

You then complain that when you successfully renewed your first certificate, the renewed cert was stored in its own original directory!

You were doing that anyway, which is why you had so many problems. Of course, I think you know that, which is why you never posted what you actually did to create this mess.

Auto-renewal is not rocket science. Anybody who claims they're capable of setting up a website is surely capable of adding "certbot renew" to a weekly or daily cron job. It's that simple.

You know, a little humility goes a long way. You made mistakes, and you could have easily fixed those mistakes if you asked what you did wrong and what you should have done to get it right. Instead you chose to sound like Donald Trump at a press meeting, proudly demonstrating your ignorance while blaming everyone else around you.
