Looking into this. Thanks
Ah, well, that won’t always help, especially if you enabled OCSP Must-Staple.
If you staple (and I agree that stapling is a good idea in principle), you need tools to monitor the stapled OCSP responses in order to have peace of mind about the system. Think of this like the fuel for the emergency diesel generator at a data centre: you should know how much fuel there is and how long it will last, so that you can order more, or know that you won’t get more in time and plan for what happens then, rather than have the lights go out and sit in darkness wishing you’d known this would happen.
If there had been comments here on Sunday saying “Why are my OCSP responses only 24 hours from expiring? Isn’t that cutting it fine?” then it might have raised a flag in time to avoid any actual outages. That is now impossible. Some responsibility must lie with Let’s Encrypt (even if this was a CDN fault) but we also need to protect ourselves.
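As a rough illustration of the "how much fuel is left" idea, here is a minimal Python sketch that classifies a stapled response by the time remaining before its nextUpdate. The function name, the thresholds, and the example date are all assumptions made up for this post, and extracting nextUpdate from the actual staple is left to whatever tooling you already use.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; choose values that match how often your
# server refreshes its staple and how fast you can react.
WARN_BELOW = timedelta(hours=48)
CRIT_BELOW = timedelta(hours=12)


def staple_status(next_update, now=None):
    """Classify a stapled OCSP response by the time left before nextUpdate."""
    now = now or datetime.now(timezone.utc)
    remaining = next_update - now
    if remaining <= timedelta(0):
        return "EXPIRED", remaining
    if remaining < CRIT_BELOW:
        return "CRITICAL", remaining
    if remaining < WARN_BELOW:
        return "WARNING", remaining
    return "OK", remaining


if __name__ == "__main__":
    # Arbitrary example instant, not taken from the incident.
    now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    status, left = staple_status(now + timedelta(hours=24), now)
    hours = left.total_seconds() / 3600
    print(f"{status}: OCSP response expires in {hours:.0f} hours")
```

With these made-up thresholds, a response that is 24 hours from expiring is reported as a warning well before anything actually breaks.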
Try a simpler client: https://github.com/veeti/manuale
*Excuse me. The must-staple extension is already in your certificate.
I can confirm that this fixes the issue. The OCSP server serves responses for the new cert that expire in the future, as they should.
How do you do that?
The following certs are not due for renewal yet:
No renewals were attempted.
I use the Dehydrated client, where you can set RENEW_DAYS in the config file.
With certbot, the official client, have a look at your config file. It looks like you can set something like
renew_before_expiry = 1 year to always renew your certs.
Is there a RATE LIMIT on how many CERTs I may RENEW from a single IP? (I know there is one for NEW REG)
As we would have to regenerate some thousand CERTs for this … :-/
No. The usual rate limits on certificates per domain apply, though.
Thanks, issue solved for me.
letsencrypt renew --force-renewal
The problem IMHO is that the current situation should not be fatal unless you have an EV certificate or you have “Must-staple” set. Neither of these is true in my case, nor for most users (I assume). Because of that, OCSP monitoring and the like shouldn’t be necessary.
So I suspect the real problem is a bug, most likely in Chrome, because it’s the only browser affected, and it isn’t a problem either if OCSP stapling isn’t used. Or maybe the bug is in Apache. Can someone more familiar with the OCSP stapling standards confirm?
Choosing to interpret an expired OCSP response as a failure seems eminently reasonable.
If a browser treats a stapled and expired OCSP response as OK, then effectively that browser has no OCSP validation. It might almost as well never look at the OCSP responses at all, since it’s so easy for bad guys to record a valid OCSP response, then play it back forever and fool that browser long after the certificate is marked BAD in OCSP.
Of course it makes no sense for Apache to send expired, or BAD, or otherwise broken OCSP responses over to the client since they can never help, but this is a known shortcoming of the current Apache OCSP implementation and worth complaining about if you’re a server operator using Apache.
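To make the replay argument concrete, here is a minimal Python sketch of the freshness check a client can apply to a stapled response: accept it only while the current time lies between thisUpdate and nextUpdate. The function name and the five-minute clock-skew allowance are assumptions for illustration, not how any particular browser implements it.

```python
from datetime import datetime, timedelta, timezone


def staple_is_acceptable(this_update, next_update, now,
                         skew=timedelta(minutes=5)):
    """Accept a stapled response only inside its validity window.

    `skew` is a hypothetical tolerance for client clock drift.
    """
    return this_update - skew <= now <= next_update + skew


if __name__ == "__main__":
    # Arbitrary example dates: a response valid for 7 days.
    issued = datetime(2020, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(days=7)

    # Presented while still fresh: accepted.
    print(staple_is_acceptable(issued, expires, issued + timedelta(days=3)))
    # The same recorded response replayed a month later: rejected.
    print(staple_is_acceptable(issued, expires, issued + timedelta(days=30)))
```

Without the upper bound on `now`, the second call would succeed too, which is exactly the record-and-replay hole described above.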
No browser interprets an expired OCSP response as a failure, it seems. The whole problem with this situation is that it’s only Chrome and only a stapled OCSP response.
AFAIK the whole OCSP situation is soft-fail at the moment, which makes sense given the unreliability of OCSP (illustrated perfectly at the moment): OCSP can be blocked, DDoSed, etc.
It’s not limited to Chrome: we have Java client/server apps with the clients making REST calls over HTTPS.
Recent JDKs do check OCSP validity periods, making the applications unusable for the time being.
A workaround is to disable OCSP checking in the JDK preferences but that’s not easy to explain to a customer…
Monitoring OCSP status is lovely, but only tells you that it is not working, not that the problem is at Let’s Encrypt.
I ended up at this thread because my monitors started complaining.
Thanks for reporting. We’re looking into the problem and have found that a performance problem is causing our OCSP updates to fall behind: https://letsencrypt.status.io/pages/incident/55957a99e800baa4470002da/584eb241ac0b4a4f0e006291. We’re working to fix it now.
We do have monitoring in place for our OCSP service, but it looks like there is a bug in the monitor that checks only for a successful OCSP response, not expiration.
Sorry for the outage.
Assuming your monitors were complaining with something like “Warning: OCSP expires in 18 hours”, then you were in time to make sure Let’s Encrypt knew they had a problem and were working on a fix. Whereas if your monitors only say “OCSP expired” I would suggest that you need better monitors, because “Aargh, it’s on fire” is better than nothing, but not very much better.
To @jsha I must say, and I admit this is with hindsight, that it makes sense to measure two things about the OCSP system. The first is time-to-drop-dead (in how many hours will the oldest of our current OCSP responses expire?), which tells you how long you have to decide what to do instead if anything goes wrong. The second is pace (over the course of the 7 day validity period of OCSP responses, how many times could your infrastructure sign all the certificates at its current rate?), which tells you whether you have enough capacity to continue operations, or need to either shrink operations or buy more capacity.
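For what it’s worth, both numbers are cheap to compute once you have the data. Here is a minimal Python sketch; the function names, the signing rate, and the certificate count are invented for illustration, and only the 7 day validity period comes from the discussion above.

```python
from datetime import datetime, timedelta, timezone


def time_to_drop_dead(next_updates, now):
    """Time until the oldest current OCSP response expires."""
    return min(next_updates) - now


def pace(signatures_per_hour, total_certificates,
         validity=timedelta(days=7)):
    """How many times the whole certificate population could be re-signed
    within one validity period at the current signing rate."""
    validity_hours = validity.total_seconds() / 3600
    return signatures_per_hour * validity_hours / total_certificates


if __name__ == "__main__":
    # Arbitrary example instant and made-up figures.
    now = datetime(2020, 1, 1, tzinfo=timezone.utc)
    expiries = [now + timedelta(hours=h) for h in (90, 18, 140)]

    print("time to drop dead:", time_to_drop_dead(expiries, now))
    print("pace:", pace(signatures_per_hour=20_000,
                        total_certificates=1_000_000))
```

A pace comfortably above 1 means there is headroom; a pace at or below 1 means responses will start expiring before they can all be re-signed, however healthy the rest of the system looks.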
Totally agreed. It was our intent to measure this, and our monitoring unfortunately fell short. We’ll definitely be making some improvements here.
Jacob, thanks for the update.
As a quick, temporary fix for people in this thread: This issue only affects updating of old OCSP responses, not generation of new ones. So if you reissue your certificate you’ll get a fresh response you can staple.
We’re still working as fast as we can to get OCSP signing caught up, but it may take a little while.