OCSP server sending expired responses + stapling breaks Chrome

@pfg Is the operations team aware of the issue? The status page still lists the OCSP servers as green.

Is OCSP stapling supposed to help when we are already suffering from the incident?

I don’t think so :frowning:

Facing the same problems with over 1,000 sites. That's serious. Any updates on this?

[Mon Dec 12 14:59:08.179312 2016] [ssl:error] [pid 23980] AH01936: stapling_check_response: response times invalid
[Mon Dec 12 14:59:08.179433 2016] [ssl:error] [pid 23980] AH01943: stapling_renew_response: error in retrieved response!

Andreas Schnederle-Wagner
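
For anyone wondering what "response times invalid" means in the AH01936 line above: as far as I understand it, the server compares the thisUpdate and nextUpdate fields of the OCSP response against its own clock and rejects the response if the current time falls outside that window. Below is a minimal Python sketch of that kind of check. It is not Apache's or OpenSSL's actual code; the function name `ocsp_times_valid` and the 5-minute skew allowance are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ocsp_times_valid(
    this_update: datetime,
    next_update: Optional[datetime],
    now: datetime,
    max_skew: timedelta = timedelta(minutes=5),  # assumed tolerance, not a documented default
) -> bool:
    """Return True if `now` lies inside the response's validity window.

    Hypothetical helper: it only illustrates the time comparison, it does
    not verify signatures or certificate status.
    """
    # A response that claims to have been produced in the future is rejected.
    if this_update > now + max_skew:
        return False
    # A response whose nextUpdate has already passed is treated as expired.
    if next_update is not None and next_update < now - max_skew:
        return False
    return True
```

An expired response like the ones served during this incident would fail the second comparison, which is consistent with the stapling_check_response error in the log.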

I’ve requested new certificates for all important domains and those seem to work for now… Not sure how feasible that is for everyone.

I wasn’t aware that OCSP was this fragile. I’ve been trying to pitch LetsEncrypt as a reliable alternative to my boss, but this is not helping :slight_smile:

Workaround

We have disabled OCSP stapling. Google Chrome ignores OCSP problems by default, so the majority of visitors won’t notice the error.
Firefox is a decent browser; it cares about OCSP.

Take it easy, all BIG SSL providers have OCSP outages. I do not mention names.

Start monitoring your SSL provider now: https://github.com/szepeviktor/debian-server-tools/blob/master/monitoring/ocsp-check.sh

How did you disable OCSP stapling? Live, on living certs?

I've disabled it in our webserver.
https://httpd.apache.org/docs/2.4/ssl/ssl_howto.html#ocspstapling

This forces the HTTP clients to do the OCSP checking.

Looking into this. Thanks

Ah, well, that won’t help always, especially if you enabled OCSP Must-Staple.

Dehydrated has a nice option for this that just broke my neck. :confused:

If you staple (and I agree that stapling is a good idea in principle), you need tools to monitor the stapled OCSP responses in order to have peace of mind about the system. Think of it like the fuel for the emergency diesel generator at a data centre: you should know how much fuel there is and how long it will last, so that you can order more, or know that you won’t get fuel in time and plan for what happens then, rather than having the lights go out and sitting in darkness wishing you’d known this would happen.

If there had been comments here on Sunday saying “Why are my OCSP responses only 24 hours from expiring? Isn’t that cutting it fine?” then it might have raised a flag in time to avoid any actual outages. That is now impossible. Some responsibility must lie with Let’s Encrypt (even if this was a CDN fault), but we also need to protect ourselves.
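
To make the "how much fuel is left" idea concrete, here is a rough Python sketch that reads the "Next Update" line from the text printed by a tool such as `openssl ocsp` and reports how many hours remain. The function names, the 48-hour threshold, and the assumed line layout ("Next Update: Dec 12 14:00:00 2016 GMT") are illustrative assumptions; check them against what your own tooling prints before relying on this.

```python
import re
from datetime import datetime, timezone

# Assumed layout of the line we are looking for, e.g.
#   "    Next Update: Dec 12 14:00:00 2016 GMT"
_NEXT_UPDATE = re.compile(r"^\s*Next Update:\s*(.+?)\s*$", re.MULTILINE)


def hours_until_expiry(ocsp_text: str, now: datetime) -> float:
    """Return the hours left until the response's nextUpdate (negative if expired)."""
    match = _NEXT_UPDATE.search(ocsp_text)
    if match is None:
        raise ValueError("no 'Next Update' line found")
    # Collapse runs of spaces so that "Dec  2" parses the same as "Dec 12".
    stamp = " ".join(match.group(1).split())
    next_update = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
    next_update = next_update.replace(tzinfo=timezone.utc)
    return (next_update - now).total_seconds() / 3600.0


def needs_attention(ocsp_text: str, now: datetime, min_hours: float = 48.0) -> bool:
    """Flag responses that expire sooner than `min_hours` (hypothetical threshold)."""
    return hours_until_expiry(ocsp_text, now) < min_hours
```

Run from cron against each certificate, a check like this would have flagged a response with only 24 hours left well before it actually expired.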

Try a simpler client: veeti/manuale on GitHub, a fully manual Let's Encrypt/ACME client.

Excuse me. The must-staple extension is already in your certificate.

I can confirm that this fixes the issue. The OCSP server serves responses for the new cert that expire in the future, as they should.

How do you do that?

letsencrypt renew


Processing /etc/letsencrypt/renewal/****.com.conf

The following certs are not due for renewal yet:
/etc/letsencrypt/live/****.com/fullchain.pem (skipped)
No renewals were attempted.

I use the Dehydrated client, where you can set RENEW_DAYS in the config file.

With certbot, the official client, have a look at your config file. It looks like you can set something like renew_before_expiry = 1 year to always renew your certs.

Is there a rate limit on how many certs I may renew from one single IP? (I know there is one for new registrations.)
We would have to regenerate several thousand certs for this ... :-/

No. The usual rate limits on certificates per domain apply, though.

Thanks, issue solved for me.

letsencrypt renew --force-renewal

The problem IMHO is that the current situation should not be fatal unless you have an EV certificate or you have “Must-staple” set. Neither of these is true in my case, nor for most users (I assume). Because of that, OCSP monitoring and the like shouldn’t be necessary.

So I suspect the real problem is a bug, most likely in Chrome, because it’s the only browser affected, and there is no problem at all if OCSP stapling isn’t used. Or maybe the bug is in Apache. Can someone more familiar with the OCSP stapling standards confirm?