There have been several complaints today that the latest Chrome can’t connect to some sites using Let’s Encrypt certificates. I narrowed it down to OCSP: all the sites in question get an expired response from the OCSP server (for example kvlt.ee):
$ openssl ocsp -header "HOST" ocsp.int-x3.letsencrypt.org -issuer kvlt.chain.ee -cert kvlt.ee.pem -text -url http://ocsp.int-x3.letsencrypt.org/
...
Response Verify Failure
139881862981264:error:27069076:OCSP routines:OCSP_basic_verify:signer certificate not found:ocsp_vfy.c:92:
kvlt.ee.pem: WARNING: Status times invalid.
139881862981264:error:2707307D:OCSP routines:OCSP_check_validity:status expired:ocsp_cl.c:370:
good
This Update: Dec 5 04:00:00 2016 GMT
Next Update: Dec 12 04:00:00 2016 GMT
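The "status expired" error above refers to the validity window of the response: a client is expected to reject a response once the current time is past Next Update. A minimal Python sketch of that kind of check follows, using the two timestamps from the output. The function names, the five-minute clock-skew allowance, and the "current time" of midday UTC on Dec 12 are assumptions made for illustration, not values from the thread or from any particular client.

```python
from datetime import datetime, timedelta, timezone

# Timestamp layout printed by `openssl ocsp -text` for This Update / Next Update.
OPENSSL_TIME_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_openssl_time(text):
    """Parse a timestamp such as 'Dec 12 04:00:00 2016 GMT' as UTC."""
    # Collapse repeated spaces, since single-digit days may be padded.
    normalized = " ".join(text.split())
    parsed = datetime.strptime(normalized, OPENSSL_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def ocsp_window_status(this_update, next_update, now, skew=timedelta(minutes=5)):
    """Classify a response as 'not yet valid', 'expired' or 'valid'.

    `skew` is an assumed clock-skew allowance (hypothetical default).
    """
    if this_update > now + skew:
        return "not yet valid"
    if next_update < now - skew:
        return "expired"
    return "valid"


# Timestamps copied from the openssl output above.
this_update = parse_openssl_time("Dec 5 04:00:00 2016 GMT")
next_update = parse_openssl_time("Dec 12 04:00:00 2016 GMT")

# Hypothetical "current time" a few hours after Next Update.
now = datetime(2016, 12, 12, 12, 0, 0, tzinfo=timezone.utc)

print(ocsp_window_status(this_update, next_update, now))
```

With these inputs the sketch reports the response as expired, which matches the openssl error; the certificate status inside the response is still "good", it is only the response itself that is stale.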
In other browsers (Firefox, Safari) it’s not fatal.
Switching off OCSP stapling fixes the problem for Chrome as well.
Clearly there is a problem with Let’s Encrypt OCSP responses, but why is it fatal for Chrome only? Is the problem in Chrome? Or in Apache? Or in the other browsers?
One of our applications started to warn users about certificate revocation status. I captured the network traffic with Wireshark and found that the OCSP response was like:
The nextUpdate value matches the time that the problem appeared. I double-checked the OCSP status of our site with https://www.pkicloud.com/tools.html and the result was the same.
This is likely an issue with OCSP signing being delayed or with the CDN that Let’s Encrypt uses serving stale OCSP responses for some reason. Both of these things are unfortunately out of your direct control.
I imagine once the Operations team becomes aware of this issue, there’ll be an update on https://letsencrypt.status.io/, so you could sign up there to follow any progress.
Unfortunately, https://letsencrypt.status.io/ shows all servers green.
In the meantime our applications are going down because of this. Are we sure that the operations team is aware of it?
Judging from the above, though, the responses are simply not being updated, so OCSP stapling can’t help there.
What’s not clear yet is whether this was a CDN fault or something broke at Let’s Encrypt and no new OCSP answers were being signed. But either way it’s concerning to have nobody actually on top of the incident for seemingly 3+ hours AND that there wasn’t anything in place to detect the looming catastrophe. Presumably these OCSP answers were antique, though not yet expired, on Saturday, and the problem could have been found and fixed then.
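To make the "could have been found on Saturday" point concrete, here is a rough Python sketch of a staleness check that a monitor might run against a fetched response. It assumes the 7-day window seen in the openssl output earlier in the thread (Dec 5 to Dec 12) and a hypothetical rule of alerting once less than half of the window remains; the function names and the threshold are illustrative and say nothing about how Let’s Encrypt actually monitors its systems.

```python
from datetime import datetime, timezone


def remaining_fraction(this_update, next_update, now):
    """Return the share of the validity window still left, clamped to [0, 1]."""
    total = (next_update - this_update).total_seconds()
    if total <= 0:
        raise ValueError("nextUpdate must be later than thisUpdate")
    left = (next_update - now).total_seconds()
    return max(0.0, min(1.0, left / total))


def should_alert(this_update, next_update, now, min_fraction=0.5):
    """Alert once less than `min_fraction` of the window remains.

    The 0.5 default is an assumed threshold, chosen only for illustration.
    """
    return remaining_fraction(this_update, next_update, now) < min_fraction


# Window taken from the openssl output quoted earlier in the thread.
this_update = datetime(2016, 12, 5, 4, 0, 0, tzinfo=timezone.utc)
next_update = datetime(2016, 12, 12, 4, 0, 0, tzinfo=timezone.utc)

# Hypothetical check times: the day after signing, and Saturday Dec 10 at midday UTC.
fresh_check = datetime(2016, 12, 6, 4, 0, 0, tzinfo=timezone.utc)
saturday_check = datetime(2016, 12, 10, 12, 0, 0, tzinfo=timezone.utc)

print(should_alert(this_update, next_update, fresh_check))
print(should_alert(this_update, next_update, saturday_check))
```

Under these assumptions the check stays quiet the day after signing and fires on Saturday, when only about 40 of the 168 hours were left, well before the response actually expired.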
Once upon a time Let’s Encrypt published statistics showing OCSP signing. Those went away. I presumed they had simply gone from public visibility but perhaps instead Let’s Encrypt ceased even to monitor its own systems in this regard and thus got blind-sided. This is especially important because it takes time to sign OCSP responses, so if the process to sign them broke, or the signed ones are lost and must be recreated, that’s going to take many hours.
We’re facing the same problem; multiple customers are reporting outages.
All are (as far as we’ve been able to ascertain) OCSP errors. A quick SSL Labs check shows an OCSP error:
“Revocation information OCSP
OCSP: http://ocsp.int-x3.letsencrypt.org/
Revocation status Good (not revoked)
OCSP ERROR: OCSP response expired on Mon Dec 12 01:00:00 PST 2016”
We have disabled OCSP stapling. Google Chrome by default ignores OCSP problems, so the majority of the visitors won’t notice the error.
Firefox is a decent browser; it cares about OCSP.
How did you disable OCSP stapling? Live, on living certs?