OCSP server sending expired responses + stapling breaks Chrome


#39

Jacob, thanks for the update.


#40

As a quick, temporary fix for people in this thread: this issue only affects the updating of existing OCSP responses, not the generation of new ones. So if you reissue your certificate, you’ll get a fresh response you can staple.

We’re still working as fast as we can to get OCSP signing caught up, but it may take a little while.


#41

Well, this sucks for Must-Staple certificates, even with the workaround of issuing new certs :disappointed:


#42

Also confirmed this workaround. Thanks.


#43

I’m now seeing a SEC_ERROR_OCSP_OLD_RESPONSE error in Firefox 50.0.2 on my website. This error cannot be dismissed without going into about:config, which makes the website essentially unreachable.

I checked again, and my web server is not currently sending any stapled OCSP responses. I’m not using Must-Staple.


#44

@avian2: What’s your site?

Update for everyone: We’re very close to catching up on all expired OCSP responses. There are about 47 minutes of catch-up left. Again, my apologies for the outage.


#45

Thanks a lot for the work…


#46

We’ve brought all the expired OCSP responses up to date, and this problem should be fixed. If it’s not showing as fixed for you, you may need to get your web server to pull an updated OCSP response for stapling. If that still doesn’t work, please let me know and we’ll investigate further.

We’ll be posting an incident report soon detailing the problem, our fixes, and our plans to avoid similar problems in the future.


#47

Very likely you’ve thought of all this, but two immediate thoughts pop into my head, as someone who isn’t using Let’s Encrypt on anything critical but has had “get woken in the middle of the night” level responsibility for systems using certificates from somebody else for more than a decade.

  1. One way or another, there needs to be a way to send a “bat signal” to technical operations people like @cpu when something is clearly wrong. And whilst such a “bat signal” is worse than useless if it can be trivially abused, it’s also no good if only Gordon can use it and he’s the one the Joker just kidnapped. It looks (hopefully the incident report will make this clear) as if there was a gap between when @pfg realised there might be a serious problem and when @cpu actually began investigating.

  2. Some, perhaps even many, of the vital health signs for Let’s Encrypt can be made public without incurring any risk, especially if they’re deliberately kept coarse enough that they can’t be used to infer anything detailed about the systems’ internal state. Public statistics get you 24/7 eyeballs, they can help people to confirm if a problem they’re seeing affects the whole system or just them, and they reinforce the idea of transparency. So I urge you to consider making some stats of this type public where that’s practical.

I look forward to reading the incident report in due course.


#48

These are some good ideas, thanks!


#49

Well, the stats page used to have a lot more graphs and figures (including OCSP age), but for some reason most of them have disappeared…


#50

Thanks for your help. My website is www.tablix.org. Later yesterday I renewed my certificate, as you suggested. That solved the issue as far as I can see.

By the way, I said earlier that Apache was not stapling outdated OCSP responses in its replies to clients. It’s possible that was not true. It seems that openssl s_client always reports “OCSP response: no response sent” (at least in version 1.0.1, which I was using). For instance, if I check now, s_client still reports no OCSP response from the server, while the ssllabs.com server test says that OCSP stapling is enabled and working fine.


#51

Your website may serve different certs to browsers and CLI tools:

$ openssl s_client -connect www.tablix.org:443 | openssl x509 -text

Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number: 0 (0x0)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = default
        Validity
            Not Before: Jan  1 00:00:00 0 GMT
            Not After : Dec 31 23:59:59 9999 GMT
        Subject: CN = default

#52

This seems to be due to SNI. Adding -servername www.tablix.org to the s_client command line gives me a valid OCSP response.
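For anyone scripting the same check, here is a minimal Python sketch (Python 3.7+, assuming `openssl` is on the PATH) that always passes `-servername` together with `-status`, the s_client flag that requests a stapled response. The helper names `s_client_command`, `has_stapled_response` and `check_host` are hypothetical, and the detection assumes the text layout OpenSSL 1.0.x/1.1.x prints, which may differ in other versions.

```python
import subprocess
from typing import List


def s_client_command(host: str, port: int = 443) -> List[str]:
    """Build an `openssl s_client` argument list that sends SNI and requests OCSP stapling."""
    return [
        "openssl", "s_client",
        "-connect", f"{host}:{port}",
        # Without -servername, a name-based virtual host may hand back its
        # default certificate (for example a "CN = default" placeholder).
        "-servername", host,
        # Ask the server to include a stapled OCSP response in the handshake.
        "-status",
    ]


def has_stapled_response(s_client_output: str) -> bool:
    """True if the s_client output contains a printed OCSP response.

    Assumes the OpenSSL 1.0.x/1.1.x layout, where a stapled response is
    printed under an "OCSP Response Data:" heading and a missing one is
    reported as "OCSP response: no response sent".
    """
    return "OCSP Response Data:" in s_client_output


def check_host(host: str, port: int = 443) -> bool:
    """Run s_client once against host:port and report whether a response was stapled."""
    result = subprocess.run(
        s_client_command(host, port),
        input="",            # close stdin so s_client exits after the handshake
        capture_output=True,
        text=True,
        timeout=15,
    )
    return has_stapled_response(result.stdout)
```

Calling `check_host("www.tablix.org")` needs network access. Removing the `-servername` entry from the argument list should reproduce the no-SNI behaviour discussed here on servers that depend on SNI.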


#53

Ah, thanks for pointing this out. Yes, it’s SNI. I forgot about -servername.


#54

In case anyone else has been thinking about how to monitor OCSP stapling to avoid being caught off-guard again, I’ve created a Nagios/Icinga plugin for this purpose. You can find it on GitHub or on Icinga’s (Plugin) Exchange.

Rather than querying the CA’s OCSP server directly, it monitors the (cached) OCSP response that a TLS server sends, so it should also catch issues where your server is unable to update its OCSP response. You’ll find more details at either of the links above.

Hope this is useful to some. I’m happy to get any feedback on things I might have missed or that could be improved (perhaps on GitHub or the plugin page, so as not to hijack this thread further :blush:).
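A monitor along these lines needs the thisUpdate and nextUpdate timestamps of whatever response the server staples. As a rough Python sketch (not the plugin's actual code), they can be pulled out of `openssl s_client -status` output; `parse_ocsp_times` is a hypothetical helper, and the date format is assumed from what OpenSSL 1.0.x/1.1.x prints, without fractional seconds.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# Matches lines such as "    This Update: Dec 12 20:00:00 2016 GMT".
_TIME_LINE = re.compile(
    r"^[ \t]*(This Update|Next Update):[ \t]*(.+?)[ \t]*$", re.MULTILINE
)


def parse_ocsp_times(
    s_client_output: str,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Extract thisUpdate/nextUpdate from `openssl s_client -status` output.

    Returns (this_update, next_update) as UTC datetimes; either is None if
    the field is missing (e.g. no response was stapled). Only the first
    occurrence of each field is used.
    """
    found = {}
    for name, value in _TIME_LINE.findall(s_client_output):
        if name in found:
            continue
        # OpenSSL prints times like "Dec  1 08:00:00 2016 GMT"; strptime
        # treats a space in the format as one or more spaces, so days
        # padded with an extra space still parse.
        parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
        found[name] = parsed.replace(tzinfo=timezone.utc)
    return found.get("This Update"), found.get("Next Update")
```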


#55

I made a similar quick and dirty Munin plugin for checking stapled responses. It is a bit cruder than yours and doesn’t define any warning conditions at the moment, since I was unsure how to check for expired responses.

I see you just check whether nextUpdate is in the past. The RFC (RFC 6960, section 3.2) also says:

The time at which the status being indicated is known to be correct (thisUpdate) is sufficiently recent.

which is a bit vague.
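One way to turn both conditions into a warning rule is sketched below in Python: reject the response if nextUpdate has passed, and also if thisUpdate is older than a chosen maximum age. The function name and the four-day `DEFAULT_MAX_AGE` are purely illustrative assumptions, not values taken from the RFC or from Let’s Encrypt; pick a threshold that fits how often your CA refreshes its responses.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy value: how old thisUpdate may be before the response
# counts as stale. The RFC leaves "sufficiently recent" to the client.
DEFAULT_MAX_AGE = timedelta(days=4)


def ocsp_response_is_fresh(
    this_update: datetime,
    next_update: Optional[datetime],
    now: Optional[datetime] = None,
    max_age: timedelta = DEFAULT_MAX_AGE,
) -> bool:
    """Apply both time checks to an OCSP response (all datetimes timezone-aware)."""
    if now is None:
        now = datetime.now(timezone.utc)
    if this_update > now:
        return False  # thisUpdate in the future: clock skew or a bogus response
    if next_update is not None and next_update <= now:
        return False  # nextUpdate has passed: the response has expired
    # thisUpdate must be "sufficiently recent", here: no older than max_age.
    return now - this_update <= max_age
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test against fixed timestamps.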


#56

Any news on this? For past incidents, preliminary reports were posted on the same day. It has now been 5 days since this incident, and counting. :disappointed:


#57

Incident report is now posted: Expired OCSP responses, December 12. Thanks for checking in, @Osiris!

