I’ve looked for an answer to the question below but did not find one. Hopefully it is not a question that has been asked many times before.
I am using OCSP stapling, mainly because of performance, but also because it improves privacy. With today’s outage, I understand that users of OCSP stapling in particular were impacted.
What I want to understand is why users of OCSP stapling were impacted more by today’s outage. As far as I understand, with OCSP stapling my server sends the client a “cached” response proving the certificate is not revoked. When the Let’s Encrypt OCSP servers are down, my server can still prove that the certificate is not revoked with the “cached” OCSP response. Without OCSP stapling, I would expect an outage of the Let’s Encrypt OCSP servers to have more impact, because they cannot respond to the OCSP queries from web browsers.
I hope someone can help me understand how OCSP stapling works.
In theory, yes, stapled OCSP should mean that your server can continue to serve a stapled response for up to ~7 days (the lifetime of the OCSP response), and be robust to OCSP responder outages. In practice, however, there are a couple of factors that interfere. The main one is that when browsers make their own OCSP requests, they generally “fail open” if the responder is down, and no one notices. But Firefox intentionally treats stapled OCSP more strictly, because they are trying to prepare for a world where Must-Staple is more common. So if a responder is serving bad responses, and the web server staples them, Firefox will show certificate errors, even if it wouldn’t have shown those certificate errors when getting the same bad response from the OCSP responder directly.
Web servers could mitigate this by caching OCSP responses long-term and not updating the cache until they get a good response from the OCSP server. This would mean that outages shorter than a few days could be weathered nicely by those servers. However, the most popular servers don’t do this.
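To make the "keep the last good response" idea concrete, here is a minimal Python sketch of that policy. It is not how any particular web server behaves. The Staple type and the choose_staple function are names made up for illustration, and the sketch assumes the response has already been parsed and its signature verified elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Staple:
    """A parsed OCSP response (hypothetical type, fields are illustrative)."""
    cert_status: str          # "good", "revoked", or "unknown"
    this_update: datetime     # start of the response's validity window
    next_update: datetime     # end of the response's validity window
    der: bytes = b""          # raw response bytes, unused in this sketch


def is_usable(staple: Optional[Staple], now: datetime) -> bool:
    """True only for a 'good' response that is inside its validity window."""
    return (
        staple is not None
        and staple.cert_status == "good"
        and staple.this_update <= now < staple.next_update
    )


def choose_staple(cached: Optional[Staple],
                  fetched: Optional[Staple],
                  now: datetime) -> Optional[Staple]:
    """Pick what to staple; fetched is None when the responder was unreachable."""
    # Assumption: a verified "revoked" answer is genuine and must not be
    # masked by an older cached "good" response.
    if fetched is not None and fetched.cert_status == "revoked":
        return fetched
    if is_usable(fetched, now):
        return fetched        # normal case: fresh good response
    if is_usable(cached, now):
        return cached         # outage or bad response: keep the last good one
    return None               # nothing valid left, so staple nothing
```

With a policy like this, an outage only becomes visible once the cached response passes its next_update time, which is the "few days" of headroom mentioned above.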
@jsha thanks for your comment!
Should we disable OCSP stapling for now so we aren’t affected by this outage?
Depending on your web server, you might have the option to manage your OCSP responses, rather than relying on your web server’s (typically rather bad) implementation. Nginx supports this via the ssl_stapling_file directive, for example. It could work roughly like this:
- Have a bash script that uses the openssl ocsp command to request and store the OCSP response for your certificate
- If the OCSP response is valid and has changed, gracefully reload nginx
- Add a daily cronjob for this bash script
This would cover any outage that doesn’t last more than a couple of days. Here’s an article with some good instructions on how to use this.
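The steps above can be sketched roughly as follows. This is written in Python rather than bash so the checks are easier to follow, and it is only a sketch under assumptions: the three paths are placeholders, the helper names are made up, and the exact text openssl prints may differ between OpenSSL versions, so test it against your own setup before relying on it.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Placeholder paths (assumptions); adjust to your own layout.
CERT = Path("/etc/ssl/example/cert.pem")
ISSUER = Path("/etc/ssl/example/chain.pem")
STAPLE = Path("/etc/ssl/example/ocsp.der")   # the file ssl_stapling_file points at


def ocsp_command(cert, issuer, url, out):
    """Build the openssl ocsp invocation that saves the response to `out`."""
    return ["openssl", "ocsp", "-no_nonce",
            "-issuer", str(issuer), "-cert", str(cert),
            "-url", url, "-respout", str(out)]


def looks_good(output, cert):
    """Accept only a verified response that reports the certificate as good."""
    return "Response verify OK" in output and f"{cert}: good" in output


def should_replace(old, new):
    """Replace the stored staple only with a non-empty, changed response."""
    return bool(new) and new != old


def refresh(cert=CERT, issuer=ISSUER, staple=STAPLE):
    """Fetch a new response; return True if the staple changed and nginx reloaded."""
    # Read the responder URL from the certificate itself.
    url = subprocess.run(
        ["openssl", "x509", "-noout", "-ocsp_uri", "-in", str(cert)],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    fd, tmp = tempfile.mkstemp(dir=staple.parent)
    os.close(fd)
    try:
        result = subprocess.run(ocsp_command(cert, issuer, url, tmp),
                                capture_output=True, text=True)
        output = result.stdout + result.stderr
        if result.returncode != 0 or not looks_good(output, cert):
            return False                      # keep the existing staple untouched
        old = staple.read_bytes() if staple.exists() else None
        if not should_replace(old, Path(tmp).read_bytes()):
            return False                      # unchanged, no reload needed
        os.replace(tmp, staple)               # atomic swap on the same filesystem
        subprocess.run(["nginx", "-s", "reload"], check=True)
        return True
    finally:
        if os.path.exists(tmp):
            os.unlink(tmp)
```

To run it daily, save it as a script that ends with a call to refresh() and point a cron entry at it. The important property is that a failed or bad fetch never overwrites the last good response on disk.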
If you’re willing to switch your web server software, Caddy has a high-quality OCSP implementation that would’ve survived this outage too.
@Neutralizer: Are your end-users currently experiencing an outage due to OCSP stapling? If so, definitely disable for now, and let us know. Medium-term, what @pfg suggests is also a good possibility. Ideally, long-term, there should be an off-the-shelf solution that implements this more robust stapling for you.
We don’t experience an issue at this moment but I would like to take action BEFORE we experience any problem.
So, my understanding is: disabling OCSP stapling is going to lower SSL handshake performance… but we are not going to have any problem related to this outage. Is that correct?
Please kindly clarify.
Yup, as long as you don't need to issue new Let's Encrypt SSL certs or renew existing ones.
What do you mean by that? Why does OCSP affect certificate renewals?
Just to clarify: I know that there is an outage and I cannot generate and renew certificates. This is understandable.
I am specifically asking about OCSP.
That's basically correct.
Sorry, I was referring specifically to this current outage.
Nginx has the option “ssl_stapling_verify”. This option is documented as:
Enables or disables verification of OCSP responses by the server.
For verification to work, the certificate of the server certificate issuer, the root certificate, and all intermediate certificates should be configured as trusted using the ssl_trusted_certificate directive.
It looks like it should prevent malformed responses from reaching clients. But what happens when verification fails is not mentioned.
There was an HN thread today in response to the outage where people proposed several Apache settings which they claim improve Apache’s stapling behavior with respect to upstream OCSP outages:
I have to say that I find the Apache defaults described here rather bizarre; so far I haven’t been able to imagine a use case for a site using stapling that would make the defaults intuitive and helpful.
Maybe someone who uses stapling with Apache can confirm whether the suggestions in the thread are beneficial in case of outages.
No - but you should allow for service outages
My suggestion would be 3-day OCSP staples, which you can specify here:
If a service is down for a day then, unless you are really unlucky, the locally cached staple should outlive the time needed for the service to come back up.
So in fact, if used correctly, OCSP stapling can increase your uptime, because if downstream services fail (such as the CA OCSP services) you will still be able to field OCSP requests.
The downside is that if the certificate is revoked, you may want to re-run the OCSP staple process manually, to make sure the revoked certificate is not being treated as valid by your web server.
We wrote this tool for the same purpose: https://github.com/greenhost/ocspd. Two of the main goals are to have a more robust and performant solution than a bash script. It was specifically designed to work with HAProxy: it communicates with HAProxy’s socket, so it doesn’t have to reload at all when it fetches new staples. But it can be used to fetch staples for any web server that can serve cached staple files.
It’s still beta so please report any issue so we can improve it.
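For readers wondering what "communicates with HAProxy's socket" can look like, here is a minimal Python sketch. It is not taken from the tool above. It assumes HAProxy's runtime API command "set ssl ocsp-response", a stats socket with sufficient privileges, and a placeholder socket path; the function names are made up for illustration.

```python
import base64
import socket


def ocsp_update_command(der_response: bytes) -> bytes:
    """Build the runtime-API line that installs a new OCSP response."""
    encoded = base64.b64encode(der_response).decode("ascii")
    return f"set ssl ocsp-response {encoded}\n".encode("ascii")


def push_staple(der_response: bytes,
                socket_path: str = "/run/haproxy/admin.sock") -> str:
    """Send the response to HAProxy and return its reply verbatim.

    The socket path is a placeholder (assumption); use the one from the
    'stats socket' line in your own HAProxy configuration.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(ocsp_update_command(der_response))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # HAProxy closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")
```

Because the new response is handed over while HAProxy keeps running, no reload is needed, which is the advantage described above over file-based approaches.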
This looks great, thanks! I’ll definitely give it a try; the bash scripts I’m currently using are rather hacky.