Around 2016-12-17 0:30 AM UTC I encountered the following OCSP error in Firefox (FF 50.1.0 on macOS Sierra):
Secure Connection Failed
An error occurred during a connection to new.example.com. The OCSP server has refused this request as unauthorized. Error code: SEC_ERROR_OCSP_UNAUTHORIZED_REQUEST
The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
Please contact the website owners to inform them of this problem.
This happened after every Apache 2.4 reload (graceful restart), on the first page request to e.g. https://ssltest.virtual-host.ch/
It also happened sporadically on subsequent requests of the same URL. Qualys SSL Test reported the following OCSP ERROR:
The morning after, the problem did not show up again and Qualys SSL Test reported OCSP: http://ocsp.int-x3.letsencrypt.org/ OK status.
Was this due to an outage of ocsp.int-x3.letsencrypt.org? How can I configure Apache to gracefully handle that without any service interruption of any Let’s Encrypt protected sites?
Thanks for reporting - I'll see what I can figure out from our end. It seems as though the problem has since resolved itself: after refreshing the cache, the SSLLabs test now shows an OK OCSP response for your configuration.
I'm not experienced with using Apache and OCSP stapling but you might find this response from a similar thread helpful. It sounds like Apache's OCSP configuration can be suboptimal.
Hi @cpu,
Thanks for your help. According to @bjacke’s recommendation I switched to the following OCSP configuration in Apache:
SSLStaplingStandardCacheTimeout 172800
SSLStaplingErrorCacheTimeout 60
SSLStaplingResponderTimeout 4
SSLStaplingReturnResponderErrors Off
Hopefully, that problem will not show up again. Actually, the only real difference from my previous configuration is SSLStaplingErrorCacheTimeout 60, which was set to 300 (5 minutes) before.
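For readers who want to try these settings, here is a minimal sketch of how they might sit in a complete stapling setup. The four timeout and error directives are the ones quoted above. SSLUseStapling and SSLStaplingCache are standard mod_ssl directives that the post does not show, so their presence here, and the cache path and size in particular, are assumptions rather than the poster's actual configuration.

```apache
# Sketch only: place at server level, outside any <VirtualHost> block.
# SSLStaplingCache is not allowed inside a virtual host.

# Assumed, not shown in the post: enable stapling and give it a shared-memory cache.
# The path and size follow the example in the mod_ssl documentation.
SSLUseStapling On
SSLStaplingCache "shmcb:logs/ssl_stapling(32768)"

# Keep valid OCSP responses in the cache for 48 hours.
SSLStaplingStandardCacheTimeout 172800

# Retry after 60 seconds when the responder returned an invalid/error response.
SSLStaplingErrorCacheTimeout 60

# Give up on an OCSP responder query after 4 seconds.
SSLStaplingResponderTimeout 4

# Staple only responses with certificate status "good"; do not pass
# responder errors on to the client.
SSLStaplingReturnResponderErrors Off
```

Checking the result with `apachectl configtest` before reloading is a cheap way to catch a misplaced directive.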
I would still be interested to know whether there was any downtime of ocsp.int-x3.letsencrypt.org during the 2016-12-17 0:00 - 0:30 AM UTC timespan. My OCSP requests come from web.onlime.ch, and at that time I was testing with the following LE-enabled host: new.i-pad.ch (not ssltest.virtual-host.ch). The SSLLabs test showed "OCSP ERROR: Request failed with OCSP status: 6" for it for more than 10 minutes, even though I kept clearing the SSLLabs test cache. I went to bed afterwards and can’t tell when it cleared up - in the morning everything was working again.
I’ve asked our operations team to look into whether we might have had an OCSP responder outage during that period. With Akamai in front of our OCSP responders it can sometimes be tricky to diagnose these sorts of issues; there could be a geographic element involved, based on which caches you hit.
I’ll let you know once I’ve heard back from our ops team.
Having talked with our ops team I think we have an explanation for the behaviour you observed. We were able to confirm that we didn’t sustain any kind of OCSP outage during the period you were experiencing trouble. Our response levels were normal and there was nothing errant in logs/monitoring.
We believe that your issue was a separate corner case. Our OCSP responder can return the "unauthorized" response you saw (OCSP status 6, which Firefox reports as SEC_ERROR_OCSP_UNAUTHORIZED_REQUEST) when the certificate doesn’t yet have an initial OCSP response generated. Under normal circumstances we write the initial OCSP response for a new certificate immediately at the time of issuance, but under load this can sometimes take a little while to catch up. On Friday and Saturday we were under unusually high, but not outage-level, load. When our origin returns the unauthorized response to a client, it is cached by Akamai for ~12 hours.
Normally our ocsp-updater system will purge this cache when it finally writes the initial response & everything will immediately start getting the fresh response. We had recently identified issues with this purging feature that we’re in the process of addressing so that this won’t happen in the new year.
Your certificate was issued Friday, December 16, 2016 at 6:02:00 PM, and you were experiencing the unauthorized error on Sat, Dec 17th at 0:30 AM, which falls within the 12 hours I’d expect such a response to stay cached without anything purging it on our end. It seems like the most likely explanation is that you were caught up in this unfortunate corner case, and Apache handles it particularly ungracefully.
Hope this helps. When we have the Akamai purging fixes deployed to the ocsp-updater this shouldn’t happen again. Thanks for understanding!
Wow! What a fast response time and what a great explanation for this rather complex corner case.
I thought this issue would keep me from going public with Let’s Encrypt certificates at Onlime. But now, after this brilliant support experience, nothing holds me back from going live sooner rather than later.