SSL stapling OCSP error

somuch77 · December 8, 2017, 3:28pm

As soon as I enabled OCSP stapling on my server, I started getting a lot of these errors:

2017/12/05 05:52:34 [crit] 28850#28850: *1401605 SSL_do_handshake() failed (SSL: error:14094459:SSL routines:ssl3_read_bytes:tlsv1 bad certificate status response:SSL alert number 113) while SSL handshaking, client: HIDDEN IP ADDRESS, server: 0.0.0.0:443
2017/12/05 06:10:50 [crit] 28850#28850: *1405851 SSL_do_handshake() failed (SSL: error:14094459:SSL routines:ssl3_read_bytes:tlsv1 bad certificate status response:SSL alert number 113) while SSL handshaking, client: HIDDEN IP ADDRESS, server: 0.0.0.0:443
2017/12/05 06:10:52 [crit] 28850#28850: *1405878 SSL_do_handshake() failed (SSL: error:14094459:SSL routines:ssl3_read_bytes:tlsv1 bad certificate status response:SSL alert number 113) while SSL handshaking, client: HIDDEN IP ADDRESS, server: 0.0.0.0:443

It has kept happening ever since. What’s wrong?

Here’s my config block for enabling OCSP stapling:

ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/letsencrypt/live/example.com/chain.pem;

I put it in both the www and non-www HTTPS server blocks.
The www version redirects to the non-www version.

My questions:

Am I doing something wrong, or is this a fault on LE’s side?
Does this mean the client’s connection is not going through, i.e. it is being rejected?

Thanks

Patches · December 9, 2017, 4:52am

Do you have a resolver directive defined with your DNS resolvers? It’s required for OCSP stapling.

Your client connections are not being rejected but may be delayed while nginx retries fetching the OCSP response.
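One way to confirm whether a resolver is actually configured is to search the effective configuration for the directive. The sketch below assumes an nginx build that supports `nginx -T` (dump the full configuration); the helper name `has_resolver` is made up for illustration.

```shell
#!/bin/sh
# has_resolver: hypothetical helper. Reads nginx configuration text on stdin
# and succeeds if it contains a "resolver" directive
# (resolver_timeout alone does not count).
has_resolver() {
    grep -Eq '^[[:space:]]*resolver[[:space:]]+[^;]+;'
}

# Usage (run as root on the server):
#   nginx -T 2>/dev/null | has_resolver && echo "resolver set" || echo "no resolver"
```

If nothing is found, a `resolver` line listing your DNS servers would go next to the `ssl_stapling` directives.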

somuch77 · December 9, 2017, 7:18am

I thought it was optional, and that if I left it out nginx would use the server’s resolver. I’m on Digital Ocean, which uses Google’s DNS (8.8.8.8 and 8.8.4.4) as its default resolver.

I’ll add the resolver now and hope the error stops.

It’s kind of weird: on Qualys and when checking OCSP stapling with openssl, everything is OK. It’s just that this error keeps appearing, although the frequency is pretty low compared to successful connections or unique visitors, less than 1%.

EDIT
I just rechecked using openssl and realised there’s an error at the bottom of the output. Nothing changed before and after adding the resolver.

OCSP Response Status: successful (0x0)
Response Type: Basic OCSP Response
Version: 1 (0x0)
Responder Id: C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
Produced At: Dec  6 18:08:00 2017 GMT
Responses:
Certificate ID:
  Hash Algorithm: sha1
  Issuer Name Hash: 7EE66AE7729AB3FCF8A220646C16A12D6071085D
  Issuer Key Hash: A84A6A63047DDDBAE6D139B7A64565EFF3A8ECA1
  Serial Number: 03C7298FE81B91311715737D314A2041E23B
Cert Status: good
This Update: Dec  6 18:00:00 2017 GMT
Next Update: Dec 13 18:00:00 2017 GMT

Signature Algorithm: sha256WithRSAEncryption
     40:8c:ec:f8:df:51:e9:44:27:a9:22:9d:5b:50:95:49:f1:64:
     e9:d1:20:1f:ca:0a:0e:db:8b:95:5b:18:60:8d:63:7c:43:03:
     e1:86:98:76:55:04:d8:5a:21:10:db:6a:1f:7e:fb:30:9f:77:
     ca:8a:2c:ed:86:c7:4d:9a:87:42:63:14:8f:11:87:fd:b5:18:
     1e:95:9b:54:db:19:0a:bd:35:15:8f:0d:76:35:05:4d:fc:a3:
     53:ea:35:57:6a:fa:65:04:49:97:d5:16:fe:2a:b7:a4:93:35:
     d7:ee:b1:3e:b0:31:6b:e2:29:c7:36:ab:15:f6:cb:0a:af:e6:
     5a:6a:4a:90:ef:24:f0:89:e6:43:65:e2:6a:13:1f:61:29:bb:
     62:c6:ad:09:54:b1:37:80:bc:d1:3e:6f:bc:af:f4:78:22:ac:
     e9:91:33:bf:a4:6d:a1:3f:70:fc:30:68:d4:f4:5e:75:56:d6:
     0b:10:54:1d:7d:29:b7:fd:ba:61:6c:1d:30:35:e4:b0:fb:27:
     7c:c3:c5:43:66:f9:2c:6f:83:46:53:12:fb:5f:1a:91:56:e7:
     f3:0f:ba:b4:aa:e2:14:19:1f:00:4d:37:17:49:13:75:21:2c:
     c1:7b:ab:a2:ea:25:17:4c:f5:0c:35:da:48:63:c9:f3:6c:13:
     6b:5d:44:42
Error opening validator certificate issuer.pem
3073283776:error:02001002:system library:fopen:No such file or directory:bss_file.c:175:fopen('issuer.pem','r')
3073283776:error:2006D080:BIO routines:BIO_new_file:no such file:bss_file.c:178:

Look at the bottom: Error opening validator certificate issuer.pem

Patches · December 9, 2017, 10:09am

openssl is trying to use a certificate in a file issuer.pem to verify the OCSP response, but can’t find one. I think you need to use chain.pem instead, e.g.

openssl ocsp -no_nonce \
-header Host ocsp.int-x3.letsencrypt.org \
-url http://ocsp.int-x3.letsencrypt.org/ \
-issuer chain.pem \
-CAfile chain.pem \
-verify_other chain.pem \
-cert cert.pem

Did you only get these messages on December 5th? The OCSP server for the DST Root CA that signed the Let’s Encrypt intermediates was down that day. If you don’t have any messages from other days, you don’t have anything to worry about, since it seems to be working now.

somuch77 · December 9, 2017, 11:50am

I got this

root@dedaunan:~# openssl ocsp -no_nonce \
> -header Host ocsp.int-x3.letsencrypt.org \
> -url http://ocsp.int-x3.letsencrypt.org/ \
> -issuer /etc/letsencrypt/live/dedaunan.com/chain.pem \
> -CAfile /etc/letsencrypt/live/dedaunan.com/chain.pem \
> -verify_other /etc/letsencrypt/live/dedaunan.com/chain.pem \
> -cert /etc/letsencrypt/live/dedaunan.com/fullchain.pem
Response verify OK
/etc/letsencrypt/live/dedaunan.com/fullchain.pem: good
        This Update: Dec  6 18:00:00 2017 GMT
        Next Update: Dec 13 18:00:00 2017 GMT

It seems everything is okay here.

I started using OCSP stapling on Dec 5th, and looking at my old log files, the error started around that time too, and it has kept happening until now.
Before Dec 5th I never got this error (I started using Let’s Encrypt in early November).

Here’s my server block if you want to take a look https://pastebin.com/0VCf8BWW

Thanks

Patches · December 9, 2017, 6:12pm

Hmm, if it persists after the 5th it shouldn’t be related to that outage. Unless nginx cached something bad and a systemctl reload nginx clears it up?

I just sent a test request to your server and OCSP stapling seemed to work fine. My IP address is in the 72.208.*.* block. If you look in your error log, do you see this error message for a client request from an IP starting with that?
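A test request of this kind can be reproduced with `openssl s_client` and its `-status` option. This is only a sketch, not necessarily the tool used for the test above; `staple_status` is a hypothetical helper name and `example.com` is a placeholder.

```shell
#!/bin/sh
# staple_status: hypothetical helper. Reads `openssl s_client -status` output
# on stdin and reports whether a stapled OCSP response was present.
staple_status() {
    if grep -q 'OCSP Response Status: successful'; then
        echo "stapled"
    else
        echo "no staple"
    fi
}

# Usage (replace example.com with the real host name):
#   echo | openssl s_client -connect example.com:443 -servername example.com -status 2>/dev/null | staple_status
```

Running it several times in a row can help, since a single successful check does not rule out intermittent failures.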

If you don’t, check your access log for a successful request from that IP address, then look back at the error log. Do you see other requests from other IP addresses failing with that error around the same time, even though my IP address didn’t appear to error?
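The error-log check described here can be done with two `grep` calls. A minimal sketch, assuming the log format shown at the top of the thread; the helper name `alert113_from` and the log path in the usage line are assumptions.

```shell
#!/bin/sh
# alert113_from: hypothetical helper. Prints error-log lines for the
# "SSL alert number 113" failure whose client address starts with the
# given prefix. Arguments: <log file> <address prefix, e.g. 72.208.>
alert113_from() {
    grep -F 'SSL alert number 113' "$1" | grep -F "client: $2"
}

# Usage (log path is an assumption, adjust to your setup):
#   alert113_from /var/log/nginx/error.log 72.208.
```

No output means no handshake from that address range failed with this alert.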

somuch77 · December 9, 2017, 7:45pm

I’m sure I reloaded nginx several times after Dec 5th. I just reloaded it again now to make sure.

Your request was successful; it’s in the access log. The last error happened about 2 hours before your request. No new errors so far (1 hour after your request).

This error is pretty rare; it only happens about 0.5% of the time. Your request occurred during off-peak hours (the hours with the fewest visitors). In a day I get 50k unique users (based on Google Analytics), and this error happens about 200 times.
However, if this error is fixable, I want to fix it. I’m afraid it will affect my SEO, especially as my traffic has been decreasing for months.
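To put numbers on how often the error occurs, the log can be tallied per day. This is a rough sketch assuming the error-log format shown earlier (date in the first field); both helper names are hypothetical. Note that comparing failed handshakes with unique users is only an approximation, since one user may open many connections.

```shell
#!/bin/sh
# alert113_per_day: hypothetical helper. Counts "SSL alert number 113" lines
# per date (first field of each nginx error-log line) in the given file.
alert113_per_day() {
    awk '/SSL alert number 113/ { n[$1]++ } END { for (d in n) print d, n[d] }' "$1" | sort
}

# error_rate: errors as a percentage of a visitor count.
error_rate() {
    awk -v e="$1" -v v="$2" 'BEGIN { printf "%.1f%%\n", 100 * e / v }'
}

# The thread's rough figures: about 200 errors against 50000 unique users.
error_rate 200 50000   # prints 0.4%
```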

Searching on Google, I see a few errors similar to mine, but no solution for them yet.

Thanks

Patches · December 9, 2017, 10:15pm

So only some OCSP requests are failing. If you want to know why, you could capture some network traffic with a tool like tshark and look at the outbound OCSP requests that are occurring when the errors start flowing in.
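A capture along these lines could be started as sketched below. It assumes the OCSP responder is reached over plain HTTP (as the http:// URL earlier in the thread suggests) and records DNS lookups as well. The helper name, the interface name and the output path are assumptions.

```shell
#!/bin/sh
# capture_ocsp_traffic: hypothetical helper. Writes DNS traffic and HTTP
# traffic to or from the OCSP responder host into a pcap file.
# Arguments: <interface> <OCSP responder host> <output file>
# Caveat: the host name is resolved once when the capture starts, so
# traffic to addresses that appear later will be missed.
capture_ocsp_traffic() {
    tshark -i "$1" -f "port 53 or (tcp port 80 and host $2)" -w "$3"
}

# Usage (stop with Ctrl-C, then read the file back with tshark -r):
#   capture_ocsp_traffic eth0 ocsp.int-x3.letsencrypt.org /tmp/ocsp.pcap
#   tshark -r /tmp/ocsp.pcap
```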

Or you could work around the issue by fetching the OCSP responses yourself in a cronjob and providing them to nginx via ssl_stapling_file:

http://unmitigatedrisk.com/?p=241

Many high traffic sites do this since the built-in stapling support in Apache and Nginx has issues.
But it’s possible that requests from your cronjob will fail at times just like nginx’s built-in requests, so keep a close eye on your cron logs if you do this.

somuch77 · December 10, 2017, 2:43pm

I have a few questions here. When OCSP stapling fails, won’t the browser fetch the OCSP response directly from the issuer? (I’m not using OCSP Must-Staple.) But it seems this error causes the connection to be rejected. Sometimes I can find an IP address that gets the error in the error log, but can’t find it in the access log, so the connection didn’t make it.

Another question: if I use a cronjob to fetch OCSP, isn’t it very hard to time the cronjob so that it fires at the moment the worker processes refresh? I think most of the time when the cronjob fires, the nginx workers will already have the OCSP staple cached. Unless the cronjob also reloads nginx? But is it okay to reload nginx every 30 minutes?

I’m starting to think about using CloudFlare; this is getting too complicated…

Patches · December 10, 2017, 6:48pm

Yes, if it fetches OCSP. (Many browsers do not.)

While nginx may wait a second to get the OCSP response, it shouldn't fail requests just because it can't fetch OCSP information. This behavior makes me think your server is having some Internet connectivity issues, i.e. the OCSP errors are a hint towards bigger problems.

If you use ssl_stapling_file, nginx will cease to retrieve and cache responses from OCSP automatically and instead just use the response you provide to it. The cronjob must reload nginx when it retrieves a new response. It's safe to do this because when you reload nginx instead of restarting it, it keeps listening with the old configuration until it gets a chance to safely reload without dropping connections.

Responses from the Let's Encrypt OCSP servers are valid for a week, so it shouldn't be necessary to run it every half-hour. I'd just do it once or twice per day.
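The cron-driven approach described above could look roughly like this. It is a sketch, not a tested production script: the `openssl ocsp` flags are copied from the command earlier in the thread with `-respout` added to save the response, and every path, file name and function name is an assumption. Newer OpenSSL releases may expect the header as `Host=value`, so check the man page for your version.

```shell
#!/bin/sh
# Sketch of an OCSP refresh job for use with ssl_stapling_file.
# All paths below are assumptions; adjust them to your own layout.
LIVE=/etc/letsencrypt/live/example.com
STAPLE=/etc/nginx/ocsp/example.com.der   # hypothetical file named in ssl_stapling_file

# ocsp_is_good: succeeds if `openssl ocsp` output (stdin) shows a verified
# response with certificate status "good".
ocsp_is_good() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q 'Response verify OK' &&
        printf '%s\n' "$out" | grep -q ': good'
}

# refresh_staple: fetch a response into a temporary file; replace the live
# file and reload nginx only if the response verified and the status is good.
refresh_staple() {
    tmp=$(mktemp) || return 1
    if openssl ocsp -no_nonce \
            -header Host ocsp.int-x3.letsencrypt.org \
            -url http://ocsp.int-x3.letsencrypt.org/ \
            -issuer "$LIVE/chain.pem" \
            -CAfile "$LIVE/chain.pem" \
            -verify_other "$LIVE/chain.pem" \
            -cert "$LIVE/cert.pem" \
            -respout "$tmp" 2>&1 | ocsp_is_good
    then
        mv "$tmp" "$STAPLE" && systemctl reload nginx
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example crontab entry (twice a day), assuming this file is saved as
# /usr/local/bin/refresh-staple.sh with a final line that calls refresh_staple:
#   17 3,15 * * * /usr/local/bin/refresh-staple.sh
```

Keeping the old file when a fetch fails means nginx continues to serve the previous response, which stays usable until its Next Update time.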

somuch77 · December 11, 2017, 9:39am

> Yes, if it fetches OCSP. (Many browsers do not.)

I just read about this on Wikipedia. It's a shame Google Chrome won't fetch OCSP.

> While nginx may wait a second to get the OCSP response it shouldn’t fail requests just because it can’t fetch OCSP information. This behavior makes me think your server is having some Internet connectivity issues. e.g. the OCSP errors are a hint towards bigger problems.

That could be true, but I'm thinking about the case where there's an incoming connection and nginx doesn't have the OCSP response cached, so the browser has to fetch it on its own but doesn't, and the connection fails.

However, my theory only makes sense if nginx automatically respawns worker processes after every XX connections, so that each time a worker is respawned it has an empty OCSP cache. If worker processes are only respawned/restarted on nginx reload, this scenario falls apart. I can't find any information on whether worker processes are respawned after reaching a certain number of connections.

> If you use ssl_stapling_file nginx will cease to retrieve and cache responses from OCSP automatically and instead just use the response you provide to it. The cronjob must reload nginx when it retrieves a new response. It’s safe to do this because when you reload nginx instead of restarting it, it keeps listening with the old configuration until it gets a chance to safely reload without dropping connections.
>
> Responses from the Let’s Encrypt OCSP servers are valid for a week, so it shouldn’t be necessary to run it every half-hour. I’d just do it once or twice per day.

This is a really great idea. Every request will be served with OCSP staple.

An OCSP response is valid for a week. Is it renewed a day before it expires, or at the moment it expires?

Sorry for all the questions, I'm new to this kind of thing.
Thanks

Osiris · December 11, 2017, 9:41am

OCSP fetching by the browser itself has privacy implications.

mnordhoff · December 11, 2017, 10:10am

The connection won't normally fail in that case. If the certificate isn't using must-staple, and Nginx's OCSP cache is empty, and the browser doesn't implement OCSP fetching, the browser will simply continue on without knowing or caring about the certificate's revocation status. (Unless the browser's CRLSets or equivalent feature know it's revoked anyway.)

Nginx doesn't respawn workers unless you reload or they crash.

I'm not sure of the timeline Let's Encrypt uses, but they certainly are renewed long before they expire. I'd guess it's along the lines of 1-3 days before.
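The validity window of a given response can be read off its This Update and Next Update fields. The sketch below computes the length of that window from the two timestamps in the output posted earlier in the thread. It assumes GNU `date` (the `-d` option), and `days_between` is a hypothetical helper name.

```shell
#!/bin/sh
# days_between: whole days between two timestamps in the format openssl
# prints for "This Update" / "Next Update". Assumes GNU date (-d option).
days_between() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}

# Timestamps taken from the OCSP response shown earlier in the thread.
days_between 'Dec  6 18:00:00 2017 GMT' 'Dec 13 18:00:00 2017 GMT'
```

Comparing the This Update time across responses fetched on different days would show how early a fresh one is issued.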

somuch77 · December 11, 2017, 12:51pm

Yes, that seems to be the reason for Chrome, along with the latency problem. Kind of weird that almost all other browsers fetch OCSP.

I see. It's clear now that this problem isn't caused by OCSP itself, but that enabling OCSP stapling somehow exposed an underlying problem.

Thanks, this clears it up. As far as I can tell from the error log and the other logs on my VPS, nginx hasn't crashed in a way that might cause it to respawn workers on its own. However, this is the default nginx from the repo, so I can't debug it.

Then the cronjob is a good choice. If I don't use a cronjob to fetch it early, is it possible for nginx to serve a stale OCSP response (when there's no outage on the LE servers)? Will nginx automatically purge the cached OCSP response and fetch a new one when it becomes stale?

Is ssl_stapling_verify on used to make sure that the OCSP response in my nginx cache is always fresh?

Thanks

mnordhoff · December 11, 2017, 1:42pm

I'm not sure. I'm almost certain Nginx will refresh expired OCSP responses whether or not ssl_stapling_verify is on. It would be really broken and people would complain all the time otherwise.

system · January 10, 2018, 1:42pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
