I thought it was optional: if I don’t set it, nginx would use the server’s resolver. I’m on Digital Ocean, which uses Google’s DNS as its default resolver (8.8.8.8 and 8.8.4.4).
I’ll add the resolver now; I hope the errors stop.
It’s kind of weird. On Qualys and when using openssl to check OCSP stapling, everything is OK. It’s just that this error keeps appearing, although the frequency is pretty low compared to successful connections or unique visitors, less than 1%.
I just rechecked using openssl and realised there’s an error at the bottom of the result. Nothing changed before and after adding the resolver:
OCSP Response Status: successful (0x0)
Response Type: Basic OCSP Response
Version: 1 (0x0)
Responder Id: C = US, O = Let's Encrypt, CN = Let's Encrypt Authority X3
Produced At: Dec 6 18:08:00 2017 GMT
Hash Algorithm: sha1
Issuer Name Hash: 7EE66AE7729AB3FCF8A220646C16A12D6071085D
Issuer Key Hash: A84A6A63047DDDBAE6D139B7A64565EFF3A8ECA1
Serial Number: 03C7298FE81B91311715737D314A2041E23B
Cert Status: good
This Update: Dec 6 18:00:00 2017 GMT
Next Update: Dec 13 18:00:00 2017 GMT
Signature Algorithm: sha256WithRSAEncryption
Error opening validator certificate issuer.pem
3073283776:error:02001002:system library:fopen:No such file or directory:bss_file.c:175:fopen('issuer.pem','r')
3073283776:error:2006D080:BIO routines:BIO_new_file:no such file:bss_file.c:178:
Check the bottom: "Error opening validator certificate issuer.pem".
Did you only get these messages on December 5th? The OCSP server for the DST Root CA that signed the Let’s Encrypt intermediates was down that day. If you don’t have any messages from other days, you don’t have anything to worry about, since it seems to be working now.
root@dedaunan:~# openssl ocsp -no_nonce \
> -header Host ocsp.int-x3.letsencrypt.org \
> -url http://ocsp.int-x3.letsencrypt.org/ \
> -issuer /etc/letsencrypt/live/dedaunan.com/chain.pem \
> -CAfile /etc/letsencrypt/live/dedaunan.com/chain.pem \
> -verify_other /etc/letsencrypt/live/dedaunan.com/chain.pem \
> -cert /etc/letsencrypt/live/dedaunan.com/fullchain.pem
Response verify OK
This Update: Dec 6 18:00:00 2017 GMT
Next Update: Dec 13 18:00:00 2017 GMT
It seems everything is okay here
I started using OCSP stapling on Dec 5th, and looking at my old log files, the error started around that time too and has kept happening until now.
Before Dec 5th, I never got this error (I started using Let’s Encrypt in early November).
Hmm, if it persists after the 5th it shouldn’t be related to that outage. Unless nginx cached something bad and a systemctl reload nginx clears it up?
I just sent a test request to your server and OCSP stapling seemed to work fine. My IP address is in the 72.208.*.* block. If you look in your error log, do you see this error message for a client request from an IP starting with that?
If you don’t, check your access log for a successful request from that IP address, then look back at the error log. Do you see requests from other IP addresses failing with that error around the same time, even though my IP address didn’t appear to error?
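For anyone following along, that comparison could be done from the shell with something like the sketch below. It assumes nginx's usual error-log entries (which include "client: <ip>") and the default combined access-log format (which starts each line with the client IP). The function names and the log paths in the usage note are made up for this example.

```shell
#!/bin/sh
# Count error-log lines whose client IP starts with PREFIX.
# usage: count_error_hits PREFIX ERROR_LOG
count_error_hits() {
    grep -cF "client: $1" "$2"
}

# Count access-log lines that begin with PREFIX.
# The dots are escaped so they match literally instead of "any character".
# usage: count_access_hits PREFIX ACCESS_LOG
count_access_hits() {
    grep -c "^$(printf '%s' "$1" | sed 's/\./\\./g')" "$2"
}
```

On a Debian-style install this would probably be called as `count_error_hits 72.208. /var/log/nginx/error.log` and `count_access_hits 72.208. /var/log/nginx/access.log`. A non-zero access count with a zero error count means the test request went through cleanly.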
I’m sure I reloaded nginx several times after Dec 5th. I just reloaded it again now to make sure.
Your request was successful; it’s in the access log. The last error happened about 2 hours before your request. No new errors so far (1 hour after your request).
This error is pretty rare; it only happens about 0.5% of the time. Your request occurred during off-peak hours (the hours with the fewest visitors). I get 50k unique users a day (based on Google Analytics), and this error happens about 200 times.
However, if this error is fixable, I want to fix it. I’m afraid it will affect my SEO, especially as my traffic has been decreasing for months.
Searching on Google, I see a few errors similar to mine, but no solution yet.
So only some OCSP requests are failing. If you want to know why, you could capture some network traffic with a tool like tshark and look at the outbound OCSP requests that are occurring when the errors start flowing in.
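A rough sketch of such a capture is below. The responder hostname comes from the openssl command earlier in this thread. The output path, the 10-minute duration, the helper names, and the decision to also capture DNS traffic (in case the resolver is the part that fails) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: capture nginx's outbound OCSP lookups for later inspection.
OCSP_HOST=ocsp.int-x3.letsencrypt.org

# Build a capture filter covering plain-HTTP OCSP traffic to the responder
# plus DNS (port 53), since a failing resolver would also break the lookups.
ocsp_capture_filter() {
    printf '(tcp port 80 and host %s) or port 53' "$1"
}

capture_ocsp() {
    # -i any: listen on all interfaces
    # -a duration:600: stop automatically after 10 minutes
    # -f: capture filter, -w: write raw packets to a file
    tshark -i any -a duration:600 \
        -f "$(ocsp_capture_filter "$OCSP_HOST")" \
        -w /tmp/ocsp-capture.pcap
}

# Only start capturing when invoked as: sh capture-ocsp.sh run
if [ "${1:-}" = "run" ]; then
    capture_ocsp
fi
```

Run it as root while the errors are appearing, then open the file with `tshark -r /tmp/ocsp-capture.pcap` or Wireshark. Note that the hostname in the filter is resolved once when the capture starts, so if the responder sits behind rotating addresses, some requests may be missed.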
Or you could work around the issue by fetching the OCSP responses yourself in a cronjob and providing them to nginx via ssl_stapling_file:
Many high traffic sites do this since the built-in stapling support in Apache and Nginx has issues.
But it’s possible that requests from your cronjob will fail at times just like nginx’s built-in requests, so keep a close eye on your cron logs if you do this.
I have a few questions. When OCSP stapling fails, won’t the browser fetch the OCSP response directly from the issuer? (I’m not using OCSP Must-Staple.) But it seems this error causes the connection to be rejected. Sometimes I can find an IP address in the error log that got this error but can’t find it in the access log, so the connection didn’t make it.
Another question: if I use a cronjob to fetch the OCSP response, isn’t it very hard to time it so that the cronjob fires at the moment the worker processes refresh? I think most of the time when the cronjob fires, the nginx workers will already have an OCSP staple cached. Unless the cronjob reloads nginx too? But is it okay to reload nginx every 30 minutes?
I’m starting to think about using CloudFlare; this is getting too complicated…
While nginx may wait a second to get the OCSP response, it shouldn't fail requests just because it can't fetch OCSP information. This behavior makes me think your server is having some Internet connectivity issues, i.e. the OCSP errors are a hint towards bigger problems.
If you use ssl_stapling_file, nginx will cease to retrieve and cache OCSP responses automatically and instead just use the response you provide to it. The cronjob must reload nginx when it retrieves a new response. It's safe to do this because when you reload nginx instead of restarting it, it keeps listening with the old configuration until it gets a chance to safely reload without dropping connections.
Responses from the Let's Encrypt OCSP servers are valid for a week, so it shouldn't be necessary to run it every half-hour. I'd just do it once or twice per day.
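Something along these lines might work as the cron script. Treat it as a sketch: it reuses the openssl command and certbot paths posted earlier in this thread, while the output path /etc/nginx/ocsp/dedaunan.com.der, the function names, and the "run" argument are assumptions made up for this example. It only swaps in the new response when openssl reports the certificate as good, so a failed fetch leaves the previous response in place.

```shell
#!/bin/sh
# Sketch of a cron-driven OCSP refresh for nginx's ssl_stapling_file.
# LIVE matches the certbot paths used earlier in this thread.
# OUT is a hypothetical location; point ssl_stapling_file at the same path.
LIVE=/etc/letsencrypt/live/dedaunan.com
OUT=/etc/nginx/ocsp/dedaunan.com.der
OCSP_HOST=ocsp.int-x3.letsencrypt.org

# Succeeds only if the openssl status line on stdin ends in ": good".
is_good_response() {
    grep -q ': good$'
}

refresh_ocsp() {
    tmp=$(mktemp "$OUT.XXXXXX") || return 1
    # Same query as the manual check, plus -respout to save the DER response.
    # OpenSSL 1.1.0 and later expect "-header Host=$OCSP_HOST" instead.
    if openssl ocsp -no_nonce \
            -header Host "$OCSP_HOST" \
            -url "http://$OCSP_HOST/" \
            -issuer "$LIVE/chain.pem" \
            -CAfile "$LIVE/chain.pem" \
            -verify_other "$LIVE/chain.pem" \
            -cert "$LIVE/fullchain.pem" \
            -respout "$tmp" 2>/dev/null | is_good_response
    then
        # Rename within the same directory, then reload without dropping connections.
        mv "$tmp" "$OUT" && systemctl reload nginx
    else
        # Keep serving the previous response; the message lands in the cron log.
        rm -f "$tmp"
        echo "OCSP refresh failed, keeping existing response" >&2
        return 1
    fi
}

# Only touch the network when invoked as: refresh-ocsp.sh run
if [ "${1:-}" = "run" ]; then
    refresh_ocsp
fi
```

A crontab entry such as `17 */12 * * * /usr/local/bin/refresh-ocsp.sh run` (the script location is hypothetical) would run it twice a day, and the nginx server block would then need `ssl_stapling_file /etc/nginx/ocsp/dedaunan.com.der;` alongside `ssl_stapling on;`.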
I just read about this on Wikipedia. It’s a shame Google Chrome won’t fetch OCSP responses.
While nginx may wait a second to get the OCSP response, it shouldn’t fail requests just because it can’t fetch OCSP information. This behavior makes me think your server is having some Internet connectivity issues, i.e. the OCSP errors are a hint towards bigger problems.
That could be true, but I’m thinking about the case where there’s an incoming connection and nginx doesn’t have an OCSP response cached, so the browser has to fetch it on its own but doesn’t, and the connection fails.
However, my theory only makes sense if nginx automatically respawns worker processes after every XX connections, so that each respawned worker starts with an empty OCSP cache. If worker processes are only respawned on an nginx reload, this scenario falls apart. I can’t find any information on whether worker processes are respawned after reaching a certain number of connections.
If you use ssl_stapling_file, nginx will cease to retrieve and cache OCSP responses automatically and instead just use the response you provide to it. The cronjob must reload nginx when it retrieves a new response. It’s safe to do this because when you reload nginx instead of restarting it, it keeps listening with the old configuration until it gets a chance to safely reload without dropping connections.
Responses from the Let’s Encrypt OCSP servers are valid for a week, so it shouldn’t be necessary to run it every half-hour. I’d just do it once or twice per day.
This is a really great idea. Every request will be served with an OCSP staple.
OCSP responses are valid for a week, but are they renewed a day before they expire, or only at the moment they expire?
Sorry for all the questions; I’m new to this kind of thing.
The connection won't normally fail in that case. If the certificate isn't using must-staple, and Nginx's OCSP cache is empty, and the browser doesn't implement OCSP fetching, the browser will simply continue on without knowing or caring about the certificate's revocation status. (Unless the browser's CRLSets or equivalent feature know it's revoked anyway.)
Nginx doesn't respawn workers unless you reload or they crash.
I'm not sure of the timeline Let's Encrypt uses, but they certainly are renewed long before they expire. I'd guess it's along the lines of 1-3 days before.
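To see the validity window for yourself, you can compare the "This Update" and "Next Update" fields that openssl prints. The helper below is only a sketch: the function name is made up, and it assumes GNU date, which can parse timestamps in the form openssl uses.

```shell
#!/bin/sh
# Print the validity window, in hours, of an OCSP response whose text
# output (as printed by "openssl ocsp") arrives on stdin.
# Assumes GNU date for parsing strings like "Dec 6 18:00:00 2017 GMT".
ocsp_validity_hours() {
    text=$(cat)
    # Take the first occurrence of each field, ignoring leading indentation.
    this=$(printf '%s\n' "$text" | sed -n 's/^[[:space:]]*This Update: //p' | head -n 1)
    next=$(printf '%s\n' "$text" | sed -n 's/^[[:space:]]*Next Update: //p' | head -n 1)
    [ -n "$this" ] && [ -n "$next" ] || return 1
    # Convert both timestamps to epoch seconds and take the difference.
    echo $(( ($(date -u -d "$next" +%s) - $(date -u -d "$this" +%s)) / 3600 ))
}

# The two timestamps quoted earlier in this thread:
printf 'This Update: Dec 6 18:00:00 2017 GMT\nNext Update: Dec 13 18:00:00 2017 GMT\n' \
    | ocsp_validity_hours
```

For the timestamps posted above this prints 168, i.e. seven days. Running the same check on a few consecutive days would show how far ahead of expiry a fresh response becomes available.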
Yes, that seems to be the reason for Chrome, along with the latency problem. Kind of weird, given that almost all other browsers fetch OCSP.
I see. It's clear now that this problem isn't caused by OCSP, but that implementing OCSP somehow exposed an underlying problem.
Thanks, this clears it up. As far as I can tell from the error log and the other logs on my VPS, nginx hasn't crashed in a way that might cause it to respawn workers on its own. However, this is the default nginx from the repo, so I can't debug it.
Then the cronjob is a good choice. If I don't use a cronjob to fetch the response early, is it possible for my nginx to serve a stale OCSP response (when there's no outage on the LE servers)? Will nginx automatically purge the OCSP response from its cache and fetch a new one when it becomes stale?
Is "ssl_stapling_verify on" used to make sure that the OCSP response in my nginx cache is always fresh?