Unsuccessful OCSP response while requesting certificate status

I’ve configured Let’s Encrypt correctly on my high-traffic webserver. It’s been working without a hitch since January this year.

Around 16 hours ago, most of my users suddenly started getting intermittent browser warnings that my website’s HTTPS certificate is invalid. Various SSL checkers confirmed these errors.

The nginx error.log shows a flood of errors like the following (multiple occurrences every minute):

2020/06/02 10:05:02 [error] 10215#10215: OCSP response not successful (6: unauthorized) while requesting certificate status, responder: ocsp.int-x3.letsencrypt.org, peer: 23.52.171.104:80, certificate: "/etc/letsencrypt/live/lymlyte.com/fullchain.pem"
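
For what it’s worth, the same OCSP query can be reproduced outside nginx. A sketch, assuming the cert.pem/chain.pem symlinks certbot normally keeps under the live/ directory (older openssl versions may additionally need a -header Host option):

openssl ocsp -text \
    -issuer /etc/letsencrypt/live/lymlyte.com/chain.pem \
    -cert /etc/letsencrypt/live/lymlyte.com/cert.pem \
    -url http://ocsp.int-x3.letsencrypt.org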

Currently, I see on https://letsencrypt.status.io/ that planned maintenance work is going on, and that the intermediate OCSP responders ocsp.int-x{1..4}.letsencrypt.org were affected for a few minutes during this work.

Why, then, do I continue to get errors? If the cause is some kind of CDN cache, how long should I expect to wait, and is there anything I can do to speed things up?

This is badly affecting my users, and naturally there’s pressure on me to switch to a different SSL solution. Can someone clarify what is going on and what to expect?


Other information:

When I run certbot certificates on my server, I get:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Found the following certs:
  Certificate Name: lymlyte.com
    Domains: lymlyte.com www.lymlyte.com
    Expiry Date: 2020-08-03 22:54:47+00:00 (VALID: 62 days)
    Certificate Path: /etc/letsencrypt/live/lymlyte.com/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/lymlyte.com/privkey.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

In short, it seems to be set up correctly. In case it matters, /etc/letsencrypt/archive/lymlyte.com contains:

-rw-r--r-- 1 root root 1.9K Jan 19 20:09 cert1.pem
-rw-r--r-- 1 root root 2.0K Jan 31 04:04 cert2.pem
-rw-r--r-- 1 root root 2.0K Mar  3 19:17 cert3.pem
-rw-r--r-- 1 root root 2.0K May  5 23:54 cert4.pem
-rw-r--r-- 1 root root 1.7K Jan 19 20:09 chain1.pem
-rw-r--r-- 1 root root 1.7K Jan 31 04:04 chain2.pem
-rw-r--r-- 1 root root 1.7K Mar  3 19:17 chain3.pem
-rw-r--r-- 1 root root 1.7K May  5 23:54 chain4.pem
-rw-r--r-- 1 root root 3.5K Jan 19 20:09 fullchain1.pem
-rw-r--r-- 1 root root 3.6K Jan 31 04:04 fullchain2.pem
-rw-r--r-- 1 root root 3.6K Mar  3 19:17 fullchain3.pem
-rw-r--r-- 1 root root 3.6K May  5 23:54 fullchain4.pem
-rw------- 1 root root 1.7K Jan 19 20:09 privkey1.pem
-rw------- 1 root root 1.7K Jan 31 04:04 privkey2.pem
-rw------- 1 root root 1.7K Mar  3 19:17 privkey3.pem
-rw------- 1 root root 1.7K May  5 23:54 privkey4.pem

The one strange thing is that I can’t seem to run certbot renew --force-renewal; it gives me an unauthorized error of type invalid response. However, my certs should be good for another 60+ days in any case, so I’ve been ignoring this. Could it be the actual problem?
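
For reference, I understand certbot renew --dry-run performs a non-destructive test renewal against the staging environment, which should reproduce the same failure (if it’s real) without touching the live certificates:

certbot renew --dry-run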

I don’t think the OCSP maintenance is related to this issue at all.

What I believe is happening is that you have orphaned nginx processes on your server.

Basically, some nginx workers are running with an old version of your configuration while others are using the new one. I’m not sure exactly how it happens, but on rare occasions it does.

How does this relate to SSL? It means some of the nginx workers are serving your renewed certificate, and some are serving the old certificate, which expired roughly 16 hours ago. That lines up with the timing of the intermittent warnings your users reported, and it also lines up with the OCSP unauthorized response: unauthorized is the response produced when an OCSP query is made for an expired certificate.

Why do I think you have orphaned nginx processes? Because when I connect to your server, I randomly see the wrong certificate.

$ openssl s_client -connect lymlyte.com:443 -showcerts 2>/dev/null | openssl x509 -noout -dates
notBefore=May  5 22:54:47 2020 GMT
notAfter=Aug  3 22:54:47 2020 GMT

$ openssl s_client -connect lymlyte.com:443 -showcerts 2>/dev/null | openssl x509 -noout -dates
notBefore=May  5 22:54:47 2020 GMT
notAfter=Aug  3 22:54:47 2020 GMT

$ openssl s_client -connect lymlyte.com:443 -showcerts 2>/dev/null | openssl x509 -noout -dates
notBefore=Mar  3 18:17:01 2020 GMT
notAfter=Jun  1 18:17:01 2020 GMT

$ openssl s_client -connect lymlyte.com:443 -showcerts 2>/dev/null | openssl x509 -noout -dates
notBefore=May  5 22:54:47 2020 GMT
notAfter=Aug  3 22:54:47 2020 GMT

Look at the dates on that second-to-last connection.
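
If you want to sample a larger number of handshakes quickly, a loop like this (a sketch; adjust the count as you like) tallies the expiry dates seen:

for i in $(seq 1 20); do
    echo | openssl s_client -connect lymlyte.com:443 2>/dev/null |
        openssl x509 -noout -enddate
done | sort | uniq -c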

To fix this: kill all your nginx processes. Make sure they are all really dead. Then restart nginx.

systemctl stop nginx   # ask systemd to stop the service cleanly
killall -9 nginx       # force-kill any stragglers systemd missed
ps aux | grep nginx
# Verify nothing came up in the grep (other than the grep itself), and then
systemctl start nginx
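
Afterwards, it may help to confirm that every nginx process was started fresh. On most Linux systems, something like this shows each process and its elapsed time (all entries should be recent, with the workers parented to a single master):

ps -o pid,ppid,etime,cmd -C nginx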

Besides orphaned nginx workers, the other way this could have happened is if your nginx configuration has the same virtual host configured twice, with one of the two referring to the expired certificate; you can check for that as shown below.
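
One way to rule that out, assuming nginx 1.9.2 or newer (where -T dumps the fully merged configuration), is to grep the live config for server names and certificate paths:

nginx -T 2>/dev/null | grep -E 'server_name|ssl_certificate'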

But I think orphaned workers is more likely at this point.


To follow up: you mean that when I run

ps aux | grep nginx

no worker or master processes should show up?

P.S. This is highly odd; I’ve never seen anything like it (if this is indeed what’s happening). I know you said you don’t know why it occurs, but if you have any hunches (or somewhere I can read about it happening to others), my inquisitive mind is all ears.

Yes, no workers or masters should show up, since you forcibly killed all of them in the previous two steps.
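
As a side note, ps aux | grep nginx will always match the grep process itself; on most Linux systems pgrep sidesteps that:

pgrep -a nginx   # prints PID and full command line; no output means no nginx processes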

I’ve only ever seen it happen to a couple of other users on this forum, so there may be some rare timing bug in the way Certbot reloads nginx, or in nginx’s systemd unit.

Well, I gave it a shot. Now just waiting a while to make sure it worked…

Looks good to me: 20 connections in a row and I didn’t see the old certificate.

It might be worth signing up for uptimerobot.com or something similar; they’ll alert you when they see an expiring certificate, which would catch this issue if it comes back. (Oops, apparently that’s only on the paid plan.)
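
If you’d rather not pay for that, a rough local substitute is a cron job that asks openssl whether the served certificate expires soon. A minimal sketch (604800 is 7 days in seconds; the hostname and threshold are just examples):

echo | openssl s_client -connect lymlyte.com:443 2>/dev/null |
    openssl x509 -noout -checkend 604800 ||
    echo "WARNING: lymlyte.com certificate expires within 7 days"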

