OCSP request failed with the following message

griesi009 · January 22, 2018, 12:18pm

Since 13.01.2018 12:59, our web server logs (nginx server in Germany) have also been flooded with:
OCSP responder sent invalid “Content-Type” header: “text/html” while requesting certificate status, responder: ocsp.int-x3.letsencrypt.org, peer: 2.16.186.27:80
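That log message means the responder (here an Akamai edge node) answered with an HTML page instead of an OCSP response, whose registered media type is `application/ocsp-response`. A minimal Python sketch of that kind of header check, assuming an exact, case-insensitive comparison. The helper name `is_ocsp_content_type` is hypothetical, and this is not nginx's actual code:

```python
from typing import Optional

OCSP_MEDIA_TYPE = "application/ocsp-response"


def is_ocsp_content_type(header_value: Optional[str]) -> bool:
    """Return True if a Content-Type value is exactly the OCSP response type.

    Hypothetical helper: compares case-insensitively after trimming
    whitespace; a missing header counts as invalid.
    """
    if header_value is None:
        return False
    return header_value.strip().lower() == OCSP_MEDIA_TYPE


# An HTML error page from the CDN fails the check, a real OCSP answer passes.
print(is_ocsp_content_type("text/html"))                  # False
print(is_ocsp_content_type("application/ocsp-response"))  # True
```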

manu · January 22, 2018, 12:32pm

In case anyone is wondering about a temporary and hacky workaround, I’ve added this to /etc/hosts:

63.243.228.17   ocsp.int-x3.letsencrypt.org

So far, I’ve had no errors for 3 days. We need to follow this up (and remove this dirty fix when things go back to normal), so indeed, as K.A.B mentioned, it would be really nice to have some visibility on https://letsencrypt.status.io/
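Since this override has to be removed later, a small check can flag whether it is still in place. A Python sketch, assuming the standard hosts-file layout (an address followed by one or more names, with `#` starting a comment); `find_host_overrides` is a hypothetical helper:

```python
from typing import List


def find_host_overrides(hosts_text: str, hostname: str) -> List[str]:
    """Return every address that a hosts file maps `hostname` to."""
    wanted = hostname.lower()
    addresses = []
    for raw in hosts_text.splitlines():
        # Strip comments and surrounding whitespace; skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        if wanted in (name.lower() for name in names):
            addresses.append(address)
    return addresses


sample = "127.0.0.1 localhost\n63.243.228.17   ocsp.int-x3.letsencrypt.org\n"
print(find_host_overrides(sample, "ocsp.int-x3.letsencrypt.org"))  # ['63.243.228.17']
```

Fed the contents of /etc/hosts from a periodic job, a non-empty result means the temporary pin is still active.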

dogsbody · January 22, 2018, 5:24pm

It’s safe to say that Akamai’s servers are still not OK:

# cd /etc/letsencrypt/live/status.dogsbody.com
# openssl ocsp -issuer chain.pem -cert cert.pem -text -url http://ocsp.int-x3.letsencrypt.org
OCSP Request Data:
    Version: 1 (0x0)
    Requestor List:
        Certificate ID:
          Hash Algorithm: sha1
          Issuer Name Hash: 7EE66AE7729AB3FCF8A220646C16A12D6071085D
          Issuer Key Hash: A84A6A63047DDDBAE6D139B7A64565EFF3A8ECA1
          Serial Number: 0356E5FA189305B3BA1BC2C8D13D993D83F8
    Request Extensions:
        OCSP Nonce: 
            041047D59827ABED1246FAB062C7E3A7C30F
Error querying OCSP responder
140474384783000:error:27076072:OCSP routines:PARSE_HTTP_LINE1:server response error:ocsp_ht.c:314:Code=400,Reason=Bad Request
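When repeating this command to see how often it fails, it helps to pull the HTTP status out of the error line. A Python sketch, assuming the error text has the same shape as the OpenSSL output above (other OpenSSL versions may word it differently); `parse_responder_error` is a hypothetical name:

```python
import re
from typing import Optional, Tuple

# Matches the "Code=400,Reason=Bad Request" tail of the error line above.
_HTTP_ERROR = re.compile(r"Code=(\d{3}),Reason=(.*)$")


def parse_responder_error(line: str) -> Optional[Tuple[int, str]]:
    """Return (status, reason) from an openssl ocsp HTTP error line, else None."""
    match = _HTTP_ERROR.search(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


line = ("140474384783000:error:27076072:OCSP routines:PARSE_HTTP_LINE1:"
        "server response error:ocsp_ht.c:314:Code=400,Reason=Bad Request")
print(parse_responder_error(line))  # (400, 'Bad Request')
```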

A host lookup on ocsp.int-x3.letsencrypt.org from the affected server…

# host ocsp.int-x3.letsencrypt.org
ocsp.int-x3.letsencrypt.org is an alias for ocsp.int-x3.letsencrypt.org.edgesuite.net.
ocsp.int-x3.letsencrypt.org.edgesuite.net is an alias for a771.dscq.akamai.net.
a771.dscq.akamai.net has address 92.123.64.234
a771.dscq.akamai.net has address 92.123.64.201
a771.dscq.akamai.net has IPv6 address 2a02:26f0:e8::6856:6fb0
a771.dscq.akamai.net has IPv6 address 2a02:26f0:e8::6856:6f88
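To report which edge nodes a server is actually hitting, the `host` output can be reduced to the alias chain and the addresses. A Python sketch that assumes the English-language output format shown above; `parse_host_output` is a hypothetical helper:

```python
from typing import Dict, List


def parse_host_output(output: str) -> Dict[str, List[str]]:
    """Split `host` output into the CNAME chain and the A/AAAA records."""
    result: Dict[str, List[str]] = {"aliases": [], "ipv4": [], "ipv6": []}
    for line in output.splitlines():
        words = line.split()
        if not words:
            continue
        if " is an alias for " in line:
            # Alias targets end with a trailing dot; drop it.
            result["aliases"].append(words[-1].rstrip("."))
        elif " has IPv6 address " in line:
            result["ipv6"].append(words[-1])
        elif " has address " in line:
            result["ipv4"].append(words[-1])
    return result
```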

Please at least update your status page to show that this is an ongoing issue :-/

jungidee · January 23, 2018, 8:02am

We have set SSLUseStapling off in our Apache config for now, and the “bad response from OCSP server: 503 Service Unavailable” error has been gone from the Apache logs since then.

dogsbody · January 23, 2018, 9:29am

I have visited three websites in the last few days where Firefox wouldn’t let me access the site because OCSP was unavailable. I have no control over how they set up their servers :-/

chrisc · January 23, 2018, 10:43am

I’ve also been seeing a lot of these errors, an example from an Apache errors log from this morning:

[Tue Jan 23 10:08:05.314862 2018] [ssl:error] [pid 23183] AH01941: stapling_renew_response: responder error

I’m using the Mozilla recommended settings:

SSLUseStapling          on
SSLStaplingResponderTimeout 5
SSLStaplingReturnResponderErrors off
SSLStaplingCache        shmcb:${APACHE_RUN_DIR}/ocsp(128000)

Is there a way to improve this configuration in order to mitigate the current situation where the Let’s Encrypt OCSP servers are rather unreliable?

_az · January 23, 2018, 10:45am

Yes, you can turn stapling off in the interim to remove the server's dependency on the OCSP servers.

SSLUseStapling off

There may be some way to keep stapling on and tune it to deal with the errors better, but I am not sure that an acceptable configuration is possible with the way Apache currently works.

chrisc · January 23, 2018, 10:58am

Thanks @_az, if there is no better configuration, then I guess disabling it on all our servers is the best option, since the only alternative would be to advise clients to disable it in their web browsers.

cpu · January 23, 2018, 5:57pm

Thanks for the suggestion - our operations team has opened a status page incident for this: Let's Encrypt Status

I'm hopeful someone will be able to update this thread with more information about the remediation discussions later today.

cc @isk @devnullisahappyplace

lbehm · January 23, 2018, 7:04pm

Just a thought:
If we disable OCSP stapling on our servers, we would only move the problem to our users.
The browsers/user agents would still try to fetch the OCSP response, presumably resulting in the same error.
(please correct me if browsers can handle this in a better fashion)

On one of my servers (which doesn’t support stapling), I fetch the OCSP response manually into a file and provide that to nginx. That process runs as an hourly cron job.
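An hourly job like the one described could look roughly like this in Python. Assumptions: `openssl` is on the PATH, nginx's `ssl_stapling_file` directive points at the output file, and the helper names and paths are hypothetical. The new response replaces the old file only if `openssl ocsp` exits cleanly and reports the certificate as good, so a 503 leaves the previous (still valid) response in place. Depending on the OpenSSL setup, response verification may also need `-CAfile`.

```python
import os
import subprocess
import tempfile
from typing import List


def build_ocsp_fetch_cmd(cert: str, issuer: str, url: str, respout: str) -> List[str]:
    """Build an `openssl ocsp` call that saves the DER response to `respout`."""
    return ["openssl", "ocsp", "-no_nonce",
            "-issuer", issuer, "-cert", cert,
            "-url", url, "-respout", respout]


def refresh_staple(cert: str, issuer: str, url: str, staple_path: str) -> bool:
    """Fetch a fresh OCSP response; replace `staple_path` only on success."""
    directory = os.path.dirname(os.path.abspath(staple_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".der")
    os.close(fd)
    try:
        result = subprocess.run(build_ocsp_fetch_cmd(cert, issuer, url, tmp_path),
                                capture_output=True, text=True, timeout=30)
        ok = (result.returncode == 0
              and os.path.getsize(tmp_path) > 0
              and ": good" in result.stdout)
        if ok:
            # Same directory, so the rename is atomic: no half-written file.
            os.replace(tmp_path, staple_path)
        return ok
    except (subprocess.TimeoutExpired, OSError):
        # Timeout, missing openssl, or file errors: keep the old response.
        return False
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

nginx appears to read `ssl_stapling_file` only when it loads its configuration, so an `nginx -s reload` after a successful refresh is probably needed.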

I have had failures on SOME requests (not all) since at least 15.1.18, and also some months ago (22.09.17 08:23 UTC).
Going with the hypothesis that this is just a load problem, I simply moved my cron script’s execution time away from the usual default times, resulting in fewer failures (so far, or maybe you just changed something on your side).
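One way to get non-default run times across several servers without picking them by hand is to derive each host's minute from a hash of its name, so the jobs don't all hit the responder at the top of the hour. A Python sketch; `cron_minute_for`, `cron_line`, the salt and the command path are all hypothetical:

```python
import hashlib


def cron_minute_for(hostname: str, salt: str = "ocsp-refresh") -> int:
    """Map a hostname to a stable minute in 0..59."""
    digest = hashlib.sha256(f"{salt}:{hostname}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 60


def cron_line(hostname: str, command: str) -> str:
    """Hourly crontab entry that runs at the host's own minute."""
    return f"{cron_minute_for(hostname)} * * * * {command}"


print(cron_line("web1.example.com", "/usr/local/bin/refresh-staple"))
```

An individual host can still land on minute 0; the point is that a fleet of hosts ends up spread across the hour.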

Also: please PM me if someone knows how browsers implement redundancy in OCSP requests when an OCSP responder is dead or returns rubbish. Do they try multiple or all hosts behind the CNAME of the OCSP URI in our certificates? What would happen if the CA specified multiple OCSP URIs?

Patches · January 23, 2018, 10:59pm

@seanmavley Unfortunately nginx will cache DNS resolutions indefinitely after querying once. Restarting nginx would make it resolve a new IP address for the OCSP server, which may help things.

@cpu @isk @devnullisahappyplace I’m not sure if you’re aware of this nginx behavior, and people are experiencing this with other clients so I doubt stale IPs are responsible, but it could be exacerbating the situation…

Phil · January 23, 2018, 11:00pm

There have actually been several issues that we have been chasing down. The problem affecting users in Central Europe was solved by Akamai shutting down traffic from their Germany region at 1730 UTC and shifting traffic to their Italy region between 1800 and 1900 UTC. Since the region swap, we have not received any origin connection failure alerts. A separate but seemingly related issue, OCSP responder timeouts, has also been fixed.

Can you please confirm that you’re seeing successful OCSP lookups for your domains?

m.sanjay94 · January 24, 2018, 6:09am

I haven’t faced any issues in the last 36 hours in the EU. However, I saw a similar issue in the US on ‘Jan 23 05:06 PST’ (about 3 errors around the same time).

ePhil · January 24, 2018, 8:03am

Sorry, but we are still seeing the 503s (based in Germany). On the bright side, I am no longer able to replicate this on the CLI. Maybe I did not try hard enough (a few hundred requests)?

First I restarted nginx, then I started playing with the nameservers (originally we were using 8.8.8.8).

Google currently resolves ocsp.int-x3.letsencrypt.org to

a771.dscq.akamai.net.
2.18.212.56
2.18.212.72

... while my local ISP gives me...

a771.dscq.akamai.net.
2.20.189.244
2.20.190.17

tcpdump captured some reference numbers from Akamai:
From: 2.20.190.17
Reference #102.27d4dd58.1516779467.46d5c8d
Reference #102.16d4dd58.1516779897.e7dfba
From: 2.20.189.244
Reference #102.16d4dd58.1516780015.e86d3e
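The third field of these reference numbers appears to be a Unix timestamp: 1516779467 decodes to 2018-01-24 07:37:47 UTC, which fits the date of this post. That is an inference, not a documented Akamai format, but if it holds, the references can be lined up against local logs. A Python sketch under that assumption (`reference_time` is a hypothetical helper):

```python
import re
from datetime import datetime, timezone

# "#102.27d4dd58.1516779467.46d5c8d": the third field is ASSUMED to be epoch seconds.
_REFERENCE = re.compile(r"#?\d+\.[0-9a-f]+\.(\d+)\.[0-9a-f]+")


def reference_time(reference: str) -> datetime:
    """Return the (assumed) UTC timestamp embedded in an Akamai reference number."""
    match = _REFERENCE.search(reference)
    if match is None:
        raise ValueError(f"unrecognised reference: {reference!r}")
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


print(reference_time("Reference #102.27d4dd58.1516779467.46d5c8d").isoformat())
# 2018-01-24T07:37:47+00:00
```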

Side Note 1: Browsing through the log, the times of the 503s seem to be "clustered" together.
Side Note 2: The "timeout thing" did not affect us, only 503.
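To put a number on that clustering, error timestamps can be grouped by the gap between consecutive events. A Python sketch; the 5-minute default gap is an arbitrary assumption to tune, and `cluster_times` is a hypothetical helper:

```python
from datetime import datetime, timedelta
from typing import List


def cluster_times(times: List[datetime],
                  gap: timedelta = timedelta(minutes=5)) -> List[List[datetime]]:
    """Group timestamps so that consecutive events at most `gap` apart share a cluster."""
    clusters: List[List[datetime]] = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters
```

A few large clusters separated by long quiet periods would support the "clustered" impression; evenly spread singletons would not.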

Currently I am still hoping this might be a DNS issue, but I seriously doubt it.

seanmavley · January 24, 2018, 8:12am

For my box in Amsterdam, the issue rears its ugly head only on weekends, so Saturday and Sunday will show how it goes.

The other box in London hasn't had a single issue since this OCSP thing started. Time will tell.

seanmavley · January 24, 2018, 8:13am

That seems to be working for me, at least for the past 4 weeks. Everything comes back to normal immediately after nginx restart.

rleeden · January 24, 2018, 10:02am

Since switching SSLUseStapling back to on about 2 hours ago I’ve had one 503 in my logfiles:

[Wed Jan 24 09:38:40.506794 2018] [ssl:error] [pid 2322:tid 139904619071232] [client 192.168.0.3:28818] AH01980: bad response from OCSP server: 503 Service Unavailable

It’s too early to tell whether that’s the same frequency of errors I saw in my logfiles a few days ago, when I last had stapling switched on.
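Counting these errors per hour or per day takes the guesswork out of a before/after comparison. A Python sketch that pulls the time, `AH` code and message out of mod_ssl error lines in the default format shown in this thread; `parse_ssl_error` is hypothetical, and a custom `ErrorLogFormat` would break the pattern:

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Timestamp, "[ssl:error]", any further [...] fields, then "AHnnnnn: message".
_SSL_ERROR = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[ssl:error\].*?(?P<code>AH\d{5}): (?P<msg>.*)$")


def parse_ssl_error(line: str) -> Optional[Tuple[datetime, str, str]]:
    """Return (time, AH code, message) for a mod_ssl error line, else None."""
    match = _SSL_ERROR.match(line.strip())
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S.%f %Y")
    return ts, match.group("code"), match.group("msg")
```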

Using manual openssl commands to check the response, I’ve only got OK messages back over the last couple of hours, whereas a few days ago I would get a 503 in roughly 1 of every 8 attempts. So that has certainly improved.
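Turning "about 1 in 8" into a measured rate just means repeating a probe and counting failures. A Python sketch where `probe` is any hypothetical callable that returns True on a good response (for example, a wrapper around the `openssl ocsp` command); the fake probe below simply fails on every 8th call to show the arithmetic:

```python
from itertools import count
from typing import Callable


def failure_rate(probe: Callable[[], bool], attempts: int) -> float:
    """Run `probe` `attempts` times and return the fraction that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not probe())
    return failures / attempts


calls = count(1)
print(failure_rate(lambda: next(calls) % 8 != 0, 80))  # 0.125
```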

isk · January 25, 2018, 7:35am

Thanks for the responses regarding the 503s. We are continuing to work with Akamai. We believe the two issues devnull mentioned would both manifest to end users as 503s from Akamai as in both cases Akamai believes they were unable to get a valid response from our origin servers.

@m.sanjay94 Right now we believe the errors you saw in the US were likely due to the responder timeouts issue, which is resolved. Everyone, please let us know if you are currently seeing these issues outside of Europe.

@seanmavley It is really strange to me that this would only happen on weekends. Would you mind posting here or in DM the IP that Akamai would see that traffic coming from?

@rleeden we’re definitely interested in the pattern of 503s you see. Would you mind posting here or in DM the IP that Akamai would see the traffic coming from?

m.sanjay94 · January 25, 2018, 7:41am

@isk I received the same response I mentioned earlier, in both the US and EU, about 5 hours ago.

'An error occurred while processing your request.
Reference #102.27d4dd58.1516846269.6f87e2a'

I don't think these are the timeouts you mentioned. Do you want my outgoing request's IP to investigate the issue further?

isk · January 25, 2018, 7:45am

Yes, please send the Akamai IP you are getting as well as the IP you are coming from. That reference number is also helpful.

If you got the issue only 5 hours ago, it probably means Akamai is having a problem outside of Europe as well.
