Hi
Yesterday, 1 July 2018, we had issues fetching OCSP responses when the requests came from our AWS machines. When the same command was executed from our local network, the OCSP response arrived without any issues. This happened for several subdomains under *.luminatesec.com. We were unable to understand why it was happening, and it seemed to resolve by itself after a few hours. Is it possible to find out whether something we did caused this unauthorized response?
Try adding the -header Host="ocsp.int-x3.letsencrypt.org" option to the ocsp command. (Or with a space instead of the = symbol with OpenSSL 1.0.2 or older.)
It’s happened in the past, and usually had something to do with the Akamai CDN used by Let’s Encrypt.
It’s best to be tolerant of OCSP outages in your application.
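As a rough illustration of what "tolerant" could mean in practice, here is a minimal Python sketch that retries a few times and then returns None instead of raising, so the caller can fall back to a cached response or soft-fail. The function names, retry count, and delay are all hypothetical, and it only covers transport-level failures. An OCSP response that arrives with an "unauthorized" status would still need to be parsed and handled separately.

```python
import time
import urllib.request


def post_ocsp(url, der_request, timeout=5.0):
    """POST a DER-encoded OCSP request and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=der_request,
        headers={"Content-Type": "application/ocsp-request"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def fetch_ocsp_tolerant(url, der_request, send=post_ocsp, attempts=3, delay=1.0):
    """Try the responder a few times; return None rather than raise on outage.

    `send` is injectable so the retry logic can be exercised without a network.
    """
    for attempt in range(attempts):
        try:
            return send(url, der_request)
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts all derive
            # from OSError. Wait before the next attempt, if there is one.
            if attempt + 1 < attempts:
                time.sleep(delay)
    # Hypothetical policy: the caller decides whether to reuse a cached
    # response or to proceed without a fresh one.
    return None
```

A caller might treat a None result as "keep serving the last good stapled response until it expires" rather than as a hard failure.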
I think you can probably tag someone like cpu or isk if you want them to look into it, but you’ll probably need to bring some extra information like source/destination IPs and exact times.
We are adding logic to handle OCSP outages. However, we would like to know why it happened, because it's a production system and we need to understand the cause of the outage if possible.
It happened at “July 1st 2018, 19:01:46.466” GMT.
Source IP: 54.245.27.250 (almost certain that's the source IP)
@cpu @isk would love to hear your input on this.
Thanks in advance!
@cpu
I think I found the issue.
My code was doing a GET request to the OCSP endpoint ocsp.int-x3.letsencrypt.org in the following fashion: http://ocsp.int-x3.letsencrypt.org/<base64encode(OCSP_REQUEST)>. I then sniffed the traffic with Wireshark and saw that openssl performs a POST to ocsp.int-x3.letsencrypt.org with the binary OCSP_REQUEST in the body.
The weird thing is that the GET request is working fine with the boulder local test environment but stopped working with the production endpoint.
Any idea if both should be working or only the POST?
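For comparison, here is a small Python sketch of the two request shapes described above. It assumes the DER-encoded request bytes are already available, and the helper names are hypothetical. One detail worth checking on the GET side: RFC 6960 Appendix A describes the path as the URL-encoding of the base64 of the DER request, and plain base64 can contain '+', '/' and '=', which an intermediary may treat differently from a local test server. Whether that explains the failures in this thread is not established.

```python
import base64
import urllib.parse
import urllib.request

RESPONDER = "http://ocsp.int-x3.letsencrypt.org"


def ocsp_get_url(responder, der_request):
    """Build the GET form: responder URL + '/' + urlencode(base64(DER))."""
    b64 = base64.b64encode(der_request).decode("ascii")
    # safe="" makes quote() percent-encode '/', '+' and '=' as well.
    return responder.rstrip("/") + "/" + urllib.parse.quote(b64, safe="")


def ocsp_post_request(responder, der_request):
    """Build the POST form: binary DER request in the body."""
    return urllib.request.Request(
        responder,
        data=der_request,
        headers={"Content-Type": "application/ocsp-request"},
        method="POST",
    )


# Dummy bytes chosen so the base64 contains '+' and '/'; not a real request.
print(ocsp_get_url(RESPONDER, b"\xfb\xff\xfe"))
```

The POST request object can be sent with urllib.request.urlopen, and the GET URL can be fetched the same way.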
I'm not positive off-hand if this was the cause for your specific case but it sounds like a likely first guess. RFC 6960 indicates:
If HTTP caching is not important or if the request is greater than 255 bytes, the request SHOULD be submitted using POST.
That's only a SHOULD and not a MUST. When our operations team has identified your failing requests it will likely be easier to say conclusively if this was/wasn't the cause.
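To make the quoted guidance concrete, here is a hypothetical Python helper that picks a method from the encoded request size. Measuring the length after base64 and URL-encoding, and treating exactly 255 bytes as still eligible for GET, are assumptions on my part rather than something the sentence above spells out.

```python
import base64
import urllib.parse


def choose_ocsp_method(der_request, want_caching=True, limit=255):
    """Return "GET" or "POST" following the SHOULD quoted from RFC 6960."""
    b64 = base64.b64encode(der_request).decode("ascii")
    encoded = urllib.parse.quote(b64, safe="")
    # Assumption: the size that matters is the encoded form placed in the URL.
    if want_caching and len(encoded) <= limit:
        return "GET"
    return "POST"
```

Since it is only a SHOULD, a client that always uses POST stays within the RFC and simply gives up HTTP caching.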
Can you please verify the source IP your requests are coming from, and tell me what IP address(es) your system is/are resolving for ocsp.int-x3.letsencrypt.org? We’ll need that info in order to search for your sessions and troubleshoot the exact cause.
Also, could you share the contents of issuer2.pem and oktatest.luminatesec.com.pem, from both your local machine and your AWS machine? And the full output from the openssl command on both machines?
I retrieved Akamai’s logs for those client IPs, edge server IPs, and time windows. I see some successful requests, but no failed requests: they were not logged. I’ll be curious to hear what you find from the other troubleshooting tools. If you’re able to take a packet capture (e.g. with tcpdump) when this is happening, that would be great, and would give us the data we’ll need to engage Akamai engineering if needed.