Sometimes the response bytes from the OCSP server are: [48 3 10 1 6] or, in hex: []byte{0x30, 0x3, 0xa, 0x1, 0x6}. I see failures with both real certificates against the live OCSP endpoint and test certs against the staging endpoint. (But nearly all my tests are against staging, with the fake certs.)
The error message from the Go package is this, understandably:
asn1: syntax error: sequence truncated
Anyway, sometimes reissuing the certificate fixes it, at least for a while, until I re-issue again. Any good ideas as to why this might be happening?
The inputs to the function that makes the OCSP request are the issued certificate and the issuer certificate, both of which I've made available here in a serialized form for both the valid and malformed response cases.
I've redacted my domain name, but the inputs are basically the same, except that the cert bytes differ because I had to re-issue to toggle the failure. Both certs are valid, though, as verified in my browser, and you can check for yourself if you want.
Since the inputs are effectively the same, I can't figure out why sometimes OCSP is malformed and other times it succeeds. The results tend to happen in batches of attempts.
Consider how many clients hammer this poor little API endpoint. It's a wonder that the API answers at all, sometimes. And it will only get worse, not better.
IMHO the only way to deal with this (unsolvable?) problem is: Write a really tenacious OCSP client, with long retry pauses/timeouts, one that ignores everything but valid answers. Be happy if you get one, after a few hours or so. Staple the OCSP token to HTTPS requests as usual and start fetching a fresh one long before the old one expires.
I guess most browsers (except Firefox?) have given up and treat timeouts / errors as "not invalid" / "not revoked" answers and don't complain. But they try and try again and hammer the API into DoS oblivion.
I tried to reproduce using the examples you provided. I copied the Raw values from the two EE certs in your gist into a new Go file, and added logic to generate an OCSP request, fetch it, and parse the response. I get successes with both certificates. Do you reproduce the same behavior given the attached check-response.go? Can you still reproduce the problem with your original code?
I have also had this issue with OCSP unauthorized responses with my certificate on my home server using nginx, approximately 5 hours ago. Gave it another crack just now and it is all working properly again.
I have just had this on three of my four domains, which fetch the response "offline" via a cron job either every hour or every four hours, replacing the old file if successful (with "good" in the response). Until this batch of failures, it's been running perfectly for five weeks.
domain 1 (hourly): failed seven times from 22:06 yesterday to 04:06, 05:06 and 06:06 succeeded.
domain 2 (hourly): failed at 04:09 only, 05:09 and 06:09 succeeded
domain 3 (4-hourly): failed at 00:12 and 04:12, manual run just succeeded
domain 4 (4-hourly): no problems recorded.
If this is just a blip on the endpoint, nothing to worry about here as my strategy is very resilient to downtime.
@dugite-code and @Troon: I believe the issues you ran into recently were caused by a brief misconfiguration this afternoon, now fixed.
@mholt: One possible cause of your issue: Right now, Boulder may take up to a second to sign the first OCSP response after signing a certificate. [Edit Dec 2019: Boulder now signs OCSP and writes it to the DB before returning the cert, but in typically-rare conditions of replication lag you might see similar symptoms]. If your code is fetching the OCSP response in that first second, it may get an unauthorized response, which is then cached by Akamai. There are a couple of fixes: In the short term, we'll fix caching headers on unauthorized responses. In the long term, we plan to switch to the asynchronous certificate issuance specified in the ACME protocol, so that returning the certificate to the client will block on the first OCSP generation. As a workaround in the meantime, I would suggest waiting for a second or two after issuance before requesting the OCSP response for stapling, being tolerant to failures, and retrying periodically.
Apparently, at the same time you were writing that, I was discovering the same thing on my side, about the "waiting for a second." I tried issuing a bunch of certs and checking OCSP in a loop (I know, I'm naughty, but it was in staging). My first tests slept 10 seconds between issuance and checking OCSP. No errors.
Sleeping 5 seconds, no errors.
Sleeping 1 second, no errors. (Probably would see some with more iterations, though.)
Sleeping 500ms or less, I get "unauthorized" fairly often.
So you're right, it seems the best thing to do is wait some time before stapling.
When I was finally able to reproduce the "unauthorized" error, I had @xenolf try to reproduce it with the same program (modified from yours), and it worked fine for him, even though I kept getting "unauthorized" over and over for the same cert, even 10 minutes later. I guess this is because Akamai is caching that result.
I'm still unclear why the error is "unauthorized" but, then again, I don't understand OCSP very well yet either.
Thanks for your help! I'll build some more redundancy logic into my code.