[Solved] OCSP server sometimes has malformed response of 5 bytes or "unauthorized"

I'm working on OCSP stapling using the Go package golang.org/x/crypto/ocsp.

Sometimes the response bytes from the OCSP server are: [48 3 10 1 6] or, in hex: []byte{0x30, 0x3, 0xa, 0x1, 0x6}. I see failures with both real certificates against the live OCSP endpoint and test certs against the staging endpoint. (But nearly all my tests are against staging, with the fake certs.)

Understandably, the Go package reports this error:

asn1: syntax error: sequence truncated
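For what it's worth, those five bytes are not random garbage. Read as DER, 0x30 0x03 is a SEQUENCE of length 3, and 0x0a 0x01 0x06 is an ENUMERATED of length 1 with value 6, which is the OCSPResponseStatus "unauthorized" in RFC 6960. So the 5-byte body looks like a status-only OCSP response with no responseBytes attached. Here is a minimal sketch that decodes it with the standard library only; the type ocspStatusOnly and the helper decodeStatus are made up for illustration and are not part of golang.org/x/crypto/ocsp:

```go
package main

import (
	"encoding/asn1"
	"fmt"
)

// ocspStatusOnly is a hypothetical helper type that models only the first
// field of an OCSPResponse: the responseStatus ENUMERATED.
type ocspStatusOnly struct {
	Status asn1.Enumerated
}

// statusNames lists the OCSPResponseStatus values from RFC 6960 (4 is unused).
var statusNames = map[int]string{
	0: "successful",
	1: "malformedRequest",
	2: "internalError",
	3: "tryLater",
	5: "sigRequired",
	6: "unauthorized",
}

// decodeStatus returns the numeric status and its name for a DER-encoded
// OCSP response body.
func decodeStatus(der []byte) (int, string, error) {
	var r ocspStatusOnly
	if _, err := asn1.Unmarshal(der, &r); err != nil {
		return 0, "", err
	}
	name, ok := statusNames[int(r.Status)]
	if !ok {
		name = "unknown"
	}
	return int(r.Status), name, nil
}

func main() {
	// The exact bytes from the failing responses above.
	code, name, err := decodeStatus([]byte{0x30, 0x03, 0x0a, 0x01, 0x06})
	fmt.Println(code, name, err)
}
```

If that reading is right, the 5-byte response and the "unauthorized" error are the same condition seen at two different layers.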

Anyway, sometimes reissuing the certificate fixes it, at least for a while, until I re-issue again. Any good ideas as to why this might be happening?

The inputs to the function that makes the OCSP request are the issued certificate and the issuer certificate, both of which I've made available here in a serialized form for both the valid and malformed response cases.

I've redacted my domain name, but the inputs are basically the same, except that the cert bytes differ because I had to re-issue to toggle the failure. Both certs are valid, though, as verified in my browser; you can check for yourself if you want.

Since the inputs are effectively the same, I can't figure out why sometimes OCSP is malformed and other times it succeeds. The results tend to happen in batches of attempts.

Lately I’ve been seeing this about half the time, and I can’t be sure why:

ocsp: error from server: unauthorized

What does that mean?

I’m getting this error, too. I’m not sure whether it has happened from the beginning, but it seems to be related to OCSP stapling, which I also have enabled.

ngrep output

########
T 78.46.100.157:58258 -> 195.138.255.49:80 [AP]
 POST / HTTP/1.0..Host: ocsp.int-x1.letsencrypt.org:80..Content-Type: application/ocsp-request..Content-Length: 85....
0S0Q0O0M0K0...+.........Wr.y|V...Y.u...LL.....Jjc.}....9..Ee.............p........+.H                                
##
T 195.138.255.49:80 -> 78.46.100.157:58258 [AP]
HTTP/1.0 200 OK..Server: nginx..Content-Type: application/ocsp-response..Content-Length: 5..Cache-Control: max-age=41
769..Expires: Fri, 12 Feb 2016 13:34:36 GMT..Date: Fri, 12 Feb 2016 01:58:27 GMT..Connection: close....0....         
 ####

That’s for https://springrts.com

Consider how many clients hammer this poor little API endpoint. It’s a wonder that the API answers at all, sometimes. And it will only get worse, not better.

IMHO the only way to deal with this (unsolvable?) problem is: Write a really tenacious OCSP client, with long retry pauses/timeouts, one that ignores everything but valid answers. Be happy if you get one, after a few hours or so. Staple the OCSP response to TLS handshakes as usual and start fetching a fresh one long before the old one expires.

I guess most browsers (except Firefox?) have given up and treat timeouts / errors as “not invalid” / “not revoked” answers and don’t complain. But they try and try again and hammer the API into DoS oblivion.

When the API is overloaded, why does it respond with unauthorized?

It should respond with 2 or 3 then:

https://www.rfc-editor.org/rfc/rfc2560.txt

internalError         (2),      --Internal error in issuer
tryLater              (3),      --Try again later

“The response “unauthorized” is returned in cases where the client is not authorized to make this query to this server.”

Thanks for reporting, @mholt! We’ll take a look. This shouldn’t be happening, and almost certainly isn’t related to an overloaded endpoint.

I tried to reproduce using the examples you provided. I copied the Raw values from the two EE certs in your gist into a new Go file, and added logic to generate an OCSP request, fetch it, and parse the response. I get successes with both certificates. Do you reproduce the same behavior given the attached check-response.go? Can you still reproduce the problem with your original code?

check-response.go (19.7 KB)

I also had this issue with OCSP unauthorized responses for the certificate on my home server, which runs nginx, approximately 5 hours ago. I gave it another crack just now and it is all working properly again.

I have just had this on three of my four domains, which fetch the response “offline” via a cron job either every hour or every four hours, replacing the old file if successful (with “good” in the response). Until this batch of failures, it’s been running perfectly for five weeks.

  • domain 1 (hourly): failed seven times from 22:06 yesterday to 04:06; 05:06 and 06:06 succeeded.
  • domain 2 (hourly): failed at 04:09 only; 05:09 and 06:09 succeeded.
  • domain 3 (4-hourly): failed at 00:12 and 04:12; a manual run just succeeded.
  • domain 4 (4-hourly): no problems recorded.

If this is just a blip on the endpoint, nothing to worry about here as my strategy is very resilient to downtime.

@dugite-code and @Troon: I believe the issues you ran into recently were caused by a brief misconfiguration this afternoon, now fixed.

@mholt: One possible cause of your issue: Right now, Boulder may take up to a second to sign the first OCSP response after signing a certificate. [Edit Dec 2019: Boulder now signs OCSP and writes it to the DB before returning the cert, but in typically-rare conditions of replication lag you might see similar symptoms]. If your code is fetching the OCSP response in that first second, it may get an unauthorized response, which is then cached by Akamai. There are a couple of fixes: In the short term, we’ll fix caching headers on unauthorized responses. In the long term, we plan to switch to the asynchronous certificate issuance specified in the ACME protocol, so that returning the certificate to the client will block on the first OCSP generation. As a workaround in the meantime, I would suggest waiting for a second or two after issuance before requesting the OCSP response for stapling, being tolerant to failures, and retrying periodically.

Indeed, I get “good response” from both certificates, even after running it a hundred times.

You’re making a GET, I’m making a POST, but both are according to spec. Will keep looking into this…

Oh, just saw this reply.

Apparently, at the same time you were writing that, I was discovering the same thing on my side, about the “waiting for a second.” I tried issuing a bunch of certs and checking OCSP in a loop (I know, I’m naughty, but it was in staging). My first tests slept 10 seconds between issuance and checking OCSP. No errors.

Sleeping 5 seconds, no errors.

Sleeping 1 second, no errors. (Probably would see some with more iterations, though.)

Sleeping 500ms or less, I get ‘unauthorized’ fairly often.

So you’re right, it seems the best thing to do is wait some time before stapling.

When I was finally able to reproduce the ‘unauthorized’ error, I had @xenolf try to reproduce it with the same program (modified from yours), and it worked fine for him, even though I kept getting ‘unauthorized’ over and over for the same cert, even 10 minutes later. I guess this is because Akamai is caching that result.

I’m still unclear why the error is ‘unauthorized’ but, then again, I don’t understand OCSP very well yet either :smile:

Thanks for your help! I’ll build some more redundancy logic into my code.

This is probably because he was hitting a different Akamai region that didn't have the result cached.