Bulk OCSP Requests

Hi,

I'm managing some servers with a lot of domains (50K) and SSL certificates, and 90% of those certificates come from Let's Encrypt.

The software used is OpenResty, so the SSL / OCSP part is a bit different from Nginx: it's more dynamic. If you are not familiar with it :

I'm working on implementing OCSP stapling. My main issue is that I have 50K domains with different SSL certificates.

I'm offloading the fetching of the OCSP responses to another service because I can't do it directly in OpenResty: I think it would cause trouble with this many domains, and the servers are heavily used, with a lot of incoming traffic.

My question is: is there a way to make bulk OCSP requests? If possible, I want to retrieve the OCSP responses for XXX certificates with 1 HTTP request.

Right now, I'm testing retrieving the OCSP response with 1 HTTP POST request per domain/certificate, and it's kind of slow. I'm thinking about the worst-case scenario: a cold reboot without any OCSP cache.

If a bulk request is not possible, is there a limit on the Let's Encrypt OCSP responder? Can I make concurrent OCSP requests from 1 single IP? Is there any rate limit / security measure in place that could block me?

Thank you,
Alexis

2 Likes

Yes and no. RFC 6960 explicitly allows an OCSP request to include more than one certificate in its request list. Correspondingly, the response may contain more than one SingleResponse, one per certificate.

However, when I try this (sending multiple OCSP status requests in a single HTTP query to Let's Encrypt's responder), I only get a single response, for the first certificate in the request; the others are omitted. From what I've read on the net, this seems to be rather common among OCSP responders: multiple responses are rarely supported. So the answer for Let's Encrypt specifically seems to be "no, you can't".

(I would guess that this has something to do with the fact that OCSP responses are pre-signed and pre-cached, so allowing multiple certificates in a single request would kill pre-signing.)

As for rate limits: I'm not Let's Encrypt staff, so I can't give you a hard yes/no answer on that, but I can make some more-or-less educated guesses.

We should recall that OCSP is a very high throughput service (source). Let's Encrypt probably serves more OCSP responses than it issues certificates, so OCSP is optimized for throughput/speed.

Let's Encrypt uses the Akamai CDN for delivery of its OCSP responses. OCSP responses are cached as efficiently as possible, so I presume that most OCSP requests are directly answered by the CDN. Given the high capacity of a CDN, this should both give good availability and high throughput.

Considering this information, I believe that you can send a lot of OCSP requests in a short timeframe without worrying too much. To protect infrastructure, either Let's Encrypt or Akamai (or both) probably have rate limiting to prevent abuse or DoS-style attacks, but I expect the limit to be on the order of maybe a thousand requests/second (note that I haven't verified this, so I may be wrong).


Note that you can test pretty much all of this. Let's Encrypt allows all kinds of testing in their staging environment. The limits in staging are rather high, so using two dummy/throwaway/test domains you can obtain 50,000 test certificates from staging in less than a week. Let's Encrypt offers OCSP for its staging certificates, same as for the production ones. Looking at the OCSP responder, it appears that staging OCSP uses the exact same Akamai configuration/responder as production.

But, given that you already have the certificates, you could probably also directly test in production, since you're only requesting OCSP - you could test just obtaining the OCSP responses on a test machine, without actually using the OCSP responses in production.

If you do manage to find out the limit at which you see something happening (Akamai refusing you, throttling etc) I think we would be interested to hear what you managed to achieve.

7 Likes

Perhaps you want to have an explicit OCSP response cache somewhere on your network (or on a cloud provider you're already using for other things), so that you're basically never starting "from scratch" once you get it initially populated?

5 Likes

For whatever it's worth, I found the spec for this: ISRG CPS section 7.3 says "ISRG OCSP responders implement the RFC 5019 profile of RFC 6960," and RFC 5019 says "OCSPRequests conformant to this profile MUST include only one Request in the OCSPRequest.RequestList structure." That is, RFC 5019 describes a subset of OCSP for "large scale (high volume)" PKI environments, and Let's Encrypt is certainly one of those. (This is also where the requirement to use SHA-1 for the requests comes from, even though some OCSP systems allow for other hashes.)
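Related to that profile: as I read RFC 5019 section 5, small requests (255 bytes or less once encoded) are meant to go out as HTTP GET rather than POST so that responses can be cached, with the request carried in the URL as described in RFC 6960 appendix A.1. Here is a minimal Python sketch of just the URL construction. The function names and the responder host are made up for illustration, and `der_request` is assumed to be an already DER-encoded OCSPRequest containing a single certificate.

```python
import base64
from urllib.parse import quote

# Threshold from RFC 5019 section 5 (as I read it): requests of at most
# 255 bytes after encoding go out as GET, larger ones as POST.
MAX_GET_LENGTH = 255


def ocsp_get_url(responder_url: str, der_request: bytes) -> str:
    """Build {url}/{url-encoding of base-64 encoding of the DER request}."""
    b64 = base64.b64encode(der_request).decode("ascii")
    # safe="" so that '+', '/' and '=' from base64 are percent-encoded too.
    return responder_url.rstrip("/") + "/" + quote(b64, safe="")


def should_use_get(responder_url: str, der_request: bytes) -> bool:
    """True if the full GET URL fits within the 255-byte threshold."""
    return len(ocsp_get_url(responder_url, der_request)) <= MAX_GET_LENGTH


if __name__ == "__main__":
    # Placeholder bytes, not a real OCSPRequest; the host is hypothetical.
    print(ocsp_get_url("http://ocsp.example.test/", b"\xfb\xff"))
```

Whether GET actually behaves better than POST against a given responder or CDN is something to measure rather than assume.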

6 Likes

Thanks for the answers. I came across those RFCs while researching, but I didn't bookmark them, so I lost them. I wasn't able to find them again, so thank you for the links/RFCs, I'll take a look!

I'm using the mlocati/ocsp lib to get the OCSP Response. I'll have to tweak it a bit to see if I can send / retrieve a list of OCSP request/response. I already tweaked it by exposing the nextUpdate field from the response.

The OCSP responses will be cached in a multi-layer cache:

  • In a Redis database (master/slave). I consider this to be my "slow" cache. I populate/manage this cache asynchronously with a PHP service.
  • The OpenResty server will check the Redis cache and also store the response inside its own memory cache using thibaultcha/lua-resty-mlcache. It's my "fast" cache.
  • I also added a fallback in OpenResty directly: it'll be able to fetch the OCSP response itself if it's neither in Redis nor in the internal cache. I'm not sure if I will use this; I will test it on my production server and see if it causes trouble (I'm afraid of creating some locking problem inside OpenResty).
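The lookup order in that list can be sketched roughly as follows. This is Python purely for illustration (the real thing would be Lua inside OpenResty plus Redis): plain dicts stand in for lua-resty-mlcache and Redis, `fetch` stands in for a direct query to the responder, and all names are hypothetical. Expiry is assumed to follow the `nextUpdate` value of each response, and keeping a fallback result only in the fast layer is my assumption, not something stated above.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# (DER-encoded OCSP response, nextUpdate as a Unix timestamp)
Entry = Tuple[bytes, float]


class LayeredOcspCache:
    """Fast in-process layer, then slow shared layer, then direct fetch."""

    def __init__(self, slow: Dict[str, Entry],
                 fetch: Callable[[str], Entry],
                 now: Callable[[], float] = time.time) -> None:
        self.fast: Dict[str, Entry] = {}  # stand-in for the in-memory cache
        self.slow = slow                  # stand-in for Redis
        self.fetch = fetch                # stand-in for the responder fallback
        self.now = now

    def _fresh(self, entry: Optional[Entry]) -> bool:
        # An entry is usable only while its nextUpdate lies in the future.
        return entry is not None and entry[1] > self.now()

    def get(self, cert_id: str) -> bytes:
        entry = self.fast.get(cert_id)
        if self._fresh(entry):
            return entry[0]
        entry = self.slow.get(cert_id)
        if not self._fresh(entry):
            # Last resort: ask the responder directly.
            entry = self.fetch(cert_id)
        # Assumption: results are promoted to the fast layer only; the slow
        # layer stays owned by the asynchronous refresher service.
        self.fast[cert_id] = entry
        return entry[0]
```

In a real deployment the fast layer would also want a lock or single-flight guard around the fallback, so that many simultaneous handshakes for the same certificate do not all hit the responder at once.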

So my worst-case scenario (cold restart) will not happen often, but I would be more comfortable if I were able to generate those OCSP responses in under 5-10 minutes.

With one thread, without any optimizations (just a quick & dirty test), I was able to generate about 50K OCSP responses in ~6 000 seconds. So with some multi-threading and optimizations on my side, I should be able to have something acceptable soon.

5 Likes

Is there a period OR a comma missing?

1 Like

We do have a rate limit in the range of 500-1,000 OCSP requests per second, which applies to requests that hit our origin servers. We serve 429 error responses for traffic that exceeds that rate limit, but we don't block clients for this unless the excessive traffic is extreme and/or ongoing.

Most OCSP queries do not hit our origin servers; our CDN has a cache rate of above 99%, and they have no rate limit that I know of. Certificates that are brand new, or are for sites that almost never get traffic, are less likely to be cached.

Generally, I think your use case sounds fine. Please do implement some kind of back-off in your client if it starts to receive 429 responses, though. That's a best practice; 429s are our preferred, gentle way of dealing with busy clients. The only times we go beyond the published rate limits (with 403s or with firewall blocks) are in emergency situations or when 429s aren't respected.
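For illustration, a back-off on 429 could look roughly like this. It is a minimal Python sketch, not a description of any particular client: `send` is a hypothetical callable that performs one OCSP request and returns `(status_code, body)`, and the delays and attempt count are arbitrary assumptions.

```python
import random
import time
from typing import Callable, Tuple


def fetch_with_backoff(send: Callable[[], Tuple[int, bytes]],
                       max_attempts: int = 5,
                       base_delay: float = 1.0,
                       max_delay: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Tuple[int, bytes]:
    """Retry `send` while it returns HTTP 429, waiting longer each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt + 1 < max_attempts:
            # Exponential back-off (1s, 2s, 4s, ...) capped at max_delay,
            # with jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    # Still rate limited after the last attempt: hand the 429 to the caller.
    return status, body
```

If the response carries a Retry-After header, honoring it instead of the computed delay would be a reasonable refinement; I don't know whether the responder sends one.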

9 Likes

So to be clear: OCSP responses are pre-generated, even if not requested, at the origin servers, but aren't automatically pushed to the CDN to be pre-cached?

5 Likes

Yes, that's exactly right (for now). We're not able to pre-cache them because of the high volume. We do reach out and delete stale OCSP responses from the CDN's caches immediately if a certificate is revoked.

8 Likes

@rg305 6000 seconds, so about 100 minutes. Sorry if I wasn't clear.
It was with just 1 process looping over the certificates, without any optimization on my side.

@jamesLE Thank you for the confirmation about the rate-limit. It's high enough that it won't be a problem for me.

I will manage this by using multiple processes to request the OCSP responses, like 10-15 processes, and I should be under 10 minutes for the 50K domains.
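As a rough check of those numbers, here is a small Python sketch. It assumes throughput scales linearly with the number of workers, which is optimistic, and it uses a thread pool instead of separate processes because the work is I/O-bound; `fetch_one` is a hypothetical function that retrieves the OCSP response for one certificate.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def estimate(total: int, single_worker_seconds: float,
             workers: int) -> Tuple[float, float]:
    """Return (wall-clock seconds, aggregate requests/second).

    Assumes perfectly linear scaling across workers.
    """
    seconds = single_worker_seconds / workers
    rate = total * workers / single_worker_seconds
    return seconds, rate


def fetch_all(cert_ids: Iterable[str],
              fetch_one: Callable[[str], bytes],
              workers: int = 10) -> Dict[str, bytes]:
    """Fetch every response with a bounded pool of concurrent workers."""
    ids = list(cert_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so zip pairs them correctly.
        return dict(zip(ids, pool.map(fetch_one, ids)))


if __name__ == "__main__":
    for n in (10, 15):
        seconds, rate = estimate(50_000, 6_000, n)
        print(f"{n} workers: ~{seconds:.0f} s, ~{rate:.0f} requests/s")
```

Under that assumption, 10 workers give about 600 seconds at roughly 83 requests per second, and 15 workers about 400 seconds at 125 requests per second, both well below the 500-1,000 requests per second limit mentioned above.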

2 Likes
