My team has a fairly large working set of certificates at any point in time. We want to know if any are revoked, so we could certainly proactively query OCSP, but we are ALSO interested in spreading out 'normal' renewals by using ARI to have Let's Encrypt guide when any given cert should be renewed. Can we solve 2 problems with 1 API call for each active certificate?
That is, could we rely on ARI alone to warn/indicate which certs are potentially revoked? I.e. if a certificate is to be revoked soon, would ARI reflect that in moving a renewal window forward in time? We would certainly ALSO be able to double-check if the cert is truly being revoked by following up a call to ARI with a subsequent OCSP query...
I'm one of the engineers who worked on the ARI integration here at Let's Encrypt. Thank you for reaching out, and we greatly appreciate your consideration of adopting ARI for your renewals!
Before diving into details about how we use ARI, I'd like to address a point you mentioned:
We would certainly ALSO be able to double-check if the cert is truly being revoked by following up a call to ARI with a subsequent OCSP query...
While checking OCSP is an excellent way to determine if a certificate has already been revoked, it won't tell you about an impending revocation.
Can we solve 2 problems with 1 API call for each active certificate?
Yes, you can indeed address both problems with a single API call. When you use the renewalInfo endpoint, we check if the certificate is currently revoked or affected by an ongoing incident (i.e., about to be revoked). If either of these is the case, we return a suggested renewal window in the past, advising immediate renewal. If not, we provide an ideal suggested renewal window in the future.
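As a rough sketch of the behaviour described above, a client could classify each renewalInfo response by comparing the suggested window against the current time. The function names and return labels here are made up for illustration, and the sample body assumes the `suggestedWindow` object with RFC 3339 `start` and `end` timestamps; fetching the response over HTTP is left out.

```python
import json
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def classify_renewal_info(body: str, now: datetime) -> str:
    """Return a hypothetical action label for one renewalInfo response body."""
    window = json.loads(body)["suggestedWindow"]
    start = parse_rfc3339(window["start"])
    end = parse_rfc3339(window["end"])
    if end <= now:
        # Whole window is in the past: renew immediately, and treat it as a
        # signal that the cert may be revoked or about to be revoked.
        return "renew-now-and-audit"
    if start <= now:
        # We are inside the suggested window: a normal renewal is due.
        return "renew-now"
    # Window is still in the future: nothing to do yet.
    return "wait"


if __name__ == "__main__":
    sample = json.dumps({
        "suggestedWindow": {
            "start": "2025-01-01T00:00:00Z",
            "end": "2025-01-02T00:00:00Z",
        }
    })
    print(classify_renewal_info(sample, datetime.now(timezone.utc)))
```

A cert that comes back as "renew-now-and-audit" is the case where a follow-up OCSP query would tell you whether the revocation has already happened.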
As a kind of added bonus, if you renew based on ARI's recommendations and include the ARI certificate identifier in the replaces field of your replacement order, we will exempt your order from all rate limits. This and more is discussed in a blog post I published a little while back: An Engineer’s Guide to Integrating ARI into Existing ACME Clients - Let's Encrypt
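For illustration, here is a minimal sketch of building the ARI certificate identifier and placing it in the `replaces` field of a new-order payload. It assumes the identifier format from the ARI specification as I understand it: the base64url-encoded keyIdentifier from the certificate's Authority Key Identifier extension, a literal period, then the base64url-encoded DER serial number, both without padding. The helper names are hypothetical, and extracting those bytes from a certificate and signing the request as a JWS are not shown.

```python
import base64
from typing import Dict, List


def b64url(data: bytes) -> str:
    # base64url without trailing "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ari_cert_id(aki_key_identifier: bytes, serial_der: bytes) -> str:
    """Build the ARI certificate identifier from raw extension bytes.

    aki_key_identifier: keyIdentifier bytes from the AKI extension.
    serial_der: serial number bytes as DER-encoded in the certificate
                (including a leading 0x00 byte when one is present).
    """
    return b64url(aki_key_identifier) + "." + b64url(serial_der)


def new_order_payload(domains: List[str], replaces: str) -> Dict:
    # Unsigned inner payload only; a real ACME client wraps this in a JWS.
    return {
        "identifiers": [{"type": "dns", "value": d} for d in domains],
        "replaces": replaces,
    }


if __name__ == "__main__":
    cert_id = ari_cert_id(
        bytes.fromhex("69885b6b87464041e1b37b847baa67744cb6257a"),
        bytes.fromhex("0087654321"),
    )
    print(new_order_payload(["example.com"], cert_id))
```

The same identifier is what gets appended to the renewalInfo URL when polling, so a client only needs to compute it once per certificate.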
Edit: Wrote most of this before an ISRG response was posted.
An explicit intent of the ARI specification is to support your use case. From the spec:
For example, a CA could suggest that clients renew prior to a mass-revocation event to mitigate the impact of the revocation, or a CA could suggest that clients renew earlier than they normally would to reduce the size of an upcoming mass-renewal spike.
I do not think there has yet been an official commitment from Let's Encrypt/ISRG to support this use case, or a statement of how it would be done. But considering they drafted the ARI specification, came up with this use case, and are pushing for it to become the industry norm, I think we can expect their best attempt to leverage ARI for prior notification of revocation whenever possible, and for them to make some sort of vague promise about how they will coordinate ARI info during mass revocations with the CA/B Forum requirements.
That being said: the way Let's Encrypt implements ARI, payload["suggestedWindow"]["end"] is guaranteed to be in the past if the certificate has already been revoked, and Let's Encrypt has officially recommended frequent checks against the ARI system. So this would be the best option, as it definitely includes the information that an OCSP response would have and is highly likely to include future revocation data.
I also want to note this bit, which I have suggested be included in the ARI spec in the past: If you encounter an ARI window that has already passed, it almost always means something went wrong. Either your system is misconfigured and is not using ARI info correctly for renewal, or there was a revocation event. IMHO, any window that has already passed should be logged and audited, and the site reliability implications of using ARI like a "warning canary" on your integration should be enough of a motivation to adopt ARI asap.
Thanks for the information, beautifulentropy and jvanasco! Very helpful.
We'll definitely look deeper into integrating ARI into our workflows.
One follow-up question - when a renewal is based on ARI feedback and is "exempt from rate limits", does this apply to the Pending Authorizations limit?
That said, if you present a valid OCSP staple (valid status, signed, unexpired, etc.) to the client, the cert is still good, even after the cert has been revoked, until the end of the lifetime of that OCSP response. When using OCSP, a revoked cert can't really be revoked sooner than its last signed, valid OCSP response.
(There's CRLs, but anyone can make those and the only one that's official is the CA's if they provide one.)
Some browsers, like Firefox, actively query OCSP responders, so you can't generalise OCSP stapling to "using OCSP". Firefox also caps the lifetime of the OCSP response regardless of the nextUpdate value (if that's farther into the future, that is), although the cap is still 10 days. Without a nextUpdate, an OCSP response is just valid for 24 hours.
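The lifetime rule described in this post can be sketched as a small function. This only restates the rule as written here (a 10-day cap, and 24 hours when nextUpdate is absent); it is not taken from Firefox's code, the names are invented, and measuring both limits from thisUpdate is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_LIFETIME = timedelta(days=10)       # cap described in the post
DEFAULT_LIFETIME = timedelta(hours=24)  # used when nextUpdate is absent


def effective_expiry(this_update: datetime,
                     next_update: Optional[datetime] = None) -> datetime:
    """Time after which a client following this rule stops trusting the response."""
    if next_update is None:
        # No nextUpdate: assume the response is good for 24 hours.
        return this_update + DEFAULT_LIFETIME
    # Otherwise honour nextUpdate, but never beyond the 10-day cap.
    return min(next_update, this_update + MAX_LIFETIME)


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(effective_expiry(issued))
    print(effective_expiry(issued, issued + timedelta(days=7)))
    print(effective_expiry(issued, issued + timedelta(days=30)))
```

Under this rule, a revoked certificate can keep validating against an old cached response until that effective expiry passes.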
Also, when a certificate is revoked, a new OCSP response will be signed immediately (with CDN caches deleted, etc.). And web servers can limit the OCSP response cache they use for stapling. E.g., Apache's SSLStaplingResponseMaxAge can override the nextUpdate from the OCSP response itself and require a newly fetched OCSP response, e.g. every hour. OCSP responses are cached in the Let's Encrypt CDN, so I'd think that wouldn't matter too much for the infrastructure. (Although every 24 hours would make more sense.)
Yes, like fetching regular OCSP responses, as Certbot does? I did not read it as if it would entail OCSP stapling.
You can't modify any cached response, no.
Yes, and there will also be some time between two subsequent ARI checks by an ACME client. Your point being? There will ALWAYS be some sort of delay with the current implementations, as all methods poll rather than push. One delay is just smaller than the other.
Depends, mandatory OCSP stapling with max. lifetimes of e.g. 24 hours would probably do the trick. Better than:
anyway. Because 7 days is quite a long time to be valid.
But OCSP stapling probably can't be mandated easily and optional OCSP checking is indeed severely broken.
Maybe TLS1.4 can mandate OCSP stapling?
Also, lowering the maximum lifetime of OCSP responses would dramatically increase the resource requirements for a CA, so that should somehow also be addressed if that kind of modification to the OCSP requirements were to take place. Maybe less resource-intensive signing algorithms with a separate OCSP signing key...
Hm, true, must staple did cross my mind earlier, but somehow I didn't think of it again in this regard. But if e.g. the BR would mandate setting the must staple flag, it would all depend on the browser implementation. So it would be easier to mandate it in the browsers than to use a flag with a mandated setting. And as far as I know, the BR restricts the CAs, but not the browsers.
Which was either on purpose, or due to an issue that should be investigated.
Google, Firefox and Safari all push automatic security updates to their clients to ensure certain CRL info is immediately respected and does not fall victim to the (problematic) 10-day validity window.
There are no external optics into any of these processes though. We don't know how or why select certificates/domains are included, and only a small subset are handled through this. This alternate mechanism does exist though.
Yeah, and this is something that's really bothersome for server operators, because any mainstream client can choose to distrust your certificate at any time. It makes it impractical to try to keep up on where your certificate is trusted. </rant>
Do you know if this is the only technology Mozilla leverages against this type of thing? I've read that Google and Microsoft utilize their "safe browsing" systems for their security teams to not only deplatform problematic sites but also push out select CRL information.
For Firefox, connections get checked against CRLite or OCSP (depending on config as they continue to roll out CRLite), then OneCRL, and then finally against Google SafeBrowsing.
Ah ha! So they're leveraging Google's channel. This makes sense now. The component I was thinking of above was this bit, not the CRL technologies you shared. Those are really impressive.