Thoughts from starting to play with ARI

mholt · June 19, 2023, 4:25am

Oh, your post strongly resonates with me!

I have the same thoughts. If ARI is intended to outlive revocation (and thus OCSP), I don't think it makes sense to model ARI after OCSP to maintain some sort of elegant parity, because once OCSP is gone, you're left with a really complex algorithm just to make an HTTP request.

I don't understand the technical motivation either -- I'm sure @aarongable has some insights -- but if it's primarily elegance, I would like to file a feature request: easier construction of the URI.

I struggled with this when I implemented ARI because my ACME client, ACMEz, does not do OCSP in and of itself: the package is purely an ACME implementation. However, one level higher, CertMagic (the package that does maintenance and renewals using ACMEz) does implement OCSP stapling. Indeed, I was a little frustrated at having to implement OCSP logic at multiple layers of abstraction.

Also -- and this is a Go-specific thing -- very special, error-prone OCSP code is unexported in Go's ocsp package, so I had to copy it out and modify it, which was a bit tedious. (I opened an issue to request it be exported, but I doubt that'll happen.)
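To make the complexity concrete, here is a rough Go sketch of what building the request path can involve. It assumes the draft as I understand it at the time of writing: the path segment is the unpadded base64url encoding of a DER-encoded OCSP CertID computed with SHA-256. The names `certID`, `ariPathSegment`, and `demoSelfSigned` are made up for illustration (this is not ACMEz's API), and leaving the hash algorithm's parameters field absent rather than an explicit NULL is also an assumption.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/asn1"
	"encoding/base64"
	"fmt"
	"math/big"
	"time"
)

// certID mirrors the OCSP CertID ASN.1 structure (hypothetical local type).
type certID struct {
	HashAlgorithm  pkix.AlgorithmIdentifier
	IssuerNameHash []byte
	IssuerKeyHash  []byte
	SerialNumber   *big.Int
}

// oidSHA256 is the object identifier for SHA-256.
var oidSHA256 = asn1.ObjectIdentifier{2, 16, 840, 1, 101, 3, 4, 2, 1}

// ariPathSegment builds the path segment under the assumptions stated above.
func ariPathSegment(leaf, issuer *x509.Certificate) (string, error) {
	// The key hash covers only the BIT STRING contents of the issuer's
	// public key, so the SubjectPublicKeyInfo has to be unpacked first.
	var spki struct {
		Algorithm pkix.AlgorithmIdentifier
		PublicKey asn1.BitString
	}
	if _, err := asn1.Unmarshal(issuer.RawSubjectPublicKeyInfo, &spki); err != nil {
		return "", err
	}
	nameHash := sha256.Sum256(issuer.RawSubject)
	keyHash := sha256.Sum256(spki.PublicKey.RightAlign())

	der, err := asn1.Marshal(certID{
		// Parameters left unset (assumption): the optional field is omitted.
		HashAlgorithm:  pkix.AlgorithmIdentifier{Algorithm: oidSHA256},
		IssuerNameHash: nameHash[:],
		IssuerKeyHash:  keyHash[:],
		SerialNumber:   leaf.SerialNumber,
	})
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(der), nil
}

// demoSelfSigned creates a throwaway self-signed certificate for the demo.
func demoSelfSigned() (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(0x1234),
		Subject:      pkix.Name{CommonName: "demo issuer"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// The self-signed cert stands in for both leaf and issuer here.
	cert, err := demoSelfSigned()
	if err != nil {
		panic(err)
	}
	seg, err := ariPathSegment(cert, cert)
	if err != nil {
		panic(err)
	}
	fmt.Println("GET <renewalInfo URL>/" + seg)
}
```

That is a lot of ASN.1 plumbing for what ends up being an unauthenticated GET, and none of it is needed elsewhere in a pure ACME package.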

I noticed this too but didn't say anything, so I'm glad you brought this up.

Aha, I've also wondered about this. I am certain we will see certificate monitoring services (e.g. CertSpotter) also scraping and monitoring ARI. We will 100% for sure have third parties learning about ARI updates, probably faster than the relevant ARI-conforming ACME clients.

This touches on some points I still have confusion about. I raised concerns in a previous topic regarding rate limits -- which are still concerns -- but more fundamental questions remain:

If we think the certificate is going to be revoked in the future, why continue to trust it? i.e. if there was a misissuance or a key was compromised, we should stop trusting it now, not later. (I've heard all the "well it's just policy most of the time, not a security concern" arguments -- but I'm not convinced, since the policies have to be enforced to maintain security.) Basically ARI becomes another form of revocation!
If we think the CA is going to experience congestion soon, then why wait to renew? Waiting for a narrower renewal window gives us a lower chance of getting a certificate than trying right away with the same well-mannered exponential backoff (that I assume most clients aren't doing anyway because they're cron jobs) -- especially at a time when we know the CA is expecting higher loads.

The point is, if the renewal window changes, something is wrong and for optimal reliability and security, renew now.

So, if my ACME clients do support ARI, we'll probably try renewing right away if we see the renewal window move.
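As a minimal sketch of that policy (not actual ACMEz or CertMagic code; `RenewalWindow` and `ShouldRenewNow` are hypothetical names), the check could look something like this in Go: if the window differs from the one we last saw, renew immediately; otherwise renew once the window has opened.

```go
package main

import (
	"fmt"
	"time"
)

// RenewalWindow is a hypothetical holder for the suggested window from ARI.
type RenewalWindow struct {
	Start, End time.Time
}

// ShouldRenewNow reports whether to renew right away. A moved window is
// treated as a "renew ASAP" signal; an unchanged window triggers renewal
// once it has opened.
func ShouldRenewNow(prev, cur RenewalWindow, now time.Time) bool {
	moved := !prev.Start.Equal(cur.Start) || !prev.End.Equal(cur.End)
	if moved {
		return true
	}
	return !now.Before(cur.Start)
}

func main() {
	now := time.Date(2023, 6, 19, 0, 0, 0, 0, time.UTC)
	prev := RenewalWindow{
		Start: now.Add(30 * 24 * time.Hour),
		End:   now.Add(32 * 24 * time.Hour),
	}
	// The CA pulls the window in by four weeks.
	cur := RenewalWindow{
		Start: now.Add(2 * 24 * time.Hour),
		End:   now.Add(3 * 24 * time.Hour),
	}
	fmt.Println(ShouldRenewNow(prev, prev, now)) // unchanged, not yet open
	fmt.Println(ShouldRenewNow(prev, cur, now))  // window moved
}
```

This deliberately ignores where in the window the spec would have a client schedule its renewal; the point is only that a change in the window is itself treated as the trigger.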

I am also curious how many times ARI will be used (i.e. change the renewal window) for:

congestion
revocation
something else -- are there any other reasons?

Right now revocation leads 1-0.

And yeah, it will be interesting to see what transparency monitors do with ARI stats.

Overall, I will add my experience to yours: I found that implementing the basic ARI client code is not particularly pleasant; implementing ACME itself was about as complex (in terms of constructing API calls), but with ACME the reward is much more significant. With ARI, it felt a little anticlimactic.

I like the idea of a way to know "you should renew your certificate now," but done differently.

Part of the reason this is complicated is that revocation is already broken. If we had short-lived certs, we likely wouldn't need ARI. Congestion would be a given (no matter what), and revocation wouldn't be useful.

My ARI wishlist:

An authenticated endpoint. This prevents clients and transparency monitors from using it as a signal for revocation; i.e. another form of revocation. It keeps ARI true to its purpose: to tell the ACME client when to renew, and to get a signal from the client when the cert has been replaced.
An easier way to get ARI info. Mentioned above already. It's too hard to craft the request, which isn't even authenticated.
Easier for clients to scale with reduced network traffic. Two API calls per certificate is tedious and noisy. Some clients of ours manage tens of thousands of certificates. I'd much rather see a single endpoint that lists certificates with renewal windows that have changed, and a batch method for clients to update ARI status for their certificates: maybe a JSON array with all the cert IDs / serial nos. in a single request. To keep things lightweight for the CA, this could be a static JSON doc that's updated every few minutes or every hour.
Renewal window change should basically be "renew ASAP." For reasons mentioned above, it doesn't make much sense to say "there is or soon will be a problem with this cert" and continue serving a certificate that (a) isn't likely to renew successfully at first, or (b) already has reason to be distrusted. I think any certs appearing in the list at the ARI endpoint should be considered "at risk" and replaced right away. That way we're not serving certs that have a known compliance or security issue.
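To illustrate the batch idea from the wishlist, here is a purely hypothetical Go sketch. No such endpoint or document format exists in ARI; the JSON field names (`certID`, `start`, `end`) and the function `atRisk` are invented for this example. The client downloads one document listing certificates whose windows have changed and intersects it with the set of certificates it manages.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// changedWindow is one entry in the hypothetical batch document.
type changedWindow struct {
	CertID string    `json:"certID"`
	Start  time.Time `json:"start"`
	End    time.Time `json:"end"`
}

// atRisk returns the sorted IDs of managed certificates that appear in the
// batch document and should therefore be replaced right away.
func atRisk(doc []byte, managed map[string]bool) ([]string, error) {
	var entries []changedWindow
	if err := json.Unmarshal(doc, &entries); err != nil {
		return nil, err
	}
	var ids []string
	for _, e := range entries {
		if managed[e.CertID] {
			ids = append(ids, e.CertID)
		}
	}
	sort.Strings(ids)
	return ids, nil
}

func main() {
	// Example document; in practice this would be fetched in one request.
	doc := []byte(`[
	  {"certID": "aaa", "start": "2023-06-20T00:00:00Z", "end": "2023-06-21T00:00:00Z"},
	  {"certID": "zzz", "start": "2023-06-20T00:00:00Z", "end": "2023-06-21T00:00:00Z"}
	]`)
	managed := map[string]bool{"aaa": true, "bbb": true}

	ids, err := atRisk(doc, managed)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // prints [aaa]
}
```

With tens of thousands of certificates, this is one request and one map lookup per listed entry, rather than a request per certificate.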