Can ARI send a notification with callback URL when a certificate is about to expire unexpectedly or outside the planned schedule?

Dongwang · October 14, 2024, 6:49pm

Hi, our team is evaluating whether we can handle certificate revocation cases automatically. We're excited about ARI, but all the exposed interfaces currently rely on polling.

Given that we may manage up to 640k certificates, periodically querying each domain isn't practical. Would it be possible to provide a callback API during the certificate creation process? This way, we could receive notifications for unexpected revocations directly for the relevant domains.

Osiris · October 14, 2024, 7:09pm

As far as I know, the ARI protocol is still in development, so maybe. There's also the Feature Requests category on the forum. While ARI is, strictly speaking, not a Let's Encrypt "only" feature, I believe your question is better suited to Feature Requests, so I've moved it there.

Maybe @aarongable can chip in with regard to this request. I too have often wondered what the load implications would be if one were to poll many, MANY certificates many times a day. While I don't have any numbers to back it up, my feeling is that polling in such a manner isn't very economical.

Such a callback API would make sense to me. However, the implementation would be another issue, especially for ACME clients that aren't active all the time, but just that one time during a renewal attempt.

jvanasco · October 14, 2024, 7:33pm

A Feature Request or suggestion might better be made against the spec here - GitHub - aarongable/draft-acme-ari: Internet Draft for the Automated Certificate Management Environment (ACME) Renewal Information (ARI) Extension - as most people working on the spec are likely to see it there.

I like this idea, but I immediately wonder if the implications of supporting it might be detrimental to it being deployed. Aside from the technical logistics, this could require a lot of work to prevent it from being a purposeful - or accidental - DDOS vector or other type of misuse.

I do think something like this was suggested before, and the official response was something like "you should poll frequently and not care, we can take care of load via caching/etc if needed".

Osiris · October 14, 2024, 7:56pm

But what about the load of the ACME client?

Dongwang · October 14, 2024, 8:16pm

Thank you @Osiris and @jvanasco. I have also posted the same question in the link above. Polling puts significant strain on Let's Encrypt, and there is a risk that revocation information may not reach the client promptly. I believe implementing notifications would allow subscribers to receive revocation updates immediately, ensuring timely action. Additionally, this approach would help LE servers conserve substantial resources and energy.

petercooperjr · October 14, 2024, 8:47pm

It might just move the chokepoint of needing to make lots of requests from the ACME client side to the ACME server side. The ARI protocol was designed so that the CA could, if needed, publish a bunch of static files to a CDN, making serving responses relatively straightforward. If they needed to hit a bunch of HTTPS endpoints, around the world, as part of handling an incident, I could see that being more complicated. (Plus the need to handle the possible abusive cases.)

Certbot at least checks OCSP twice a day on its renewal schedule; checking ARI should basically work as a drop-in replacement doing roughly the same amount of load on both the client and server sides.
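To make the per-certificate cost of such a check concrete, here is a minimal sketch of the decision a client might make after fetching a certificate's renewal information. It assumes the response is a JSON object with a `suggestedWindow` holding `start` and `end` timestamps, as the ARI draft describes; the function names and the sample values are illustrative only. It also simplifies the client behaviour: the draft suggests picking a random time inside the window, whereas this sketch renews as soon as the window has opened.

```python
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def should_renew_now(renewal_info: dict, now: datetime) -> bool:
    # Simplification (assumption): renew once the suggested window has opened.
    # A CA that plans to revoke early can move the window into the past,
    # which makes this return True on the client's next poll.
    window = renewal_info["suggestedWindow"]
    return now >= parse_rfc3339(window["start"])


# Illustrative response body, not real data.
sample = {
    "suggestedWindow": {
        "start": "2025-01-02T04:00:00Z",
        "end": "2025-01-03T04:00:00Z",
    }
}

print(should_renew_now(sample, datetime(2025, 1, 1, tzinfo=timezone.utc)))
print(should_renew_now(sample, datetime(2025, 1, 2, 5, tzinfo=timezone.utc)))
```

The check itself is cheap on the client side; the cost under discussion is the number of HTTP requests needed to obtain one such response per certificate.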

I'm not really objecting to the idea, I'm just not yet convinced that it really helps all that much.

jvanasco · October 14, 2024, 9:08pm

I'm not sure that would be the case once an outage happens. While previous mass revocations have been as little as a few thousand certs, they have also spiked into millions of certs. Pushing to large numbers of callback URIs within the ideal time constraints - which would be days before the revocation - can be more burdensome than responding to inbound requests that use caching. Hundreds or thousands of requests would need to be processed simultaneously, and one would have to deal with timeouts and retries.

I do like the idea and I hope LE staff explores it - or finds other ways to work with large providers - I am just not fully convinced of its utility.

Dongwang · October 14, 2024, 9:26pm

With polling, your servers will consistently face pressure, as they need to handle ongoing requests regardless of whether there are changes. In contrast, with push notifications, revocations—being rare events—might cause short-term spikes in traffic, but for the majority of the time, there would be minimal or no traffic. This would lead to more efficient resource usage overall.

Osiris · October 14, 2024, 9:38pm

And perhaps better adoption of ARI among a certain kind of user (with lots and lots of certs).

For the small Certbot users of this world with just a "few" certs it wouldn't matter much, but if you've got tens/hundreds of thousands of certs, well...

Dongwang · October 14, 2024, 10:10pm

Agree. Thank you.

jvanasco · October 14, 2024, 10:20pm

It will not be "polling vs pushing"; polling will still need to be supported, as most clients do not and will not have a persistent web capability to support the callbacks.

With polling, many requests can be handled in-memory and there are a lot of techniques for caching and sharding data. The requests are far more numerous, but they are light operations.

With pushing, the concern isn't a short term spike in traffic so much as a (likely necessary) capability to quickly scale out to handle a massive number of blocking requests. Even when doing this async, the blocking issues (timeouts, slow requests, dropped connections) will complicate batching that is usually needed for parallel operations like this.
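A rough sketch of what the pushing side has to deal with, under stated assumptions: `deliver` is a stand-in for an HTTPS POST to a subscriber's callback URL (no such callback exists in ARI today), the URLs are made up, and the delays simulate fast and slow receivers. The point is only to show the bounded concurrency, per-request timeout, and retry bookkeeping mentioned above.

```python
import asyncio


async def deliver(url: str, delay: float) -> str:
    # Hypothetical stand-in for POSTing a notification to a callback URL.
    # The sleep simulates how long the receiver takes to answer.
    await asyncio.sleep(delay)
    return url


async def notify_all(targets: dict, max_in_flight: int, timeout: float) -> tuple:
    # targets maps callback URL -> simulated response delay in seconds.
    gate = asyncio.Semaphore(max_in_flight)  # cap simultaneous requests
    delivered, needs_retry = [], []

    async def attempt(url: str, delay: float) -> None:
        async with gate:
            try:
                await asyncio.wait_for(deliver(url, delay), timeout)
                delivered.append(url)
            except asyncio.TimeoutError:
                # A slow or dead receiver holds a slot until the timeout fires
                # and then has to be queued for another attempt.
                needs_retry.append(url)

    await asyncio.gather(*(attempt(u, d) for u, d in targets.items()))
    return delivered, needs_retry


targets = {
    "https://fast.example/callback": 0.0,
    "https://slow.example/callback": 1.0,
}
ok, retry = asyncio.run(notify_all(targets, max_in_flight=2, timeout=0.05))
print("delivered:", ok)
print("needs retry:", retry)
```

Every slow receiver occupies one of the `max_in_flight` slots for the full timeout, which is why the sender needs to scale out rather than just absorb a brief traffic spike.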

I say this for a lot of things that are particular to large integrations and high-availability users: I really do think ISRG should consider a commercial tier of services that charges a small/reasonable amount for stuff like this. This type of feature is really most (only?) useful to a select number of clients. In the past, ISRG has indicated they do not want to offer commercial packages, and really do not like to develop features that won't be utilized by all subscribers. IMHO, implementing this for a small subset of paid users (vs a global rollout) might hit the sweet spot of making this technically possible for those who need it, and eliminating some of the technical work that would be needed for a global rollout.

webprofusion · October 15, 2024, 2:37am

I agree it would be great to avoid checking status for every cert a couple of times per day, but that's what ARI currently amounts to. If you are managing 640k certs and want to use ARI then polling their ARI status is currently the cost of doing business. If batch polling every 5 minutes and only checking each cert once per day you would require a little over 2000 checks per batch. Assuming you don't manage all your certs on one server/container it then depends on what scale out of cert management you have (for instance I have test systems that scale across hundreds of nodes and they each look after their own subset). So it's certainly an inconvenience, but not impractical.
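The batch arithmetic above can be checked with a few lines. This is just a worked calculation using the figures from this thread (640k certs, a batch every 5 minutes, one check per cert per day); the function name is made up for illustration.

```python
import math


def checks_per_batch(total_certs: int, batch_interval_minutes: int,
                     checks_per_cert_per_day: int = 1) -> int:
    # Number of batches that fit in one day at the given interval.
    batches_per_day = (24 * 60) // batch_interval_minutes
    # Spread the daily checks evenly over the batches, rounding up.
    return math.ceil(total_certs * checks_per_cert_per_day / batches_per_day)


# 24 * 60 / 5 = 288 batches per day; 640000 / 288 is about 2222.2
print(checks_per_batch(640_000, 5))     # 2223
print(checks_per_batch(640_000, 5, 2))  # 4445 if each cert is checked twice a day
```

Spread across many nodes, each node's share of those roughly 2,200 checks per batch becomes correspondingly smaller.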

If you imagine a worst case short-notice mass revocation of almost every cert, the ARI polling is going to be the least of the problems regarding load, so at scale considerations such as CA fallback are also important (and there are certain things to watch out for with ARI there too!).

A callback API would be great (e.g. a request to /.well-known/acme-ari-notification with the certID and account public key), but I don't think it will happen any time soon, and it doesn't cover all uses of domains, as many services are not webservers, unless the callback could be nominated to happen on an account-specific domain (i.e., not just the cert's domain).

rg305 · October 15, 2024, 12:03pm

This sounds like something one would have to "sign up for" and there also would need to be an agreed "specification" as to how CAs notify accounts with issues.

Possibly an opportunity for a "broker service" that would handle such required "specification" notifications and convert them to a menu of choices "email/sms/http(s)/voice mail/etc.".

webprofusion · October 16, 2024, 4:14am

Clever, so poll ARI status on their behalf and perform configured alerts/API calls etc. Stealing this idea!

mholt · October 16, 2024, 1:17pm

I would have a hard time trusting any such service... FWIW

rg305 · October 16, 2024, 1:24pm

Ideally it would NOT have the power to issue nor revoke certs - just inform the holders [via multiple (alternate) channels].

Hey SIRI, let me know when any of my certs expire or have to be renewed.

petercooperjr · October 16, 2024, 1:25pm

There are a lot of people who are relying on emails to renew, rather than having actual monitoring. Having an actual external monitoring service that also checks ARI/OCSP and notifies the main client that it should probably renew seems like a useful service that could be added to the other external uptime checks that any "production" site should have anyway.

rg305 · October 16, 2024, 1:26pm

@petercooperjr, I'll take that as a vote of... sanity!
I'll live to think yet another day!

mholt · October 16, 2024, 1:29pm

...as long as they actually do notify properly & on time

I imagine "uptime monitoring" services will implement this anyway, but still, having an external service do that seems kinda pointless since it defeats the purpose of automation.

rg305 · October 16, 2024, 1:56pm

Automation is the frontline.
Notifications are only there for peace of mind and are only used when something has gone wrong.

So, it doesn't defeat it - it complements it.

Choose:
A
or
B

hmm...
I choose both!
