Possible new feature: paused ACME accounts

jsha · March 26, 2021, 7:15pm

Good point, I didn't elaborate on the problem statement! The main issue I'd like to solve is this: about 80% of HTTP-01 validations currently fail, which is approximately 110 errors per second. That means we are spending a lot of resources (storage, bandwidth, CPU) on unneeded work. It also means it's hard to tell whether slowness in various validation systems (particularly DNS) is due to a problem on our end or to a particularly large influx of traffic from someone with failing validations.

I haven't done this calculation yet, but this is a good idea.

This is a good point: having multiple levels of rate limit would solve this in another way. For instance, one can imagine adding to the "5 failed validations per hour" limit a "25 failed validations per week" limit and a "50 failed validations per month" limit.
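
To make the tiered idea concrete, here is a rough sketch in Python. The three tiers use the example numbers above; the names `TIERS` and `exceeded_tier`, the in-memory list of timestamps, and the 30-day approximation of a month are all made up for illustration and say nothing about how our rate limit code actually works.

```python
from datetime import datetime, timedelta

# (window, max failures) tiers, using the example numbers from the post.
TIERS = [
    (timedelta(hours=1), 5),
    (timedelta(weeks=1), 25),
    (timedelta(days=30), 50),  # assumption: "month" approximated as 30 days
]


def exceeded_tier(failure_times, now):
    """Return the first (window, limit) tier that is exhausted, else None.

    failure_times: iterable of datetimes of failed validations (hypothetical
    input; a real system would query stored validation records).
    """
    for window, limit in TIERS:
        # Count failures that fall inside this tier's sliding window.
        recent = sum(1 for t in failure_times if now - window < t <= now)
        if recent >= limit:
            return (window, limit)
    return None
```

Note that every request has to count failures over each window, which is part of why this kind of calculation is not free.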

A couple of reasons to prefer setting a bit on the account:

In a lot of cases, the client will really never succeed again and is completely forgotten. Spending resources on even 50 failed validations per month adds up, and doesn't benefit anyone.
If someone does notice that their account has been paused, they can unpause it right away rather than wait for the rate limit to expire.
Right now, calculating rate limits is somewhat expensive for us, but we do hope to improve on that.

Also keep in mind this is not quite the same as the failed validations limit: It's failed validations, combined with a long period of no successful issuance. It might also make sense to express the threshold over a long period, for instance 100 failed validations over the course of 90 days and no successful issuance in 180 days.
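
As a sketch of what that combined condition could look like, assuming the example thresholds above (they are not a decided policy) and hypothetical names like `should_pause`:

```python
from datetime import datetime, timedelta

# Example thresholds from the post; illustrative only, not a decided policy.
FAILURE_WINDOW = timedelta(days=90)
FAILURE_THRESHOLD = 100
NO_ISSUANCE_WINDOW = timedelta(days=180)


def should_pause(failure_times, last_issuance, now):
    """True if the account has many recent failures and no recent success.

    failure_times: datetimes of failed validations for the account
    last_issuance: datetime of the last successful issuance, or None if
                   the account has never issued
    """
    recent_failures = sum(
        1 for t in failure_times if now - FAILURE_WINDOW <= t <= now
    )
    issued_recently = (
        last_issuance is not None and now - last_issuance < NO_ISSUANCE_WINDOW
    )
    # Both conditions must hold: failures alone are not enough to pause.
    return recent_failures >= FAILURE_THRESHOLD and not issued_recently
```

The point of the second condition is that an account with a recent successful issuance is never paused, no matter how many validations it fails.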

There are a couple of common cases. Some clients attempt renewal once every day. These are not a problem on their own, but when there are many of them, particularly if they all use a stock VM image with a cron job set to a particular time of day, they can be noticeable.

Then there are some with buggy software that goes off the rails and hits us many times per second. Right now we block these when it gets bad enough, and we usually try to notify the maintainer of the software. But this is a very manual process. And some buggy clients make it hard to set an email (or discourage it by not showing how in the examples), so for some of the offenders we have no way to get in touch.

I don't. Though in those cases the hosting provider is, in theory, in charge of handling errors. There's definitely some nuance here in whether the hosting provider creates an account per user or one account for all their users.

This is a good idea! We should do this too.
