API to help detect improper use *before* triggering rate limit

In my experience it's pretty easy to hit the rate limit if you're doing cloud (re-)deployments.

It is best, of course, to save the certificate generated for a cloud host to persistent storage and restore it when the host is wiped and recreated. But when that is not implemented, or not working correctly, the only way you find out is when you are suddenly locked out.

It would be nice if such a deployment could detect that an unexpected number of requests has already been made for a given domain. Tracking it locally is not an option, because the problem occurs exactly when local state is not being preserved correctly.

So I'm thinking of an API that returns information about progress toward the various rate limits for a given domain. We could then build a warning into deployments that fires when we have made more requests than expected, with a low default that can be raised for deployments where higher volume is expected.

This does not have to be perfect. The important things are:

  • Be stateless from the client's point of view
  • Detect when we are making more requests than expected for a domain
  • Run in the deployment itself. A cron job that polls crt.sh, or an email notification, is unlikely to reach the right person fast enough, and it makes it hard to tune warning limits per deployment, which is necessary to ensure the warnings are not ignored.
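As a sketch of what the deployment-side check could look like: assume a hypothetical endpoint that returns the current count, the limit, and the window for a domain (no such endpoint exists today; the response shape here is invented for illustration).

```python
# Hypothetical check against an imagined rate-limit-status endpoint.
# The response shape (domain, window_seconds, limit, count) is an
# assumption; nothing like this exists in Boulder today.

def check_rate_limit_status(status: dict, expected_max: int) -> list[str]:
    """Return warnings when the observed request count for a domain
    exceeds what this deployment expects, or nears the hard limit."""
    warnings = []
    count = status["count"]
    limit = status["limit"]
    if count > expected_max:
        warnings.append(
            f"{status['domain']}: {count} requests in the current "
            f"{status['window_seconds']}s window, expected at most "
            f"{expected_max} -- is persistent storage working?"
        )
    if count >= 0.8 * limit:
        warnings.append(
            f"{status['domain']}: {count}/{limit} of the rate limit "
            "already consumed in this window"
        )
    return warnings

# Example: a deployment that expects at most 2 issuances per window.
status = {"domain": "example.com", "window_seconds": 604800,
          "limit": 50, "count": 7}
print(check_rate_limit_status(status, expected_max=2))
```

The per-deployment `expected_max` is the tunable knob mentioned above: low by default, raised only where churn is expected.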

You could use crt.sh as an API (e.g. https://crt.sh/?q=example.com&output=json) inside the critical path of your deployment tool, but the problem there is the lag that affects any Certificate Transparency log aggregator. If somebody is constantly re-running `docker-compose up` without a volume mount on their laptop, it's not going to save them.
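For reference, counting recent issuances from a crt.sh JSON response might look like the sketch below. The fetch itself is left out; the inline sample mimics the shape of the `output=json` response, and the `not_before` field name is based on observed responses rather than any documented contract.

```python
import json
from datetime import datetime, timedelta, timezone

# Count certificates in a crt.sh-style JSON response whose notBefore
# falls inside the last `window`. The "not_before" field name matches
# what https://crt.sh/?q=example.com&output=json returns today, but
# that is an observation, not a spec.

def count_recent_issuances(entries, window=timedelta(days=7), now=None):
    now = now or datetime.now(timezone.utc)
    count = 0
    for entry in entries:
        not_before = datetime.fromisoformat(entry["not_before"])
        if not_before.tzinfo is None:
            not_before = not_before.replace(tzinfo=timezone.utc)
        if now - not_before <= window:
            count += 1
    return count

sample = json.loads("""[
  {"not_before": "2024-05-01T00:00:00"},
  {"not_before": "2024-05-06T12:00:00"},
  {"not_before": "2024-03-01T00:00:00"}
]""")
now = datetime(2024, 5, 7, tzinfo=timezone.utc)
print(count_recent_issuances(sample, now=now))
```

The lag caveat still applies, of course: a certificate issued seconds ago may not be visible in the aggregator yet, so this count can only ever be a lower bound.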

Let's Encrypt could add a non-standard extension to Boulder for this, I guess. The queries Boulder already performs seem well suited to the job: all the required information is accessible (the window size, the limit granted to your ACME account, and the actual count in the current window).

I feel like there's a problem, though: ACME clients and servers would somehow have to coordinate on a shared understanding of what the rate limits are. There is no interoperability between ACME implementations here, and not even a guarantee of interoperability between different releases of Boulder, because rate limits might get added, removed, or have their semantics slightly adjusted, as happened when the renewal exemption was added.

There's also the problem that one operation (e.g. a new order) is constrained by multiple rate limits with different windows (new orders: 3 hours; certificates per domain: 1 week), so I'm not sure how you would model that as an API query/response.
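Purely as a thought experiment, one way to model a multi-limit answer: the response lists every limit that applies to the operation, each with its own window, cap, and current count, and the client treats the most constrained one as the effective headroom. The names and shapes below are invented (the caps loosely mirror the limits mentioned above):

```python
# Invented response shape: each applicable limit carries its own
# window, cap, and current count. The effective headroom for an
# operation (e.g. a new order) is the minimum across all of them.

def effective_headroom(limits):
    """Return (remaining, limit_name) for the most constrained limit."""
    return min(
        ((lim["limit"] - lim["count"], lim["name"]) for lim in limits),
        key=lambda pair: pair[0],
    )

new_order_limits = [
    {"name": "newOrdersPerAccount", "window_seconds": 3 * 3600,
     "limit": 300, "count": 298},
    {"name": "certificatesPerDomain", "window_seconds": 7 * 86400,
     "limit": 50, "count": 7},
]
remaining, which = effective_headroom(new_order_limits)
print(f"{remaining} operations left before hitting {which}")
```

This sidesteps the "which window do you mean?" question by returning all of them and letting the client take the minimum, at the cost of the coordination problem described above: the set of limit names would change between Boulder releases.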

I've often thought about this problem (this was one experiment I created in tracking rate limits as a third party), but nothing has ever stood out to me as a particularly good solution.

