Soft rate limit

schoen · March 17, 2021, 11:16pm

I can't find the earlier forum thread about this (and I've searched for it), but I remember that someone had proposed that there should be a more aggressive duplicate certificates limit with a 1-hour or 1-day reset, so that people would hit it much sooner when doing a wasteful reissuance (and hopefully thereby find out about the rate limits much sooner and with less overall frustration).

As I continue to notice the pattern of "I created my certificates inside a container/ephemeral VPS and deleted them over and over again, and now I'm rate limited" as a question on the forum lately, I would love to see a way to make people aware of this sooner and with slightly smaller consequences.

For example, perhaps there could be a 3 duplicate certificates per hour limit, 4 per day, and 5 per week, with the most restrictive relevant limit applying at any time. Then the people showing up on the forum asking about rate limits could more often be told "you'll need to save your certificates in persistent storage, and you'll have to wait an hour before trying again" rather than "you'll need to save your certificates in persistent storage, and you'll have to wait a week before trying again". They could then use that unexpected hour of downtime to research how to save their certificates in persistent storage.
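To make the "most restrictive relevant limit" idea concrete, here is a minimal sketch of how the three proposed tiers could combine. It is not how Boulder works; the `TIERS` table and the `retry_after` function are hypothetical names used only for illustration, and the numbers are simply the ones suggested above.

```python
from datetime import datetime, timedelta

# Hypothetical tiers using the numbers proposed above: (window, max issuances).
TIERS = [
    (timedelta(hours=1), 3),
    (timedelta(days=1), 4),
    (timedelta(weeks=1), 5),
]


def retry_after(issued_at, now, tiers=TIERS):
    """Return how long to wait before another duplicate certificate is allowed.

    issued_at: datetimes of earlier issuances for the same set of names.
    Returns timedelta(0) when issuance is allowed right now.
    """
    wait = timedelta(0)
    for window, limit in tiers:
        # Issuances that still count against this tier's sliding window.
        recent = sorted(t for t in issued_at if now - t < window)
        if len(recent) >= limit:
            # A slot frees up once enough of the oldest issuances age out.
            frees_at = recent[len(recent) - limit] + window
            # The most restrictive tier decides the overall wait.
            wait = max(wait, frees_at - now)
    return wait
```

With three issuances in the last hour, only the hourly tier is exhausted, so the answer is "wait until the oldest of them is an hour old" rather than "wait a week".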

griffin · March 17, 2021, 11:32pm

You mean this one?

I closed the issue because it seemed like it wasn't going anywhere, but I'm happy to reopen it.

schoen · March 18, 2021, 2:09am

Yep, that one!

I feel like the frequency of people hitting rate limits because of ephemeral instances has been increasing. So, if my perception is right, I feel like this proposal is increasing in relevance with time.

webprofusion · March 18, 2021, 4:32am

I'm perhaps wandering off topic but I've long felt that ephemeral instances (and many more permanent internal systems) shouldn't really be acquiring certs from the CA directly (especially if DNS validation is involved) and instead should be going via centralised cert management so that fancy validation, stored credentials and issuance controls can be centrally managed - there are of course several systems that do this already but I've no idea what the uptake is.

I'm currently adding a centralised service for Certify The Web (currently on docker/linux or windows) so that authorised app/service instances can pull their latest cert via an API and the cert service takes care of keeping them fresh. It's certainly not a new idea but I think as a strategy it could benefit from increased usability and it removes the issue of individual instances struggling to maintain their certs.

schoen · March 18, 2021, 4:38am

Yes, I think that would help a lot!

I've mentioned another idea a few times that I've had kicking around, that there should be an HTTP header to warn about "APIs you shouldn't use directly in an ephemeral instance" (I was calling it Was-Expensive, although I haven't written up a spec). In that case if ephemeral instances could set some kind of environment variable to indicate that they are ephemeral, their HTTP libraries could maybe start generating warnings about this... or something?
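As a rough illustration of that idea, a client-side check might look like the sketch below. Everything here is an assumption: `Was-Expensive` has no spec, the `EPHEMERAL_INSTANCE` environment variable is an invented name, and `warn_if_expensive` is a hypothetical helper that an HTTP library could call on each response.

```python
import os
import warnings


def warn_if_expensive(headers, environ=os.environ):
    """Warn when an 'expensive' response is seen inside an ephemeral instance.

    headers: mapping of response header names to values.
    environ: mapping of environment variables (defaults to the real one).
    Returns True if a warning was emitted.
    """
    # Hypothetical convention: the instance declares itself ephemeral.
    ephemeral = environ.get("EPHEMERAL_INSTANCE") == "1"
    # Header names are case-insensitive, so compare in lower case.
    expensive = any(name.lower() == "was-expensive" for name in headers)
    if ephemeral and expensive:
        warnings.warn(
            "This API marked its response as expensive; store the result "
            "in persistent storage instead of requesting it on every start.",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```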

Cool, that's great!

There are some older pre-ACME protocols that I think are oriented around this kind of use case.

I wonder if any of them would be useful for this today, or if it makes more sense for most of these users to have an ACME proxy, or just a sort of trivial download from a known location.

webprofusion · March 18, 2021, 4:59am

Thanks! In the first instance I'm going for trivial download from a known API using pre-shared app/service specific credentials/tokens (likely with an option for mutual tls if API requests could happen over the public internet).

I thought about an ACME proxy and actually did build a working prototype about a year ago but I couldn't see a way around who should control the private key (unless it's pre-shared again). ACME doesn't pass private keys around but acquiring the original cert needs it for the CSR etc.

Pulling latest via an API is super simple and fast enough to achieve during app/service startup if the cert has been pre-prepared. Clients can pull from the API or a secrets store/vault that the central service has already published to. Some CTW users already publish their certs to Azure Key Vault or Hashicorp Vault etc via Deployment Tasks, so we'll extend that as well because that's generally very easy to do and a pretty good separation of concerns.
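A startup-time pull of this kind could be sketched roughly as follows. This is not the Certify The Web API; the URL layout, the bearer-token header, and the function names are all assumptions made for illustration.

```python
import urllib.request
from urllib.parse import quote


def build_cert_request(base_url, service_name, token):
    """Build the request for a service's latest certificate.

    The '/certs/<name>/latest' path and bearer token are hypothetical.
    """
    url = f"{base_url.rstrip('/')}/certs/{quote(service_name, safe='')}/latest"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_cert(request, opener=urllib.request.urlopen, timeout=5):
    """Download the certificate bytes; the opener is injectable for testing."""
    with opener(request, timeout=timeout) as response:
        return response.read()
```

An instance would call `fetch_cert(build_cert_request(...))` once during startup and hand the bytes to its TLS configuration, so it never talks to the CA itself.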

griffin · March 18, 2021, 7:29am

Issue against Boulder has been reopened.

schoen · April 5, 2021, 7:14pm

Another example where this would have been useful, following the "never knew there was any limit!" pattern.

jvanasco · April 5, 2021, 8:36pm

I open-sourced our API Driven ACME Client/Manager a while back -- Peter SSLers. Our own use-case is to support an unknown number of domains, running on an unknown number of servers, in an unknown number of locations. To accomplish that, I wrote a tiered caching system for OpenResty (nginx variant) that loads certs during the SSL Handshake from worker-mem, shared-mem, redis, and finally an API server.
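The tiered lookup described here can be sketched in a few lines. This is Python rather than the OpenResty/Lua of the real project, and it is only an assumption about the general shape: `load_cert`, the dict-like `caches`, and `fetch_from_api` are illustrative names, not Peter SSLers code.

```python
def load_cert(hostname, caches, fetch_from_api):
    """Look a certificate up through cache tiers ordered fastest first.

    caches: list of dict-like tiers (e.g. worker memory, shared memory, redis).
    fetch_from_api: callable used only when every tier misses.
    """
    for depth, cache in enumerate(caches):
        cert = cache.get(hostname)
        if cert is not None:
            break  # Found at this tier; faster tiers above it missed.
    else:
        # No tier had it (or there are no tiers): ask the API server.
        depth = len(caches)
        cert = fetch_from_api(hostname)
    if cert is not None:
        # Copy the result into every faster tier that missed.
        for cache in caches[:depth]:
            cache[hostname] = cert
    return cert
```

The point of the ordering is that the slow API server is only contacted on a complete miss, and a single hit warms every faster tier for later handshakes.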

My gut reaction since day 1 has been that, while this is the right approach, the people who need these systems require quite a bit of customization - but don't have the resources for it. So they just default to "bad behaviors".

I've gotten a handful of private emails from companies wanting a specific enterprise feature built in, but they're never interested in contributing a PR or funding development of the feature. Based on some exchanges, their rationales are generally budget constraints ("we can't spend money on this!") and sprintable hour constraints ("we already have to allocate x hours for integration/management, we can't allocate y hours for development").

webprofusion · April 6, 2021, 1:52am

Thanks for your insights Jonathan, your project is incredibly sophisticated, I think I'm pitching at a simpler level overall - nice work!

jvanasco · April 6, 2021, 3:34pm

Thanks! We are definitely aimed at completely different use cases -- my project is aimed at simplifying "internet scale" deployments like PAAS, SAAS, Whitelabel Tools, etc and programmatic usage. The UX is an afterthought, for bugfixing. It is overkill for 99.9% of use cases, which is why I even use Certbot for our own certs.

Your project, in contrast, has amazing UX and is simple and enjoyable to use.

schoen · April 7, 2021, 12:02am

So, it looks to me like in Boulder you can only have one active rate limit of each kind.

For example, in

github.com/letsencrypt/boulder/blob/366632281787625cc8d8ca03a277b67c6cfc4ed5/test/rate-limit-policies.yml

# See cmd/shell.go for definitions of these rate limits.
certificatesPerName:
  window: 2160h
  threshold: 2
  overrides:
    ratelimit.me: 1
    lim.it: 0
    # Hostnames used by the letsencrypt client integration test.
    le.wtf: 10000
    le1.wtf: 10000
    le2.wtf: 10000
    le3.wtf: 10000
    nginx.wtf: 10000
    good-caa-reserved.com: 10000
    bad-caa-reserved.com: 10000
    ecdsa.le.wtf: 10000
    must-staple.le.wtf: 10000
  registrationOverrides:
    101: 1000
registrationsPerIP:

This file has been truncated.

you could change the details of certificatesPerFQDNSet, but you couldn't add a second certificatesPerFQDNSet with different details.

So the easy way to implement @griffin's original idea would be to create new kinds of rate limits, like certificatesPerFQDNSetLarge, certificatesPerFQDNSetMedium, and certificatesPerFQDNSetSmall, or something, and update all of the code that refers to rate limit types in boulder/ratelimit and boulder/ra, so that all three of them can be checked. But this might be less elegant than making the rate limits allow some kind of multiplicity of a rate limit policy, which I don't think the current code can handle.

petercooperjr · April 8, 2021, 9:39pm

I have to say I'm kind of curious whether, without changing code but just changing parameters, completely replacing the 5-per-week limit with a 1-per-hour limit might still have a net effect of reducing duplicates and load on Let's Encrypt's servers.

Might be fun to experiment with, but of course that's easy for me to say when I'm not the one running the servers.

griffin · April 8, 2021, 10:03pm

I like it, @petercooperjr!

How about it, @jsha?

petercooperjr · April 8, 2021, 10:10pm

Of course, something like 2-per-4-hours or even 3-per-day might work better. I just don't know if the 5-per-week is based on some actual evidence from early in Let's Encrypt history, or experience with other CAs, or based on known limits in their signing capacity, or if it was just a wild guess based on what they thought would help conserve their resources best. That's why I suggested it might be fun to experiment with, though I would certainly understand hesitation to do so in production. But even if a second level of rate limit were added, it's not clear to me what the right level would be to set it at beyond an intuition of "somewhere around 1 or 2 in the span of an hour or two".

jsha · April 8, 2021, 10:18pm

I can confirm 5-per-week was pulled out of a hat.

That said, if we went to a 1-per-hour limit, there are lots of clients that would go from issuing 5 duplicate certificates a week to issuing 168 certificates a week.

The main load from duplicates is not beginners trying a handful of times and not realizing they're using up resources; it's misconfigured servers that re-request indefinitely for years at a time.
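The 168 figure is just 24 hours times 7 days. A small sketch of that arithmetic, for a client that re-requests as often as it is allowed, is below; `max_per_week` is a hypothetical helper, and the result is an upper bound on issuances in any 7-day period rather than a measurement of real traffic.

```python
from datetime import timedelta


def max_per_week(tiers):
    """Upper bound on issuances in any 7-day period.

    tiers: list of (window, limit) pairs that are all enforced together.
    """
    week = timedelta(weeks=1)
    # ceil(week / window) windows fit into a week, each allowing `limit`
    # issuances; the tightest tier sets the overall bound.
    return min(limit * -(-week // window) for window, limit in tiers)


hourly_only = max_per_week([(timedelta(hours=1), 1)])
weekly_only = max_per_week([(timedelta(weeks=1), 5)])
both = max_per_week([(timedelta(hours=1), 3), (timedelta(weeks=1), 5)])
```

Replacing the weekly limit with an hourly one raises the ceiling for a misconfigured client from 5 to 168, while keeping the weekly limit alongside a short one leaves the ceiling at 5.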

griffin · April 8, 2021, 10:34pm

Makes sense. Based on @schoen's analysis above, it looks like there might be some level of effort involved with implementing an hourly limit. I still believe it would probably be worth it though from a load-reduction standpoint by slowing down the less informed and thwarting bad practices with ephemeral instances.

Edit: I mean a dual limit (one to three per hour and five per week).

schoen · April 8, 2021, 10:36pm

I think the best course is to have both—5 per week for bots, 3 per hour for humans. (Maybe 4 per day for persistent humans.)

petercooperjr · April 9, 2021, 7:51pm

I figured there might be some, but are there really a lot that request continually, rather than just twice-a-day when they try to renew or whatever? Yikes. I guess we do need more rate limits then, rather than just tweaking the one we have. (Though even with a "short" limit in place, it might be worth looking at other options for what the reset of the "long" one should be, if 5-per-week was just pulled from a hat. And maybe even make some more limits to help stop the request-a-new-cert-every-chance-they-get clients.)

system · May 9, 2021, 7:51pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
