Help Understanding Rate-Limiting

Hi! I am not asking for specific help so I don't think it's useful to include the actual domain name in question since this is (hopefully) mostly a generic "how does this work" question.

My client is running cert-manager in a Kubernetes cluster, using a LetsEncrypt issuer (prod). Every workload that they deploy to their cluster creates an ingress with a TLS section that includes two hostnames: myapp.environment.domain.net and myapp.domain.net.

They recently ran into an issue with one new workload, where cert-manager returned the message 'Error creating new order :: too many certificates already issued for "domain.net"'. After reading the docs, my understanding is as follows:

  • They are hitting the Certificates per Registered Domain (50 per week) limit. I confirmed this by using the lectl tool. It shows:

Sorry, you can't issue any certificate, you already issued 50 certificates on last 7 days

  • This limit counts your root domain. So any cert they issue that ends in domain.net, no matter the subdomains, will count towards this limit. In the example ingress above they are effectively using up 2 hits to the limit when they submit this request. If every workload in the cluster has that same ingress configuration (with two hosts both ending in domain.net), that's 2 requests per ingress against that limit. So over the last 7 days they requested 50 certificates for variations of domain.net.
  • Renewals don't count against this limit.
  • If they hit this limit, renewals will still be processed, but new certs will not.
  • Renewals do count against the Duplicate Certificate limit.
  • Wildcards can be used to issue certs and avoid the rate limit.

First and foremost: Is my understanding of matters correct?

And the followup: If they create a new request for *.domain.net, will that also hit the rate limit? Would that serve as an immediate workaround?

Thanks in advance for your help in understanding how this works.

Hello @turkeyleg, welcome to the Let's Encrypt community. :slightly_smiling_face:

See Rate Limits - Let's Encrypt and Failed Validation Limit - Let's Encrypt

Thank you!

You are welcome @turkeyleg. Have a pleasant day! :slight_smile:

The situation you described suggests there may be an anti-pattern being utilized here. If these are short-lived ephemeral servers with unique ids for testing purposes, you should definitely be using a wildcard certificate and recycling the same exact certificate across servers. There are several ACME clients and web servers that can coordinate this using cloud storage or block storage. The same holds true for servers that respawn with recycled names.

If these are long-lived servers, that is a different situation, but recycling the certificates across nodes is still probably the correct thing to do.

There are very very very few situations where dynamically spawned/allocated servers should obtain publicly trusted SSL Certificates.

Thank you for your response. I think this is a little different because this is Kubernetes. I can indeed use a wildcard cert and have every microservice mount that secret, etc., and it's the way forward that I will be suggesting.

However: I'm still trying to verify that my understanding of how this rate-limiting works is accurate. Did I get that right?

Yes. You can't get any new (non-renewal) certs issued once 50 certs have been issued in a week.
But you can always get an existing cert renewed.

Not sure how you count them as two...
To clarify:

  • If both names are on one cert, then it only counts as one [cert].
  • If the names are on separate certs, then it does count them separately [as two certs].
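
To make the counting concrete, here is a rough sketch of how names versus certs add up. It is only an illustration, not how Let's Encrypt implements the limit: `naive_registered_domain` is a hypothetical helper that assumes the registered domain is simply the last two labels, which happens to hold for domain.net but not for every domain.

```python
from collections import Counter


def naive_registered_domain(hostname: str) -> str:
    """Hypothetical helper: ASSUMES the registered domain is the last two labels.

    That is true for names under domain.net, but a real check has to
    consult the Public Suffix List.
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def certs_per_registered_domain(certificates):
    """Count certificates (not hostnames) per registered domain.

    `certificates` is a list of certs, each given as a list of hostnames.
    """
    counts = Counter()
    for names in certificates:
        # Build a set first so one cert counts once per registered domain,
        # no matter how many of its names fall under that domain.
        for domain in {naive_registered_domain(name) for name in names}:
            counts[domain] += 1
    return counts


# Both names on one cert: one hit against the limit.
one_cert = [["myapp.environment.domain.net", "myapp.domain.net"]]
# Each name on its own cert: two hits against the limit.
two_certs = [["myapp.environment.domain.net"], ["myapp.domain.net"]]

print(certs_per_registered_domain(one_cert)["domain.net"])
print(certs_per_registered_domain(two_certs)["domain.net"])
```

So an ingress whose TLS section lists both hosts under a single secret should produce one cert and one hit, assuming cert-manager requests one certificate for that secret.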

Your understanding is mostly correct.

"This limit counts your root domain." That is really the registered domain, which is either something you purchased from a registrar or something that you obtained from a domain on the Public Suffix List.
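
As a rough illustration of the difference, the registered domain is the matching public suffix plus one more label to its left. The sketch below assumes a tiny hand-picked sample of suffixes (`SAMPLE_SUFFIXES` is hypothetical, not the real list) and ignores the wildcard and exception rules that the real Public Suffix List contains.

```python
# ASSUMPTION: a tiny hand-picked sample; the real Public Suffix List is
# far larger and also has wildcard and exception rules not handled here.
SAMPLE_SUFFIXES = {"net", "com", "uk", "co.uk", "github.io"}


def registered_domain(hostname, suffixes=SAMPLE_SUFFIXES):
    """Return the longest matching public suffix plus one label, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest so that
    # "co.uk" is preferred over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no suffix in the sample matched


print(registered_domain("myapp.environment.domain.net"))
print(registered_domain("shop.example.co.uk"))
print(registered_domain("alice.github.io"))
```

With this sample, names under a purchased domain all collapse to that one domain, while each name directly under a listed suffix such as github.io is treated as its own registered domain.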

Wildcards can be used to get around rate limits if your organization does not need to isolate the subdomains from each other, but rate limits do apply to wildcards.

If there is a scalable system for on-demand processing, which sounds like what you described, the correct solution will almost always be to share a single wildcard certificate across all the domains. There are almost no situations where an ACME client should be running on an ingress. That is a common anti-pattern that often causes a dev/ops disaster when the rate limits are reached and the deployment system needs to be rewritten.
