Rate Limits in SaaS application

I'm working on a currently private project that will allow users to point their domain to my webserver. After that a new vhost config is generated and users will be able to create a certificate for their website that is then provided by us.

I know it's far too early to worry about hitting the rate limits when generating certs, but as a first measure I'm adding a 7-day cooldown on changing the domain or generating a new cert.

If that turns out not to be enough at some point, how would I tackle this to avoid hitting rate limits on my end?

There are multiple rate limits and multiple ways to hit them, depending on how you set up your SaaS application, so it's hard to give specific advice. Please see the Integration Guide - Let's Encrypt.


We’re also discussing this kind of thing here currently


When getting users to point their domain to your system, do your own checks to ensure it's resolving correctly to your system before attempting your certificate order, that way you will avoid failed authorization rate limits.
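A minimal sketch of such a pre-check, assuming you know the set of IP addresses your own servers answer on. The names `resolve_ips` and `points_at_us` are made up for illustration, and the lookup goes through your local resolver, which may not see exactly what Let's Encrypt sees, so treat a pass as a hint rather than a guarantee:

```python
import socket

def resolve_ips(hostname):
    """Return the set of IP addresses the hostname currently resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # NXDOMAIN or resolver failure: treat as "does not resolve".
        return set()
    return {info[4][0] for info in infos}

def points_at_us(hostname, our_ips, resolver=resolve_ips):
    """True only if the hostname resolves and every address is one of ours."""
    resolved = resolver(hostname)
    return bool(resolved) and resolved <= set(our_ips)
```

You would only start the certificate order once `points_at_us(domain, our_ips)` returns True. The `resolver` argument is there so the check can be tested without real DNS.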

If cert renewals start to fail for a particular domain, it's most likely that they have moved to something else or let their domain expire, so retry a few times and then fail permanently to avoid further failed authorizations.
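Roughly, the "retry a few times then fail permanently" bookkeeping could look like this. The threshold of 3 and the names `RenewalState`, `record_renewal` and `should_attempt` are assumptions for the sketch, not anything a particular ACME client provides:

```python
from dataclasses import dataclass

MAX_FAILURES = 3  # assumed threshold for "a few times"

@dataclass
class RenewalState:
    failures: int = 0       # consecutive failed renewal attempts
    disabled: bool = False  # True once we stop trying for this domain

def record_renewal(state, succeeded):
    """Update the state after one renewal attempt and return it."""
    if succeeded:
        state.failures = 0
    else:
        state.failures += 1
        if state.failures >= MAX_FAILURES:
            state.disabled = True
    return state

def should_attempt(state):
    """Skip domains that have been permanently failed."""
    return not state.disabled
```

In practice you would persist one such state per domain and require the subscriber (or an admin) to explicitly re-enable a disabled domain.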

As long as you're not batching cert requests for hundreds of domains per hour you should be able to scale without rate limits to at least 100,000 domains on one account.


I designed and open sourced our project for this. The public version is out of date, but I've gone through most of the steps and can give feedback.

Considering your deployment strategy uses vhosts - which do not scale well above 100s of domains - it does not sound like you are looking at scaling that large and should not have anything to worry about for a while.

There are 2 main ways of onboarding someone to a SaaS system:

1- Subscribers CNAME onto your domain, and that's it. HTTP-01 validation is used.
2- Subscribers additionally enter a TXT record for your domain to enable DNS-01 validation.

There are also 2 types of certificate serving:

1- only serve a configured certificate, even if expired
2- autoload certificates on demand, which means renew/procure if there is a missing/inactive certificate

I prefer requiring the subscriber to enter a TXT record if possible. You can have that TXT record point to an account on an acme-dns instance you manage, and then just utilize the acme-dns server to handle the ACME orders via DNS-01. This will give you a bit more control on your system, as you can more easily have a single system enable routing rules based on the success of a DNS-01 certificate procurement. That can be done with HTTP-01, but that is quite a bit more complex as I will note in a moment...

Short Answer:

This is extremely unlikely to create any discernible issues until you are dealing with thousands of domains. Even then, the issues are likely to be negligible until you are dealing with 10k+ domains. This is due to a mix of the rate limits being generous (for this situation), the built-in retries of most clients, and the current renewal period being about a month long. You don't need to over-engineer an enterprise-ready solution for 7-day certificate lifetimes.

Long Answer:

One of the most common rate limits you will hit is the "50 Certificates per Registered Domain" limit, counted against your subscriber's domain. They'll CNAME their subdomain onto yours, so the rate limit applies to their registered domain - not yours.

The second common rate limit is a nexus of "Duplicate Certificate", "New Orders", and "Pending Authorizations" when doing an auto-load, caused by a dogpile effect that is not protected against. You need to ensure you are using a coordinated lock or semaphore across all the integrated systems so that you only request ONE certificate when needed across all servers/processes. This is where the HTTP-01 system can cause issues with traffic systems. If you do not have an active certificate for a domain, improperly designed systems will keep trying to generate a new certificate on every request - which can run up all of the rate limits above and wedge your account. Ideally you will use a dogpile lock or similar, wherein the first request creates a lock and starts an ACME order, and subsequent requests pause on the lock (polling every 1s for a cert) OR return an invalid certificate. If you do have issues there, make sure your clients are set to deactivate pending authorizations when an order fails. Leaving them pending is a very common way to wedge accounts.
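To make the dogpile lock concrete, here is a single-process sketch using threads. `CertLoader` and the `order_cert` callable are hypothetical stand-ins; across multiple servers you would need a shared lock (in Redis or a database, for example) instead of `threading.Lock`, and waiters here block on the lock rather than polling:

```python
import threading

class CertLoader:
    """Single-process sketch: at most one ACME order per domain at a time."""

    def __init__(self, order_cert):
        self._order_cert = order_cert   # hypothetical callable: domain -> cert
        self._certs = {}                # domain -> cert already obtained
        self._locks = {}                # domain -> per-domain lock
        self._guard = threading.Lock()  # protects the _locks dict

    def _lock_for(self, domain):
        with self._guard:
            return self._locks.setdefault(domain, threading.Lock())

    def get_cert(self, domain):
        cert = self._certs.get(domain)
        if cert is not None:
            return cert
        with self._lock_for(domain):
            # Re-check: another thread may have finished the order while we waited.
            cert = self._certs.get(domain)
            if cert is None:
                cert = self._order_cert(domain)
                self._certs[domain] = cert
            return cert
```

Note that if `order_cert` raises, each waiting request would retry the order in turn, so a real system would also want to remember recent failures for a while rather than re-ordering immediately.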

Even if you are using vhosts for your application, I do not advise using them to handle SSL for a scalable multi-tenant/domain application. You should look into terminating SSL on a gateway server, and either having that server automatically handle certificates OR having your process load them via API.

On the simple and elegant side, the Caddy server can handle auto-ssl and coordination all by itself. You can just sit something like that in front of your application and pretty-much forget it exists.

On the needlessly complex side, my Peter SSLers approach is to use OpenResty (nginx fork) to create a hook for loading or ordering SSL certificates from a backend Python server - which manages existing certs and orders new ones from LE as needed - then loads the certs into multiple nginx and redis caches for performance.

