Recommendations for how to operate as a big hoster

Hi, I am not sure if this is the right section for my questions, if not please feel free to move the topic to the proper section.

I work for a big hoster (open-xchange.com). We are currently thinking about letting our customers use Let's Encrypt directly from our web panel, and about using it for some potentially large upcoming migrations (for example, we will need to enroll about 500000 domains in 3 months).

I read the integration guide, the rate limit pages, and some of the posts on this forum, and I have some questions about how to maximize the throughput of requests so that a migration like the one above could succeed in time.

1.- The integration guide recommends using a single account (and, of course, not issuing requests for new orders before they can succeed). I checked the form to increase the rate limit for the number of open orders, and the maximum value seems to be 10000+ (I guess this means 10k, plus perhaps more depending on the resources available on the Let's Encrypt side):
a) How hard would it be to get one of those rate limit increases for our account? I am thinking about increasing the maximum of 300 new orders per 3 hours, for example, but perhaps we will also need to raise the other limits.
b) Do we need to go over the threshold first before requesting a rate increase for any of the limits?
c) Would it be recommended to have more than one account (and multiple rate limit increase requests) for this use case, or is it better to stick to one account? I am aware that there is a rate limit per IP address for new accounts, but perhaps we could prepare a bunch of accounts and get them ready for that kind of migration.

2.- I read that you guys are implementing some sort of service on your side that would actually issue requests to our systems when it is time to renew a certificate. Is there a planned date for this service to go live?

3.- In terms of software is there any recommendation for any particular existing software that could handle that number of certificates? We are trying to host the maximum number of services in Kubernetes, but I wonder if for example cert-manager would be able to deal with so many requests, or if we would need to write our own client.

Thanks in advance for your recommendations.

Hello @victorox,

welcome to the community forum. We're always happy to see integrators asking before they run into trouble, so it's nice to see these questions raised beforehand. I don't personally have experience with deployments of your size, nor am I affiliated with Let's Encrypt, so I won't be able to answer all your questions. We do have staff and other experienced people around here, though, so maybe someone else can answer more of them.

Here's what I believe I know:

1a) As far as I'm aware, raising the new orders limit is typical for larger deployments. You should probably start with that. While rate limit overrides do take a while to process, as far as I know most people come to an agreement with Let's Encrypt. Therefore, I would suggest you just try it.

1b) Not necessarily: you don't have to hit a limit before asking for an override. It would probably be helpful, though, if you could give Let's Encrypt numbers about your typical or expected usage (in terms of orders per X). This is also one of the questions in the rate limit override form (in time units of per 3 hours).

1c) As far as I know, Let's Encrypt manages rate limit overrides per account (or per domain in some specific cases). Therefore, more accounts means more work for everyone involved. The rate limit form allows for at most 3 accounts per request. So if you want overrides, fewer accounts is generally what Let's Encrypt prefers.

2) I'm not sure which system you are talking about. If you mean ACME Renewal Information (ARI):

ARI is an extension to the ACME protocol, where ACME clients can request a timeframe on when to renew certificates. Currently, it is recommended that all ACME clients renew certificates after 2/3 of the certificate's lifetime has elapsed. With ARI, the ACME server will be able to give the client more granular information ("renew this certificate between time X and time Y"). This is particularly useful in mass-revocation events, where all certificates have to be renewed early. However, your client will still have to call the ARI endpoint regularly and also perform the renewal as usual.
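To illustrate the client side of this, here is a minimal sketch of how a client might choose its renewal time. It assumes the ARI response carries a `suggestedWindow` object with `start` and `end` timestamps (that is my reading of the ARI draft, so please check it against the current spec), and it falls back to the 2/3 rule when no window is available. The function name `pick_renewal_time` is made up for this example, and no network call is made.

```python
import random
from datetime import datetime, timedelta, timezone


def _parse_ts(value):
    # Accept RFC 3339 timestamps with a trailing "Z" on any Python 3.7+.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pick_renewal_time(not_before, not_after, suggested_window=None, rng=random):
    """Return the datetime at which the client should attempt renewal.

    suggested_window is assumed to look like
    {"start": "<timestamp>", "end": "<timestamp>"} (hypothetical input,
    already extracted from the server's ARI response).
    """
    if suggested_window is not None:
        start = _parse_ts(suggested_window["start"])
        end = _parse_ts(suggested_window["end"])
        span = (end - start).total_seconds()
        # Pick a random point in the window so clients do not all renew at once.
        return start + timedelta(seconds=rng.uniform(0, span))
    # No ARI data: renew after 2/3 of the certificate's lifetime has elapsed.
    lifetime = not_after - not_before
    return not_before + lifetime * 2 / 3


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(days=90)
    print(pick_renewal_time(issued, expires))
```

The client would still need to poll the ARI endpoint regularly and recompute this time, since the server can move the window (for example, during a mass-revocation event).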

There is currently no announced date for ARI to move into production (a prototype is available in staging).


Once you've figured things out, we would also love it if you could report your experiences with deploying LE certificates on your systems. This would be valuable information that we could pass on to others undertaking similar adventures.

FYI - the rate limiting form can be found in the Overrides section of Rate Limits - Let's Encrypt

How does your hosting actually work? You mentioned Kubernetes but are all of your hosted services running through that? Do you intend to use http validation only or DNS?

cert-manager is certainly the first thing to look at but I don't believe we have many experts on that software in this forum.

One strategy to minimize renewals and maximize container flexibility is to use a secrets store (HashiCorp Vault or any number of other options) and configure services to fetch the required secret (certificate files) on demand and refresh it periodically. This gives you something that's easy to back up and recover, avoids things like certs being acquired on every container startup, and removes the direct dependency between cert renewal management and cert deployment.
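As a rough sketch of the "fetch on demand and periodically refresh" idea: the class below caches certificates in memory and re-fetches them once they are older than a TTL. The `fetch` callable is a placeholder for whatever your secrets store client provides (it is not a real Vault API), and `CertCache` is a name invented for this example.

```python
import time


class CertCache:
    """Keeps certificates in memory and re-fetches them after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable: hostname -> PEM text
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # hostname -> (fetched_at, pem)

    def get(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: no call to the secrets store
        pem = self._fetch(hostname)
        self._entries[hostname] = (now, pem)
        return pem


if __name__ == "__main__":
    # Stand-in for a real secrets store lookup.
    cache = CertCache(lambda host: "PEM for " + host, ttl_seconds=600)
    print(cache.get("example.com"))
```

With this shape, a container that restarts only reads from the store; it never talks to the CA, which is what decouples renewal from deployment.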

I develop Certify The Web which is a certificate management tool primarily used on Windows (with a Linux/container version in development). Very recently we've been looking at scaling with the aim of managing renewal for hundreds of thousands of certs per instance, so topics for that include:

  • renewal system scaling (CA dependencies, CA fallback, CA round-robin/balancing), coping with repeated or permanent failure of large numbers of domains (e.g. failure reporting and deciding if/when to give up trying). We have implemented ARI support as part of this.
  • cert/key storage and (controlled) retrieval
  • deployment (which can be fully disconnected from renewal processes)
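To make the CA fallback and round-robin point concrete, here is a minimal sketch (not how Certify The Web implements it). Each CA is represented by a hypothetical `issue(domain)` callable that returns a certificate or raises; the function tries the CAs in order, starting at a caller-chosen index so that orders can be spread across CAs.

```python
def issue_with_fallback(domain, cas, start_index=0):
    """Try each CA in turn, starting at start_index, until one succeeds.

    cas is a list of (name, issue) pairs where issue is a hypothetical
    callable taking a domain and returning a certificate, or raising.
    Returns (ca_name, certificate).
    """
    errors = {}
    for offset in range(len(cas)):
        name, issue = cas[(start_index + offset) % len(cas)]
        try:
            return name, issue(domain)
        except Exception as exc:   # record the failure and try the next CA
            errors[name] = exc
    raise RuntimeError("all CAs failed for %s: %r" % (domain, errors))


if __name__ == "__main__":
    def primary(domain):
        raise RuntimeError("rate limited")

    def secondary(domain):
        return "cert-for-" + domain

    cas = [("primary", primary), ("secondary", secondary)]
    # Rotating start_index per order gives simple round-robin balancing.
    for i, d in enumerate(["a.example", "b.example"]):
        print(issue_with_fallback(d, cas, start_index=i))
```

A real system would also persist the failure history per domain, so it can decide when to stop retrying a domain that fails permanently.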

If using Let's Encrypt it definitely sounds like you would benefit from preemptively raising your New Orders rate limit. Other ACME CAs exist and may or may not have rate limits (e.g. ZeroSSL, Google Trust Services, BuyPass Go) so balancing your orders across different CAs would seem to me to be a generally good idea for resilience and redundancy. Note also that if users control their own DNS then some may have CAA records restricting issuance or DNS issues preventing issuance.
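On the CAA point, a pre-flight check before ordering can save wasted orders. The sketch below is deliberately simplified: it takes CAA records you have already looked up (as `(tag, value)` pairs) and only evaluates the `issue` property. It does not do the DNS lookup, does not climb the DNS tree to parent domains, and ignores `issuewild` and critical flags, so treat it as an illustration rather than a complete CAA implementation. The function name is made up for this example.

```python
def ca_allowed(caa_records, ca_domain):
    """Return True if ca_domain may issue according to the given CAA records.

    caa_records: list of (tag, value) tuples from the relevant CAA record set,
    e.g. [("issue", "letsencrypt.org")]. An empty list means no CAA records.
    """
    issuers = []
    for tag, value in caa_records:
        if tag.lower() != "issue":
            continue
        # Drop any parameters after ";" and normalise the issuer name.
        issuers.append(value.split(";", 1)[0].strip().lower())
    if not issuers:
        return True    # no "issue" property: issuance is not restricted
    return ca_domain.lower() in issuers


if __name__ == "__main__":
    print(ca_allowed([("issue", "letsencrypt.org")], "letsencrypt.org"))
    print(ca_allowed([("issue", ";")], "letsencrypt.org"))
```

Combined with multiple CAs, a check like this can also steer an order toward a CA that the customer's DNS actually permits.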

At your scale (500k domains, so probably 100k to 500k certs), I would guess most integrators are using custom software. They're usually building on a shared library rather than writing an actual custom ACME client, but they want some degree of control over orchestration. You probably have a large server farm at that point, with dozens or more servers. Such environments tend to have a lot of customization and will want to integrate with some storage backend (whether that be something like Vault, your own database, or otherwise).

I don't have a lot of experience with cert-manager at that scale -- certainly people are using it for hundreds of certs, but I'm not sure about hundreds of thousands. You'll probably have to talk to the authors or other users of that project; I'm not aware of any that are regularly active on this forum.

You should start a rate limit adjustment discussion as soon as possible, as requests of this size may require a bit of back-and-forth, but the size you've suggested is well within the norm of Let's Encrypt users. Please do try to understand how the rate limits work, plan your usage, and include those details when completing the form.
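As a worked example of the planning arithmetic, using the numbers from this thread (500000 domains, 3 months, and the default of 300 new orders per 3 hours): the sketch below estimates how many orders per 3-hour window the migration needs. Treating "3 months" as 90 days and the number of names per certificate are my assumptions; adjust them to your own plan.

```python
import math

DOMAINS = 500_000       # from the original question
MIGRATION_DAYS = 90     # assumption: "3 months" taken as 90 days
WINDOW_HOURS = 3        # the New Orders limit is counted per 3 hours
DEFAULT_LIMIT = 300     # default new orders per account per window


def windows_in(days, window_hours=WINDOW_HOURS):
    """Number of rate limit windows in the migration period."""
    return days * 24 // window_hours


def orders_per_window(domains, names_per_cert, days):
    """Average orders needed per window to finish on time."""
    certs = math.ceil(domains / names_per_cert)
    return certs / windows_in(days)


def max_orders(limit, days):
    """Orders possible in the period if every window is used in full."""
    return limit * windows_in(days)


if __name__ == "__main__":
    print("windows:", windows_in(MIGRATION_DAYS))
    print("ceiling at default limit:", max_orders(DEFAULT_LIMIT, MIGRATION_DAYS))
    for names in (1, 2, 5):
        need = orders_per_window(DOMAINS, names, MIGRATION_DAYS)
        print("names per cert:", names, "orders per window:", round(need, 1))
```

With one domain per certificate this comes to roughly 694 orders per window, against a ceiling of 216000 orders in 90 days at the default limit, so a figure in that range (plus headroom for retries and renewals) is the kind of number to put in the form.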

Using a single account with rate limit increases is generally the right approach. That is the way we're best equipped to help you.

Thank you very much for all the answers. I will try to share the details of how we approach this once we need to scale our usage up to those numbers (we are just in the planning phase for now, so this can easily take a few months).

Since we will have some time to prepare for it, I am getting all the pieces of advice that you guys gave me and saving them internally as requirements in case we end up developing our own tool, which will very likely be the case.

Thanks again.

I am pretty sure there is/was a semi-official best-practices guide for large integrators that was developed by Let's Encrypt staff and some of the larger hosts. I can't seem to find it; maybe someone else can.

With the number of domains in your network, I know one element is wanting a phased onboarding over a period of time.

You have to calculate the overhead in memory and resources that switching to HTTPS will incur, and decide if it makes sense to shape/shard traffic, decrypt on a gateway, or do everything on each server.

If you can shield access to the private key from your clients, you can reuse one private key across domains/certificates. This is very useful in large-scale deployments, because you eliminate a chunk of data the webserver must load. If your clients have access to the private key, you cannot do that.

You also have to decide what the best way to obtain/store certs is. Perhaps you want to push all authorizations to a centralized server farm that will obtain the cert, serve the response, store the certificate and deploy within your network.

I open-sourced an earlier version of our tool: GitHub - aptise/peter_sslers ("or how i stopped worrying and learned to love the ssl certificate"). It offers a SQL-based centralized client and certificate manager, and an nginx plugin to dynamically load certificates from an internal cache or upstream service. That might give you some inspiration.

The only doc I know of is the Integration Guide - Let's Encrypt which was referenced in the first post

Here is the discussion on the "Best Practices doc"

And here is that site on github (the domain is no longer live) GitHub - https-dev/docs: Documents about HTTPS development
