This will be quick because I am way behind schedule today...
Not sure about the question about "registered rights". I meant "who registered the domains or has administration rights", and it seems these are mostly customer domains and you have a SaaS/PaaS system.
Yeah, Certbot is fine for that. I am a big fan of using separate `--config-dir`s to have multiple "installations" using the same Certbot install. The issue you're going to run into here (and likely have already) is keeping track in your code of which config is responsible for which renewals, and on which server.
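A minimal sketch of the separate-directories idea, assuming one directory tree per "installation". The base path `/srv/certbot-installs` and the helper name `certbot_renew_cmd` are made up for illustration; `--config-dir`, `--work-dir`, `--logs-dir` and `--non-interactive` are regular Certbot options.

```python
from pathlib import PurePosixPath

# Hypothetical base directory; each "installation" gets its own subtree.
BASE = PurePosixPath("/srv/certbot-installs")


def certbot_renew_cmd(install_name: str, base: PurePosixPath = BASE) -> list[str]:
    """Build the argv for `certbot renew` scoped to one installation."""
    root = base / install_name
    return [
        "certbot", "renew",
        "--config-dir", str(root / "config"),
        "--work-dir", str(root / "work"),
        "--logs-dir", str(root / "logs"),
        "--non-interactive",
    ]
```

Building the command in one place means your code has a single spot that knows which directory tree belongs to which installation, which helps with the bookkeeping problem described above.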
Anyways, there are a few things to talk about. IMHO, Certbot is not a good candidate for your system.
Certbot does not scale well into the hundreds of domains. While most scaling issues are concerned with how it (badly) tries to manage large Apache/Nginx integrations, the big issue of concern to you here is that the "backing datastore" for renewals and certificate management is just a flat file structure. IIRC, it constantly re-parses all those config files and certs every time you invoke it (enrollment, daily renewal, etc).
I have a few concerns on how you're leveraging Certbot - what you described seems very delicate and prone to break. You should really consider using an ACME library OR forking a simpler client. You can also consider using a client that is built into the Load Balancer or web-servers and will "autocert" on demand - several now offer that, and even use local/cloud storage to consolidate certificates.
In any event, my suggestions would be this:
1- On the LB, I would funnel all traffic for `/.well-known/acme-challenge` to a single server running ACME clients. Having multiple ACME clients behind one system tends to create issues. Sharding traffic based on the domain can minimize this, but you really don't need to consider a second server with an ACME client until you have many thousands of domains.
2- If you keep using Certbot, look into the `psutil` package to replace `subprocess`. [psutil.Popen](https://psutil.readthedocs.io/en/latest/#psutil.Popen) is a drop-in replacement for `subprocess.Popen`, but you get all the process management and system management tools from the psutil package made available to you. That makes it a lot easier to handle management.
3- If you're dealing with DNS delegation, I STRONGLY suggest running your own instance of acme-dns and having your clients CNAME their acme-challenge onto either a pre-assigned/generated subdomain on that system OR a deterministic subdomain (e.g. `client-example.com` -> `com.client-example.acme-authz.your-domain.com`). While acme-dns uses UUIDs for its subdomains, the datastore can be manipulated to use subdomains and I wrote a simple tool for that.
4- You probably don't need to worry much about security concerns, because your ACME client will be running behind a LB and in an isolated container.
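The deterministic subdomain from point 3 could be derived along these lines. This is only a sketch: the zone `acme-authz.your-domain.com` is the placeholder from the example above, and the function names are invented here. It does not touch acme-dns itself; it just computes the names a customer would put in their CNAME.

```python
# Placeholder zone taken from the example in the text; replace with your own.
ACME_AUTHZ_ZONE = "acme-authz.your-domain.com"


def delegated_name(customer_domain: str, zone: str = ACME_AUTHZ_ZONE) -> str:
    """Reverse the labels of the customer domain and append the delegation zone."""
    labels = customer_domain.strip(".").lower().split(".")
    return ".".join(reversed(labels)) + "." + zone


def cname_record(customer_domain: str, zone: str = ACME_AUTHZ_ZONE) -> tuple[str, str]:
    """Return (record name, CNAME target) for the customer's DNS-01 delegation."""
    name = "_acme-challenge." + customer_domain.strip(".").lower()
    return name, delegated_name(customer_domain, zone)
```

Because the mapping is a pure function of the domain, you can tell a customer which CNAME to create before anything has been provisioned on your side.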
IMHO, the best option for you would be to consider forking a smaller ACME client and building that into your system. The acme-tiny client is an incredibly small Python client (200 lines of code) and is a great starting point. There are also many Python libraries and utilities that handle the core certificate and ACME operations. Building something custom into your existing management system might take 2-4 days, but then you'll largely be done (except when you need to update for ARI support which Certbot doesn't have yet, etc).
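To give a feel for how small the core ACME pieces are, here is a sketch of the HTTP-01 key authorization (challenge token plus the RFC 7638 thumbprint of the account key) using only the standard library. The function names are made up for this example, and it assumes you already have the account key's public part as a JWK dict.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as ACME uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members only,
    serialized with sorted keys and no whitespace."""
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps(
        {k: jwk[k] for k in required}, sort_keys=True, separators=(",", ":")
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, jwk: dict) -> str:
    """Body to serve at /.well-known/acme-challenge/<token> for HTTP-01."""
    return f"{token}.{jwk_thumbprint(jwk)}"
```

The signing and request plumbing is more code than this, but none of it is much harder.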
Having previously gone through what you're currently going through, I can say the ACME client logic is the easiest part of your effort. The largest effort is in the backend business logic and mapping that to the ACME client. Designing your own ACME client for an easier integration / api will significantly reduce the complexity of your work.
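As a rough illustration of the backend side, this sketch tracks certificates in SQLite instead of a flat file tree, so "what is due for renewal" becomes one indexed query. The table layout, the function names and the 30-day renewal window are all assumptions for the example, not anything taken from Certbot or another client.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS certificates (
    domain    TEXT PRIMARY KEY,
    customer  TEXT NOT NULL,
    not_after INTEGER NOT NULL  -- expiry as Unix epoch seconds
);
CREATE INDEX IF NOT EXISTS idx_not_after ON certificates (not_after);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_certificate(conn, domain: str, customer: str, not_after: datetime) -> None:
    """Insert or overwrite the row for a domain (not_after must be timezone-aware)."""
    conn.execute(
        "INSERT OR REPLACE INTO certificates (domain, customer, not_after) VALUES (?, ?, ?)",
        (domain, customer, int(not_after.timestamp())),
    )
    conn.commit()


def due_for_renewal(conn, now: datetime, window: timedelta = timedelta(days=30)) -> list[str]:
    """Domains whose certificate expires within `window` of `now`, soonest first."""
    cutoff = int((now + window).timestamp())
    rows = conn.execute(
        "SELECT domain FROM certificates WHERE not_after <= ? ORDER BY not_after",
        (cutoff,),
    )
    return [row[0] for row in rows]
```

A usage example under the same assumptions:

```python
conn = open_store()
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
record_certificate(conn, "a.example", "customer-1", now + timedelta(days=10))
record_certificate(conn, "b.example", "customer-2", now + timedelta(days=80))
print(due_for_renewal(conn, now))  # only a.example falls inside the window
```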
A few years ago, I open-sourced our ACME client, PeterSSLers, which is designed for situations similar to yours. I have to finish backporting a lot of features from the production version to the public one (especially renewal logic), but it should give you some ideas on architectural concerns for a scalable system like yours. That system was designed to dynamically load SSL Certificates into nginx (via OpenResty, via a companion plugin).