The way to call the ACME procedure from Python

Hello,
We have a fairly robust system written in Python that uses certbot to issue and renew SSL certificates. When we planned it, we considered the possible clients and agreed that the best approach would be to use certbot and call it from Python using "process = Popen(call, stdout=PIPE, stderr=STDOUT)", where call is the certbot command.
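
A minimal sketch of such a call, assuming `call` is built as an argument list rather than a shell string. The helper name `run_command`, the timeout value, and the example domain are illustrative and not taken from our production code:

```python
import sys
from subprocess import PIPE, STDOUT, Popen, TimeoutExpired


def run_command(call, timeout=600):
    """Run `call` (a list of arguments) and return (exit_code, combined_output).

    stderr is merged into stdout, as in the Popen call quoted above.
    """
    process = Popen(call, stdout=PIPE, stderr=STDOUT, text=True)
    try:
        output, _ = process.communicate(timeout=timeout)
    except TimeoutExpired:
        # Do not leave a hung client behind: kill it, then collect its output.
        process.kill()
        output, _ = process.communicate()
    return process.returncode, output


# Hypothetical certbot invocation; the domain is a placeholder.
certbot_call = [
    "certbot", "certonly", "--non-interactive",
    "--webroot", "-w", "/var/acme/verifications",
    "-d", "example.com",
]

if __name__ == "__main__":
    # Demonstrated with a harmless command instead of a real certbot run.
    code, out = run_command([sys.executable, "-c", "print('ok')"])
    print(code, out.strip())
```

Passing a list (and no `shell=True`) keeps domain names and paths from being interpreted by a shell.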

It's been working just fine, but yesterday one of the forum members here mentioned other ways to start the ACME process more programmatically, using Python libraries.

Our goal is to have as secure and reliable a solution as possible. That is why we use certbot (as an application, not as a Python library): as far as I understand, it is the most used and best supported way to use ACMEv2.
Can somebody comment on this? Would you not be afraid to, for example, integrate certbot as a Python module, or even use an ACME library in Python?
I understand the advantages of having this integrated into the Python process, but we can't afford any surprises.

Thank you for any feedback,
Zdenek

When using certbot programmatically via Popen (or anything else that calls the CLI in a sub-process), the robustness of your system is tied directly to the specific formatting of the CLI output.

But there are no guarantees that output doesn't change. The output is meant for humans to read, not for programs to parse with string matching and regex. And it's not necessarily going to be in a changelog when something like that does change. So realistically, your system is going to break unexpectedly at some point.

Programs should use libraries or other programs whose output is intended to be consumed by other programs. The interface is a sort of contract. When method names or parameters change, they'll generally be associated with the appropriate semantic version change and a changelog you can notice and incorporate into your calling application.
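
If the caller does keep shelling out to the CLI, one way to stay off the human-readable text is to judge success only by the exit code and by the certificate file on disk. The sketch below assumes Certbot's `live/<cert-name>/fullchain.pem` layout under the config directory; the function name is made up for illustration:

```python
import tempfile
from pathlib import Path


def issuance_succeeded(exit_code, config_dir, cert_name):
    """Decide success without reading any CLI text.

    Assumes the layout <config_dir>/live/<cert_name>/fullchain.pem.
    """
    if exit_code != 0:
        return False
    chain = Path(config_dir) / "live" / cert_name / "fullchain.pem"
    # A missing or empty file means there is nothing to deploy.
    return chain.is_file() and chain.stat().st_size > 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "live" / "example.com"
        live.mkdir(parents=True)
        (live / "fullchain.pem").write_text("-----BEGIN CERTIFICATE-----\n")
        print(issuance_succeeded(0, tmp, "example.com"))
        print(issuance_succeeded(1, tmp, "example.com"))
```

This still depends on the on-disk layout staying stable, so it narrows the coupling rather than removing it.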

I'm not really sure what you're asking specifically.

Because initially you say:

So you already seem to have a working system, integrating Certbot with your Python system.

But then:

How is this different than the Certbot which is already incorporated with your existing Python system?

While I still believe using libraries is the better option to integrate things, the Certbot API and acme library API lack documentation. So perhaps this specific case would be a case of "never change a winning team" (where "winning" is relative :roll_eyes:). Thus I'm curious what the difference is between the existing integration and what you're proposing for the new situation. (I.e.: more details.)

Yes, this is a working system, and the only way we interact with certbot is through Popen (basically calling the certbot command). We don't need to do any deep analysis of the certbot output; if signing fails, we notice it in a later part of the code. We also work with the certbot logs, which give us more details (such as waiting too long for the CA, etc.).
It's been operating for a few months and we have noticed no issues. But one never knows what the best path is, you know :slight_smile: I was wondering whether I would find more people with the same use case, now that tech companies focus more on SSL certificate automation.

And with regard to adding the TXT RR into the DNS zone, what method do you have for that, assuming you'd have the required value at hand?

My DNS colleagues use "EfficientIP" on top of BIND, which provides an API to add the TXT record. So, following your suggestion in the previous thread, I will probably use a small script triggered by --manual-auth-hook. The script will read the environment variable holding the TXT record value and send it via an API call to our orchestrator, which then calls EfficientIP (certbot runs on a small separate server with limited access to the infra).
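
A rough sketch of what such a hook script might look like. `CERTBOT_DOMAIN` and `CERTBOT_VALIDATION` are the variables Certbot sets for manual hooks; the orchestrator URL and the JSON payload shape are pure assumptions, since our orchestrator API is internal:

```python
import json
import os
import urllib.request

# Hypothetical orchestrator endpoint; replace with the real internal API.
ORCHESTRATOR_URL = "https://orchestrator.example.internal/api/txt-records"


def build_txt_request(environ):
    """Turn the variables Certbot sets for manual hooks into a TXT payload."""
    domain = environ["CERTBOT_DOMAIN"]
    validation = environ["CERTBOT_VALIDATION"]
    return {
        "name": f"_acme-challenge.{domain}",
        "type": "TXT",
        "value": validation,
    }


def send_to_orchestrator(payload, url=ORCHESTRATOR_URL, timeout=30):
    """POST the payload as JSON; an HTTP error status raises HTTPError."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Only act when Certbot actually invoked the script as a hook.
if __name__ == "__main__" and "CERTBOT_VALIDATION" in os.environ:
    send_to_orchestrator(build_txt_request(os.environ))
```

Keeping the payload construction separate from the network call makes the hook easy to test without touching DNS.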

Sounds like a plan.

I wish the Certbot internal API was a public package.

Can you elaborate on how your system works and why you are invoking this from Python?

  • How often are you onboarding new domains? How many are you onboarding and managing?
  • Who owns (registered + admin rights) these domains (you? your customers?)
  • Why/how is this being invoked from Python?
  • What is your network topology? Is this just one machine, or are there load balancers in front of multiple nodes? What is running Certbot and why?

The big concerns for me on your system:

  • Certbot manages its own datastore and system (via text files) to remember what certificates it has, and how to renew.
    • There is complexity ensuring parity between your system and Certbot's
    • Certbot's model is designed for one server, not a cluster or scalable number. If you need to automate things and build into your Python setup, your network is probably configured in a way that Certbot is not a good candidate.
  • The ACME protocol is fairly simple and the smallest amount of most clients' codebase. Every ACME client has their own specific core focus of development.
    • Certbot's core strength is in manipulating/configuring web servers (Apache and nginx specifically) with a focus on non-devops Subscribers [anyone who owns/rents a Linux box, vs dev-ops professionals]. The bulk of their engineering efforts over the past few years have focused on that, but they also prioritize small web systems (less than dozens of virtual hosts) over large numbers of virtual hosts (hundreds of virtual hosts will cause issues). For operations like certonly, other clients are often better for a variety of reasons, as they may be simpler and scale better within those systems.

Certbot might very well be the best Client for your needs. It might also be the worst. If you can share a bit about how you are using Certbot, not just what you want it to do, we may be able to suggest alternate options.

Parts of it are.

Certbot's internal API is designed around the acme package, which is maintained within Certbot's GitHub repository, but available separately on PyPI. Their jose implementation, josepy, is maintained in a dedicated josepy repository and package.

certbot's code manages the backing datastore (e.g. /etc/letsencrypt, or whatever you set --config-dir to), and integrates that with an ACME client that wraps the acme package, and their various plugins to manage server configurations.

There are a handful of other ACME clients and libraries available on PyPI as well.

While EFF hasn't offered an API for Certbot's internals that would allow you to automate or manipulate that installation (there is little reason to, as subprocesses/etc would work fine if this is your actual goal), they have released the building blocks that allow for other clients to be built upon the same API they utilize.

Hello and thanks for good questions :slight_smile:

We call certbot using certonly + non-interactive. Certbot gets "--webroot" (for example /var/acme/verifications), where it generates the "token", and then we have Flask, which serves the content when the CA requests it. For DNS verification it gets a bit more complicated, but we will solve it using that "auth-hook".
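
The lookup behind that Flask endpoint can be sketched as below. It assumes the webroot plugin's `<webroot>/.well-known/acme-challenge/<token>` layout; the function name is made up for this post and is not our actual code:

```python
import re
import tempfile
from pathlib import Path

# ACME tokens use only the base64url alphabet (RFC 8555).
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")


def read_challenge(webroot, token):
    """Return the key authorization stored for `token`, or None if absent.

    Rejecting anything outside the token alphabet also blocks path traversal.
    """
    if not TOKEN_RE.fullmatch(token):
        return None
    path = Path(webroot) / ".well-known" / "acme-challenge" / token
    if not path.is_file():
        return None
    return path.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        challenge_dir = Path(tmp) / ".well-known" / "acme-challenge"
        challenge_dir.mkdir(parents=True)
        (challenge_dir / "demo-token").write_text("demo-token.thumbprint")
        print(read_challenge(tmp, "demo-token"))
        print(read_challenge(tmp, "../etc/passwd"))
```

In a Flask app, a route for `/.well-known/acme-challenge/<token>` could return this value as plain text and answer 404 when it is None.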

Overall, our system consists of 2 app servers running Python, then 2 web frontends and 2 DB servers, quite a standard setup. We focus on automating the infra (LBs, DNS, FW, etc.), which is why we use Python. For ACME we built 2 small additional servers running Python with Flask (API endpoints), and the Python code calls certbot. We chose certbot because it was fairly easy for us to use by just calling it and letting it do the job. I admit that using a Python library would be better for integration with our system.
The ACME servers are in active/standby mode (behind the LB).

Our scope is somewhere between a few hundred and 1k certificates that need to be maintained, with a few certs being added daily (my guess). That is the final state; we are now at the beginning, with a few tens of certificates. We have certs signed by Let's Encrypt but also by other CAs supporting ACMEv2, so far with the HTTP challenge, and soon with DNS once we connect the "top layer Python" with the DNS servers.

Not sure about the question about "registered rights", but the domains belong mostly to our customers; we are supposed to keep the certificates up to date and install them on the infrastructure (servers, LBs, WAFs).

certbot datastore: we use a separate "--config-dir" for each CA/eab-kid, and that works well. When we fail over to the second server it works as well, even though the config directory is not synchronized; certbot fills it with the data it needs. Not sure whether that has any drawback, but we have not noticed any issues.
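
Roughly, the per-CA separation can be expressed like this. `--config-dir`, `--work-dir` and `--logs-dir` are Certbot options; the base directory and the naming scheme are hypothetical:

```python
from pathlib import Path

# Hypothetical base directory; each CA/EAB key id gets its own Certbot tree.
BASE_DIR = Path("/var/lib/acme-clients")


def certbot_dirs(ca_name, eab_kid=None, base_dir=BASE_DIR):
    """Return Certbot directory flags isolated per CA (and per EAB key id).

    Assumes ca_name and eab_kid are already safe to use as a directory name.
    """
    slug = ca_name if eab_kid is None else f"{ca_name}-{eab_kid}"
    root = Path(base_dir) / slug
    return [
        "--config-dir", str(root / "config"),
        "--work-dir", str(root / "work"),
        "--logs-dir", str(root / "logs"),
    ]


if __name__ == "__main__":
    print(certbot_dirs("letsencrypt"))
    print(certbot_dirs("digicert", "kid42"))
```

The returned list is meant to be appended to the certbot argument list, so every account keeps its own state and logs.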

We'd need to migrate the "ACME server" to a container, and we have already detected some issues with certbot, so yes, that can be a reason to use something else.

Btw, I think (I might be wrong) last year there was an ACME client that had a vulnerability/backdoor; I can't remember the case. But that was also one of the reasons we chose certbot: it is created and maintained by the ACME founders, so I thought, what could actually be better? :slight_smile:

PS: using certbot might be helpful when solving issues with a CA. For example, we troubleshot one issue with Digicert, and the certbot logs were something their engineering team understood. This can be important.

Thanks for your interest and I hope I replied to all your questions.
Zdenek

This will be quick because I am way behind schedule today...

Not sure about question about "registered rights",

I meant "who registered the domains or has administration rights", and it seems these are mostly customer domains and you have a SaaS/PaaS system.

Yeah, Certbot is fine for that. I am a big fan of using separate --config-dirs to have multiple "installations" using the same Certbot install. The issue you're going to run into here (and likely have already) is keeping track in your code of which config is responsible for which renewals, and on which server.
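
To make that bookkeeping concrete, here is a minimal sketch of a registry that records which config dir and server own each certificate. The schema, the function names and the 30-day threshold are assumptions for illustration, not anything Certbot provides:

```python
import sqlite3
from datetime import date, timedelta


def open_registry(path=":memory:"):
    """Create (if needed) a table recording which config dir owns each cert."""
    db = sqlite3.connect(path)
    # not_after is stored as an ISO date string, so string comparison
    # orders rows chronologically.
    db.execute(
        """CREATE TABLE IF NOT EXISTS certificates (
               cert_name  TEXT NOT NULL,
               config_dir TEXT NOT NULL,
               server     TEXT NOT NULL,
               not_after  TEXT NOT NULL,
               PRIMARY KEY (cert_name, config_dir)
           )"""
    )
    return db


def record_certificate(db, cert_name, config_dir, server, not_after):
    """Insert or update the owner and expiry date of one certificate."""
    db.execute(
        "INSERT OR REPLACE INTO certificates VALUES (?, ?, ?, ?)",
        (cert_name, config_dir, server, not_after.isoformat()),
    )
    db.commit()


def due_for_renewal(db, today, days_before=30):
    """Return (cert_name, config_dir, server) rows expiring within the window."""
    cutoff = (today + timedelta(days=days_before)).isoformat()
    rows = db.execute(
        "SELECT cert_name, config_dir, server FROM certificates "
        "WHERE not_after <= ? ORDER BY not_after",
        (cutoff,),
    )
    return rows.fetchall()
```

With something like this, the renewal job can ask the registry which config dir to pass for each certificate instead of scanning every Certbot tree.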

Anyways, there are a few things to talk about. IMHO, Certbot is not a good candidate for your system.

Certbot does not scale well into the hundreds of domains. While most scaling issues are concerned with how it (badly) tries to manage large Apache/Nginx integrations, the big issue of concern to you here is the "backing datastore" for renewals and certificate management is just a flat file structure – IIRC, it constantly re-parses all those config files and certs every time you invoke it (enrollment, daily renewal, etc).

I have a few concerns on how you're leveraging Certbot - what you described seems very delicate and prone to break. You should really consider using an ACME library OR forking a simpler client. You can also consider using a client that is built into the Load Balancer or web-servers and will "autocert" on demand - several now offer that, and even use local/cloud storage to consolidate certificates.

In any event, my suggestions would be this:

  1. On the LB, I would funnel all traffic for /.well-known/acme-challenge to a single server running ACME clients. Having multiple ACME clients behind one system tends to create issues. Sharding traffic based on the domain can minimize this, but you really don't need to consider a second server with an ACME client until you have many thousands of domains.

  2. If you keep using Certbot, look into the psutil package to replace subprocess. [psutil.Popen](https://psutil.readthedocs.io/en/latest/#psutil.Popen) is a drop-in replacement for subprocess.Popen, but you get all the process management and system management tools from the psutil package made available to you. That makes it a lot easier to handle management.

  3. If you're dealing with DNS delegation, I STRONGLY suggest running your own instance of acme-dns and having your clients CNAME their acme-challenge onto either a pre-assigned/generated subdomain on that system OR a deterministic subdomain (e.g. client-example.com -> com.client-example.acme-authz.your-domain.com). While acme-dns uses UUIDs for their subdomains, the datastore can be manipulated to use subdomains and I wrote a simple tool for that.

  4. You probably don't need to worry much about security concerns, because your ACME client will be running behind a LB and in an isolated container.
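
The deterministic subdomain scheme from the acme-dns suggestion could be computed roughly like this. The function names are made up for this post, and the zone is the placeholder from the example above, not a real acme-dns setting:

```python
def delegation_target(client_domain, zone="acme-authz.your-domain.com"):
    """Map a client domain to a deterministic name inside the delegation zone.

    client-example.com -> com.client-example.acme-authz.your-domain.com
    """
    labels = client_domain.lower().rstrip(".").split(".")
    return ".".join(reversed(labels)) + "." + zone


def cname_record(client_domain, zone="acme-authz.your-domain.com"):
    """Return the (name, target) pair the client would publish as a CNAME."""
    name = f"_acme-challenge.{client_domain}"
    return (name, delegation_target(client_domain, zone))


if __name__ == "__main__":
    print(cname_record("client-example.com"))
```

Reversing the labels keeps all names for one TLD or one customer grouped together, which can make the zone easier to inspect.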

IMHO, I think the best option for you would be to consider forking a smaller ACME client and building that into your system. The acme-tiny client is an incredibly small Python client (200 lines of code) and is a great starting point. There are also many Python libraries and utilities that handle the core certificate and ACME operations. Building something custom into your existing management system might take 2-4 days, but then you'll largely be done (except when you need to update for ARI support, which Certbot doesn't have yet, etc).

Having previously gone through what you're currently going through, I can say the ACME client logic is the easiest part of your effort. The largest effort is in the backend business logic and mapping that to the ACME client. Designing your own ACME client for an easier integration / api will significantly reduce the complexity of your work.

A few years ago, I open-sourced our ACME client, PeterSSLers, which is designed for situations similar to yours. I have to finish backporting a lot of features from the production version to the public one (especially renewal logic), but it should give you some ideas on architectural concerns for a scalable system like yours. That system was designed to dynamically load SSL Certificates into nginx (via OpenResty, via a companion plugin).
