Certferry - easy distribution of wildcard LE certificates

Following up on "Same Wildcard certificate on multiple servers" and similar topics.

Apparently, the LE/certbot infrastructure is missing a solution for distributing and renewing wildcard certificates across multiple servers without each of them touching the Let's Encrypt API (and triggering its rate limits). As paid 1-year certificates will soon be commercially unavailable due to a CA/Browser Forum decision, automated certificate management across server fleets becomes even more critical. I'd like to share a simple tool I created while setting up a 100-server infrastructure.

Here's a brief description of how it works:

  • First of all, you set up the usual certbot workflow on your upstream server and deploy your private key to your downstream servers once.
  • Certferry handles fetching and updating the certificate(s) for you on the downstream servers.

The important thing is that certferry requires zero configuration on the upstream server, as certificate exchange is already part of the TLS protocol. In other respects it is much like certbot, so setting up certferry on a downstream server is as easy as running "certferry yourdomain.com". It then fetches the certificate by making a TLS handshake, checks that it matches the requested name (SNI) and the local private key, and runs deploy hooks to reload server configs if needed. If everything went fine, it also installs a systemd timer to check for due certificates twice a day, just as certbot does.
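To make the fetch step concrete, here is a minimal Python sketch of the idea, not certferry's actual code. The function names `fetch_leaf_pem` and `write_if_changed` are hypothetical. It assumes the upstream server is reachable on port 443 and presents a publicly trusted certificate. It only retrieves the leaf certificate, and it leaves out the private key match, which would need an X.509 library.

```python
import os
import socket
import ssl
import tempfile


def fetch_leaf_pem(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the leaf certificate the server presents when `host` is sent as SNI."""
    # The default context verifies the chain and the hostname, so a certificate
    # that does not cover `host` makes the handshake fail here.
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return ssl.DER_cert_to_PEM_cert(der)


def write_if_changed(path: str, new_pem: str) -> bool:
    """Atomically replace `path` only if its content differs. True means written."""
    try:
        with open(path, "r", encoding="ascii") as fh:
            if fh.read() == new_pem:
                return False  # identical: nothing written, no hook needed
    except FileNotFoundError:
        pass  # first run: fall through and create the file
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="ascii") as fh:
        fh.write(new_pem)
    os.replace(tmp, path)  # atomic rename within the same directory
    return True
```

A caller would run its reload hook only when `write_if_changed(cert_path, fetch_leaf_pem("yourdomain.com"))` returns True.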

If you have any questions - please feel free to ask me here.

Certferry (GitHub)

Cool, you could also fetch from secrets vaults like HashiCorp Vault, Azure Key Vault, etc.; that way it's not tied to certbot specifically.

Certbot's default behavior is to create a new private key for each renewal. You will need to change that to preserve private keys to allow your "fetch" via TLS to work.

That's fine but there is no reason you have to match Certbot's frequency. It uses random times (by default). You could check at a time and frequency that makes sense for the specific server you are controlling. Certbot uses ARI so the specific date and time it renews is controlled by that (along w/its cron or timer frequency).

You might also set up some kind of alerting system in case the cert on the server you are managing gets too close to expiration. Also, factor in the lifetime of the cert to support short-lived certs along with the upcoming gradual shortening of "normal" length certs.
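One way to factor in the lifetime is to treat a certificate as due once a fixed fraction of its validity period remains, rather than a fixed number of days. The sketch below assumes a one-third threshold, which is an illustrative choice and not something certferry or certbot is known to use. The function name `is_due` is hypothetical. The date strings use the format that Python's `getpeercert()` reports for `notBefore` and `notAfter`.

```python
import ssl
from datetime import datetime, timezone


def is_due(not_before: str, not_after: str, now: datetime,
           fraction: float = 1 / 3) -> bool:
    """True once the remaining validity is at most `fraction` of the lifetime."""
    start = ssl.cert_time_to_seconds(not_before)  # seconds since the epoch, UTC
    end = ssl.cert_time_to_seconds(not_after)
    remaining = end - now.timestamp()
    return remaining <= (end - start) * fraction


# A 90-day certificate becomes due about 30 days before expiry...
long_lived = ("Jan 01 00:00:00 2025 GMT", "Apr 01 00:00:00 2025 GMT")
# ...while a 6-day certificate becomes due about 2 days before expiry.
short_lived = ("Jan 01 00:00:00 2025 GMT", "Jan 07 00:00:00 2025 GMT")

print(is_due(*long_lived, now=datetime(2025, 2, 1, tzinfo=timezone.utc)))
print(is_due(*short_lived, now=datetime(2025, 1, 6, tzinfo=timezone.utc)))
```

With a fixed 29-day threshold, the 6-day certificate would count as due for its entire life, which is why a proportional rule fits short-lived certs better.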

Just a point of clarity - Let's Encrypt is the ACME Server and is not involved in configuring your services to use the certs it issues. That is the job of the ACME Client and related tooling.

Your tool would work regardless of which CA Certbot retrieved the certificate from. That is, there isn't a "LE/Certbot" infrastructure specifically.

Certbot has a --deploy-hook which allows distributing certs (and keys). Although in complex setups having servers "pull" their certs from a common store may be better.

UPDATE: I see from your docs you do mention needing to set --reuse-key. But, I also see you say this:

On each run, it checks all certificates in /etc/letsencrypt/live/ and fetches fresh ones for any expiring within 29 days. If the fetched certificate is identical to the existing one, no files are written and no hooks are triggered.

You should review this about short-lived certs: Profiles - Let's Encrypt

That's fine but there is no reason you have to match Certbot's frequency. It uses random times (by default). You could check at a time and frequency that makes sense for the specific server you are controlling. Certbot uses ARI so the specific date and time it renews is controlled by that (along w/its cron or timer frequency).

So do I: we have RandomizedDelaySec=3600 in /etc/systemd/system/certferry-renew.timer, and I check local certificate validity period to avoid sending any requests to upstream server unless there's a due certificate that requires renewal.

Certbot has a --deploy-hook which allows distributing certs (and keys). Although in complex setups having servers "pull" their certs from a common store may be better.

Yes, the point is that certferry can simplify things a lot in some environments like mine (compared to an ansible- or rsync-based flow, or whatever else).

You should review this about short-lived certs: Profiles - Let's Encrypt

That's a good point - actually, I was also thinking about that. Maybe I should simply change to 2 days.

You can't know (fully) when a cert is due for renewal without checking ARI. The CA can set those windows as they see fit. Most important, when a CA has a mass revocation event, this is communicated using ARI. You could be left with an expired cert for some time if you only check the cert expiration date. See: ACME Renewal Information (ARI) Published as RFC 9773 - Let's Encrypt

What is the drawback to checking your "upstream" server each time? Would that server be overly burdened by the couple times / day your "downstream" servers make a TLS connect? Your downstreams are checking at random times so no thundering herd problem.

Note that ARI is not account specific so anyone can make those requests knowing a specific cert. But, seems simpler to just ensure your upstream server can handle this (small) load rather than adding complexity to Certferry. Complexity leads to unreliability in my experience.
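For reference, an ARI request needs no account because the certificate is identified purely by values taken from the certificate itself. RFC 9773 builds the identifier from the Authority Key Identifier and the serial number, each base64url-encoded without padding and joined by a dot. The sketch below shows that construction only. The function names are hypothetical, the sample values are illustrative, and extracting the AKI from a real certificate would need an X.509 parser, which is not shown.

```python
import base64


def _b64url(data: bytes) -> str:
    """Base64url without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ari_cert_id(aki_key_identifier: bytes, serial_number: int) -> str:
    """Build the ARI certificate identifier: b64url(AKI) + '.' + b64url(serial)."""
    # The serial is encoded as a DER INTEGER value: big-endian, with a leading
    # zero byte when the top bit would otherwise be set.
    length = serial_number.bit_length() // 8 + 1
    serial_der = serial_number.to_bytes(length, "big")
    return f"{_b64url(aki_key_identifier)}.{_b64url(serial_der)}"


def ari_url(renewal_info_base: str, cert_id: str) -> str:
    """Append the identifier to the CA's renewalInfo URL from its ACME directory."""
    return renewal_info_base.rstrip("/") + "/" + cert_id


aki = bytes.fromhex("69885b6b87464041e1b37b847ba0ae2cde01c8d4")
print(ari_cert_id(aki, 0x87654321))  # aYhba4dGQEHhs3uEe6CuLN4ByNQ.AIdlQyE
```

A plain GET on the resulting URL returns the suggested renewal window as JSON.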

I'd stay away from top of the hour with something like:
[presuming it's being called at the top of the hour]
RandomizedDelaySec=3400
RandomizedDelaySec=RandomizedDelaySec+100
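The two lines above are shorthand, since systemd's RandomizedDelaySec= takes a single time span and cannot be incremented. As I read it, the intent is a random delay of up to 3400 seconds plus a fixed 100-second offset, so a job triggered on the hour never fires within 100 seconds of either hour boundary. A small Python sketch of that arithmetic, with a hypothetical function name:

```python
import random


def delay_avoiding_hour_boundary(spread: float = 3400.0,
                                 offset: float = 100.0) -> float:
    """Seconds to wait after the top of the hour: offset + uniform(0, spread)."""
    return offset + random.uniform(0.0, spread)


# Every sampled delay falls between 100 s and 3500 s past the hour.
print(delay_avoiding_hour_boundary())
```

In a timer unit, the same effect could presumably be had by starting the schedule 100 seconds past the hour and keeping RandomizedDelaySec=3400.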

Perhaps consider RandomizedDelaySec=10800 to spread the load across 3H or even longer. Doing one of those twice a day spreads the requests across 6H each day.

It is hard to know what time their own "upstream" server is busiest. You can't line up these downstream requests to Certbot's renewals on that upstream anyway so no point in trying to sync them.