How to continuously create/renew certificates without hitting limits?

We are using Traefik and Docker Swarm to run our SaaS applications. Traefik creates routes to the services/containers on the fly through service discovery, polling Swarm every 15 seconds. For those routes we want to create Let's Encrypt certificates. (We can't use Traefik's own integrated process because it's not easily clusterable.)

We can get the list of hosts from Traefik, polling every 15 seconds for updates:

# curl -s http://traefik:8080/api/http/routers | pcregrep -o '(?<=Host\(`).*?(?=`\))' | sort | uniq
admin.example.com
dashboard.example.com
client_01.example.com
client_02.example.com

When I get a list of hosts every 15 seconds, I don't want to run certbot certonly --standalone every time, because it will register on every run (limits!) and create new certificates (limits!), even if valid certificates are already present. certbot renew --standalone just threw errors at me.

How should we structure a process that is triggered every 15 seconds to create/renew Let's Encrypt certificates? Given a list of hosts, certbot should check whether certificates are missing or need renewal and only then register and create/update them; otherwise it should just exit. Is there a single command to do this? Should we do this for every host individually or in batch?

1 Like

It sounds like you might hit some limit if you are continually adding or removing names from the list.
Your best case scenario (IMO) would be to obtain individual certs.
That way the only time certbot would be required to do anything is when a new name is added.
OR
Only concern yourself with adding names to the list [not removing any - since such names may return within their already covered period].

In any case, it will eventually require a name cleanup process [to remove any expired names/certs].

If they will all be serving the same set of names, then they could use the exact same cert(s).
Certs can be copied from system to system.

4 Likes

@rg305 Thanks for your extensive response.

What I don't understand is whether certbot has a single create-or-renew command that I can execute every minute without hitting limits.

certbot create-or-renew --standalone --non-interactive -d www.example.com

Meaning certbot will only register and create a certificate when there is no certificate file OR it's already 60 days old. Otherwise certbot will just quit without doing anything.

Or do I need to create this logic outside of certbot myself?
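
If that logic did have to live outside certbot, a minimal sketch could look like the following. It assumes openssl is installed and that certificates sit in certbot's default layout (/etc/letsencrypt/live/<name>/cert.pem). The names needs_cert and RENEW_BEFORE_DAYS are hypothetical, made up for this illustration. With 90-day certificates, "60 days old" corresponds to 30 days of remaining validity.

```shell
# Returns 0 (true) when a certificate should be created or renewed,
# 1 (false) when an existing certificate is still valid long enough.
# RENEW_BEFORE_DAYS is a hypothetical setting; 30 days left ~ 60 days old.
needs_cert() {
  cert_file="$1"
  days="${RENEW_BEFORE_DAYS:-30}"

  # No certificate file at all: one has to be created.
  [ -f "$cert_file" ] || return 0

  # openssl exits 0 if the certificate will NOT expire within the
  # given number of seconds, and non-zero if it will.
  if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null; then
    return 1
  fi
  return 0
}
```

A caller would then only start certbot when needs_cert /etc/letsencrypt/live/www.example.com/cert.pem succeeds, so the 15-second loop stays cheap and does not touch the CA at all for valid certificates.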

2 Likes

Sounds like you need to create your own logic for this, like:
certbot renew --standalone --non-interactive -d www.example.com [-d example.com]
If not found, then "create" it:
certbot certonly --standalone --non-interactive -d www.example.com [-d example.com]
[the "if not found" should be detectable via the exit code]

Otherwise, you could also just do both and ignore all errors.

3 Likes

Thanks. My challenge is that certbot certonly --standalone will register and create a new cert every time it's called. If I call it every 15 seconds it is unnecessary load and it will hit the limits.

Will check certbot renew and exit codes...

2 Likes

Certbot is designed to be used by persistent servers, not multi-node scaling systems or adaptive deployments. You should be using another client - off the shelf or homegrown - or build a specialized system to control Certbot.

There are several servers and gateways that offer cloud-storage (so you can share certs) and autocert functionality (certificate on demand). There are also some certificate managers that may be of use.

We developed an in-house acme-client+certificate manager and nginx module for:

  • API driven certificates
  • autoloading into Nginx
  • SQL based Certificate Management

I open-sourced it a while back, but the public repo is out of date: aptise/peter_sslers on GitHub ("or how i stopped worrying and learned to love the ssl certificate").

There may still be useful materials in there for you.

7 Likes

Why don't you put a wildcard cert per domain on Traefik?

5 Likes

You probably also want to be sure to read through the Integration Guide

I agree that you might be better served by some client other than Certbot.

6 Likes

I have no API for our DNS service and I don't want to manually edit DNS every couple of weeks.

1 Like

Oh, now it gets complicated

# certbot renew -v --standalone --non-interactive --test-cert -m mail@example.com -d client_01.example.com
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Currently, the renew verb is capable of either renewing all installed certificates that are due to be renewed or renewing a single certificate specified by its name. If you would like to renew specific certificates by their domains, use the certonly command instead. The renew verb may provide other options for selecting certificates to renew in the future.

So I manually need to go through all cert files to see if a domain already has a certificate.

That's not really a problem, you can use CNAME resource records to delegate the challenge to a specific zone that can answer specifically for ACME challenges using software that does have an API, such as acme-dns.

Note that the Traefik documentation actually mentions acme-dns; see the Let's Encrypt page of the Traefik documentation.

5 Likes

My plan is to create a Traefik-Certbot Docker container that

  1. Fetches all domains from Traefik every 15 seconds
  2. Creates new certificates if required
  3. Updates old certificates if required
  4. Provides a traefik-dynamic.yml via HTTP with all certs inline
    (Traefik will fetch this every 15 seconds for LE certificate updates)
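
Step 4 of the plan above could be sketched roughly as follows. This is an illustration under assumptions, not a tested Traefik setup: it assumes certbot's default live directory layout (fullchain.pem and privkey.pem per certificate name) and that Traefik accepts inline PEM content in the certFile/keyFile fields of its dynamic configuration. The function name render_dynamic_yaml is hypothetical.

```shell
# Print a Traefik dynamic configuration with every certificate inlined.
# $1 is the certbot "live" directory (assumed default: /etc/letsencrypt/live).
render_dynamic_yaml() {
  live_dir="${1:-/etc/letsencrypt/live}"

  echo "tls:"
  echo "  certificates:"
  for dir in "$live_dir"/*/; do
    # Skip anything that is not a complete certificate directory.
    [ -f "${dir}fullchain.pem" ] && [ -f "${dir}privkey.pem" ] || continue

    # YAML block scalars: indent every PEM line below the key.
    echo "    - certFile: |"
    sed 's/^/        /' "${dir}fullchain.pem"
    echo "      keyFile: |"
    sed 's/^/        /' "${dir}privkey.pem"
  done
}
```

Running render_dynamic_yaml > traefik-dynamic.yml after each certbot pass, and serving that file over HTTP, would give Traefik a single document to poll. Writing to a temporary file and renaming it would avoid Traefik fetching a half-written file.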

Because Traefik has integrated Let's Encrypt it seems no one has done this before. I run Traefik in Docker Swarm and the clustering is breaking the integrated mechanism. I assume normal people just migrate to k8s 😉

1 Like

Are you sure? 🙂 Have you asked on a Traefik forum? Maybe there is a better architecture.

The outline you describe looks to require complex error handling.

5 Likes

I have no experience with Traefik, but it sounds to me like the automatically-get-certs-on-demand approach is what systems like Caddy do. You might just want your HTTPS to be in front of your systems using something like that which can handle all the certificates itself.

6 Likes

Try changing:
-d client_01.example.com
To:
--cert-name client_01.example.com

5 Likes

Official clustered Let's Encrypt is only supported by TraefikEE. And after a call they tell you pricing starts at €3000/year.

1 Like

Excellent, that works! First step to a working solution 🙂

# run in certbot Docker container
apk add curl
apk add pcre-tools

/bin/sleep 20 # for service discovery to establish routing

# Collect the unique host names from all Traefik router rules
DOMAINS=$(curl --silent --max-time 5 http://user:pass@traefik:8080/api/http/routers | \
  pcregrep -o '(?<=Host\(`).*?(?=`\))' | sort -u)

# $DOMAINS is intentionally unquoted so it splits into one host per iteration
for DOM in $DOMAINS; do
  echo "RENEW $DOM"
  # A non-zero exit from renew is treated as "no certificate with this name yet"
  if ! certbot renew --standalone --non-interactive --agree-tos --no-eff-email --test-cert --cert-name "$DOM"; then
    echo "CREATE $DOM"
    certbot certonly --standalone --non-interactive --agree-tos --no-eff-email --test-cert -m mail@example.com -d "$DOM"
  fi
done

2 Likes

Did that just save you from the €3000/year? LOL

3 Likes

This should work with the --cert-name suggestion. The fullest form would be something like

certbot certonly --standalone --non-interactive --keep-until-expiring --cert-name www.example.com -d www.example.com

The --keep-until-expiring tells it not to renew if the certificate exists and is not very old. (This might already be the default with --non-interactive, so it might be redundant to specify it.)

This command will obtain a new certificate if no such certificate exists, exit immediately if it exists and is not due for renewal, and renew it if it exists and is due for renewal (by default, if it is expiring in <30 days from now).
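
Applied to a whole list of hosts, the single command above could be wrapped roughly like this. It is a sketch, not a definitive implementation: the function names ensure_cert and ensure_all and the variables CERTBOT and CERT_EMAIL are hypothetical, and the e-mail address is the placeholder used earlier in this thread. The certbot flags are the ones already shown in this thread.

```shell
# CERTBOT and CERT_EMAIL are hypothetical knobs for this sketch;
# CERTBOT can be pointed at a stub for a dry run.
CERTBOT="${CERTBOT:-certbot}"
CERT_EMAIL="${CERT_EMAIL:-mail@example.com}"

# Obtain the certificate if missing, renew it if due, otherwise do nothing.
ensure_cert() {
  "$CERTBOT" certonly --standalone --non-interactive --agree-tos --no-eff-email \
    --keep-until-expiring -m "$CERT_EMAIL" --cert-name "$1" -d "$1"
}

# Run ensure_cert for every host given as an argument; keep going on
# failures and report them through the exit status.
ensure_all() {
  failed=0
  for dom in "$@"; do
    ensure_cert "$dom" || { echo "FAILED $dom" >&2; failed=1; }
  done
  return "$failed"
}
```

Called as ensure_all $DOMAINS with the host list fetched from Traefik, this keeps one certificate per host, so adding a new client only ever triggers one issuance.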

7 Likes

This is awesome, it works, thank you!

2 Likes