How to continuously create/renew certificates without hitting limits?

Trololololo

You have options if you can share volumes across the cluster. See the "Caddy2 clustering" thread in the Help section of the Caddy Community forum (nginx can do the same, but you'll have to make it play nicely with certbot).

So here is a rough proof of concept for clustered Let's Encrypt with Traefik in Docker Swarm. Without a shared file system. Without spending €3000.

DO NOT USE THIS FOR WEBSITES WITH SLAs. There are many ways it can break. For example, if the container is re-scheduled to a different node, certbot has to request all certificates again because its volume is local to the old node. This takes time and may hit rate limits, rendering services unavailable.

Alternatives
It's probably a lot safer to use a shared file system if you can and are willing to set it up.

Workflow

  1. Run a single instance of the certbot container on a Traefik node
  2. Start a web server in the container to answer the own challenge and serve the dynamic config
  3. Loop: Fetch the domains from the Traefik API
  4. Loop: Run the own challenge to check whether each domain is reachable
  5. Loop: Run certbot certonly --non-interactive --keep-until-expiring ...
  6. Loop: Create the dynamic config file with certs for Traefik
  7. Loop: Sleep 15 seconds

Own challenge
The own challenge exists because routing information may be misconfigured. Without it, the loop would contact Let's Encrypt every 15 seconds for the same non-working domain and hit the rate limits very quickly.
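
A slightly stricter variant of the own challenge, as a rough sketch: instead of only looking at curl's exit code, it writes a token into the challenge file and checks that the same token comes back over HTTP, so a different server answering with 200 does not count as success. The names domain_reachable and fetch_url and the token format are made up for this example, not part of the proof of concept.

```shell
# Hypothetical helper: print the body of a URL, fail on HTTP errors (e.g. 404).
fetch_url() {
  curl --silent --fail --max-time 5 "$1"
}

# domain_reachable NAME WEBROOT
# Succeeds only if http://NAME serves the token we just wrote into WEBROOT.
domain_reachable() {
  dr_name=$1
  dr_webroot=$2
  dr_token="traefik-certbot-$(date +%s)-$$"   # assumption: any unique string works
  dr_path="/.well-known/acme-challenge/$dr_token"

  mkdir -p "$dr_webroot/.well-known/acme-challenge"
  printf '%s' "$dr_token" > "$dr_webroot$dr_path"

  if dr_body=$(fetch_url "http://$dr_name$dr_path"); then
    dr_status=0
  else
    dr_status=1
  fi
  rm -f "$dr_webroot$dr_path"

  # Both the transfer and the content must be right.
  [ "$dr_status" -eq 0 ] && [ "$dr_body" = "$dr_token" ]
}
```

In the loop it could be used as `if domain_reachable "$NAME" "$WEBROOT"; then certbot certonly ...; fi`.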

Potential To-Dos
Currently the container serves every certificate found on disk. Some may no longer be needed and some may have expired, so it makes sense to filter them against the domain list from Traefik. That list may be empty, though (network trouble), and you never want to publish an empty dynamic config file. It may also contain non-working domains that have no cert files. Food for thought.
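
One way to avoid ever publishing an empty or half-written config, sketched under assumptions: build the file under a temporary name, skip directories that lack a complete cert/key pair, and only move the result into place if at least one certificate made it in. The function name write_tls_config and the temp-file naming are hypothetical; the YAML layout is the one used by the proof of concept.

```shell
# write_tls_config LIVE_DIR OUT_FILE
# Builds the dynamic config in a temp file and only then moves it into place.
# Returns 1 and leaves OUT_FILE untouched if no usable certificate was found.
write_tls_config() {
  wt_live=$1
  wt_out=$2
  wt_tmp="$wt_out.tmp.$$"   # assumption: same directory as OUT_FILE
  wt_count=0

  printf 'tls:\n  options:\n    default:\n      minVersion: VersionTLS12\n  certificates:\n' > "$wt_tmp"
  for wt_dir in "$wt_live"/*/; do
    # Skip directories without a complete, non-empty cert/key pair.
    [ -s "${wt_dir}fullchain.pem" ] && [ -s "${wt_dir}privkey.pem" ] || continue
    {
      printf '    # CERT FILE %s\n' "${wt_dir%/}"
      printf '    - certFile: |-\n'
      sed -e 's/^/        /' "${wt_dir}fullchain.pem"
      printf '      keyFile: |-\n'
      sed -e 's/^/        /' "${wt_dir}privkey.pem"
    } >> "$wt_tmp"
    wt_count=$((wt_count + 1))
  done

  if [ "$wt_count" -eq 0 ]; then
    rm -f "$wt_tmp"
    return 1
  fi
  # A rename within one directory, so a poller never reads a partial file.
  mv "$wt_tmp" "$wt_out"
}
```

Called as `write_tls_config /etc/letsencrypt/live "$WEBROOT/traefik-certbot.yml"`, the previous file simply stays in place when nothing usable is found.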

Traefik configuration
Traefik can use an HTTP provider in the static config to poll the dynamic configuration at a fixed interval:

providers:
  http:
    endpoint: "http://traefik_certbot/traefik-certbot.yml"
    pollInterval: 15s
    pollTimeout: 5s

Traefik dynamic configuration
The certbot container will return the dynamic config with certificates inline:

tls:
  options:
    default:
      minVersion: VersionTLS12
  certificates:
    # CERT FILE /etc/letsencrypt/live/example.com
    - certFile: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      keyFile: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----
    # CERT FILE /etc/letsencrypt/live/www.example.com
    - certFile: |-
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
      keyFile: |-
        -----BEGIN PRIVATE KEY-----
        ...
        -----END PRIVATE KEY-----

Proof-of-concept shell code

# run within certbot container
apk add curl pcre-tools

WEBROOT=/webroot
echo WEBROOT $WEBROOT
mkdir -p $WEBROOT/.well-known/acme-challenge

echo START WEBSERVER
python -m http.server 80 --directory $WEBROOT &
/bin/sleep 2

while true; do

  echo FETCH DOMAINS
  DOMAINS=$( \
    curl --silent --max-time 5 http://user:pass@traefik:8080/api/http/routers | \
    pcregrep -o '(?<=Host\(`).*?(?=`\))' | sort | uniq \
  )
  echo FETCH DONE

  for NAME in $DOMAINS; do
    echo DOMAIN $NAME

    FILE=/.well-known/acme-challenge/traefik-certbot-$EPOCHREALTIME
    touch $WEBROOT$FILE
    # --fail makes curl exit non-zero on HTTP errors such as 404
    curl --silent --fail --max-time 5 http://$NAME$FILE > /dev/null
    ERR=$?
    rm $WEBROOT$FILE

    if [ $ERR -eq 0 ]; then
      echo DOMAIN CHALLENGE OK, RUN CERTBOT
      certbot certonly \
        --webroot -w $WEBROOT \
        --non-interactive \
        --agree-tos \
        --no-eff-email \
        --keep-until-expiring \
        -m email@example.com \
        --quiet \
        --cert-name $NAME \
        -d $NAME
      if [ $? -eq 0 ]; then
        echo CERTBOT OK $NAME
      else
        echo CERTBOT FAILED $NAME
      fi
    else
      echo DOMAIN CHALLENGE FAILED http://$NAME$FILE
    fi

  done

  echo TRAEFIK TLS FILE GENERATION
  FILE=$WEBROOT/traefik-certbot.yml
  printf "tls:\n  options:\n    default:\n      minVersion: VersionTLS12\n  certificates:\n" > $FILE
  for NAME in $(find /etc/letsencrypt/live/ -maxdepth 1 -mindepth 1 -type d -print) ; do
    printf "TRAEFIK TLS FILE ADD $NAME\n"
    printf "    # CERT FILE $NAME\n" >> $FILE
    printf "    - certFile: |-\n" >> $FILE
    sed -e 's/^/        /' $NAME/fullchain.pem >> $FILE
    printf "      keyFile: |-\n" >> $FILE
    sed -e 's/^/        /' $NAME/privkey.pem >> $FILE
  done

  #echo TRAEFIK TLS FILE CONTENT
  #cat $WEBROOT/traefik-certbot.yml

  echo SLEEP
  /bin/sleep 15
done
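
A note on the domain extraction: the pcregrep pattern expects exactly one backtick-quoted name per Host(...). If a rule lists several names in one matcher, as Traefik v2 allows, the match spans both names and the for loop splits it in the wrong places. A possible replacement with plain grep, as a sketch; extract_hosts is a made-up name and the sample JSON is shortened to the rule field, which is an assumption about what matters in the API response:

```shell
# extract_hosts: read the JSON of /api/http/routers on stdin,
# print every name found inside Host(...) once, sorted.
extract_hosts() {
  grep -o 'Host([^)]*)' |   # isolate each Host(...) matcher
    grep -o '`[^`]*`' |     # isolate each backtick-quoted name
    tr -d '`' |
    sort -u
}

# Sample input, shortened to the "rule" field (assumed shape of the response).
printf '%s\n' '[{"rule":"Host(`b.example.com`, `a.example.com`)"},{"rule":"Host(`c.example.com`) && PathPrefix(`/api`)"}]' |
  extract_hosts
```

For the sample this prints a.example.com, b.example.com and c.example.com, one per line. In the loop it would replace the pcregrep, sort and uniq stages after the curl call.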

Personally I would say this is mandatory for any system using certificates.

Example certbot docker-compose.yml for use with Traefik's provider.http and inline TLS certificates. It creates Let's Encrypt certificates for all Hosts; Traefik just sees them as regular TLS certificates in the dynamic configuration. THIS IS NOT PRODUCTION READY.

version: '3.9'

services:
  certbot:
    image: certbot/certbot
    entrypoint: ["/bin/sh", "-c"]
    command: 
      - |
        apk add curl pcre-tools jq

        WEBROOT=/webroot
        echo WEBROOT $$WEBROOT
        mkdir -p $$WEBROOT/.well-known/acme-challenge

        echo START WEBSERVER
        python -m http.server 80 --directory $$WEBROOT &
        /bin/sleep 2

        while true; do

          echo FETCH DOMAINS
          DOMAINS=$$( \
            curl --silent --max-time 5 http://user:pass@traefik_traefik:8080/api/http/routers | \
            pcregrep -o '(?<=Host\(`).*?(?=`\))' | sort | uniq \
          )
          echo FETCH DONE

          for NAME in $$DOMAINS; do
            echo DOMAIN $$NAME

            FILE=/.well-known/acme-challenge/traefik-certbot-$$EPOCHREALTIME
            touch $$WEBROOT$$FILE
            # --fail makes curl exit non-zero on HTTP errors such as 404
            curl --silent --fail --max-time 5 http://$$NAME$$FILE > /dev/null
            ERR=$$?
            rm $$WEBROOT$$FILE

            if [ $$ERR -eq 0 ]; then
              echo DOMAIN CHALLENGE OK, RUN CERTBOT
              certbot certonly \
                --webroot -w $$WEBROOT \
                --non-interactive \
                --agree-tos \
                --no-eff-email \
                --keep-until-expiring \
                -m email@example.com \
                --quiet \
                --cert-name $$NAME \
                -d $$NAME
              if [ $$? -eq 0 ]; then
                echo CERTBOT OK $$NAME
              else
                echo CERTBOT FAILED $$NAME
              fi
            else
              echo DOMAIN CHALLENGE FAILED http://$$NAME$$FILE
            fi

          done

          echo TRAEFIK TLS FILE GENERATION
          FILE=$$WEBROOT/traefik-certbot.yml
          printf "tls:\n  options:\n    default:\n      minVersion: VersionTLS12\n  certificates:\n" > $$FILE
          for NAME in $$(find /etc/letsencrypt/live/ -maxdepth 1 -mindepth 1 -type d -print) ; do
            echo TRAEFIK TLS FILE ADD $$NAME
            printf "    # CERT FILE $$NAME\n" >> $$FILE
            printf "    - certFile: |-\n" >> $$FILE
            sed -e 's/^/        /' $$NAME/fullchain.pem >> $$FILE
            printf "      keyFile: |-\n" >> $$FILE
            sed -e 's/^/        /' $$NAME/privkey.pem >> $$FILE
          done

          #echo TRAEFIK TLS FILE CONTENT
          #cat $$WEBROOT/traefik-certbot.yml

          echo SLEEP
          /bin/sleep 15
        done

    hostname: '{{.Node.Hostname}}'
    networks:
      - proxy
    volumes:
      - traefik-certificates:/etc/letsencrypt
    deploy:
      replicas: 1 # only single instance
      placement:
        constraints:
          - node.role==manager # for service discovery
      labels:
        - 'traefik.enable=true'
        - 'traefik.http.routers.certbot.entrypoints=web'
        - 'traefik.http.routers.certbot.rule=PathPrefix(`/.well-known/acme-challenge`)'
        - 'traefik.http.routers.certbot.priority=1024'
        - 'traefik.http.services.certbot.loadbalancer.server.port=80'

# Assumption: the "proxy" overlay network is created outside this stack and shared with Traefik.
networks:
  proxy:
    external: true

volumes:
  traefik-certificates:

Pieces of the Traefik static configuration:

providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    swarmMode: true
    exposedByDefault: false
    network: proxy
  
  # for dashboard
  file: 
    filename: /traefik-dynamic.yml
    watch: true

  # for LetsEncrypt certificates from certbot
  http: 
    endpoint: "http://traefik_certbot/traefik-certbot.yml"
    pollInterval: 15s
    pollTimeout: 5s
    
entryPoints:
  web:
    address: :80
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
          priority: 1000 # needed for certbot to be able to overrule
  websecure:
    ...

It's probably better to use a folder shared across all Traefik nodes, write the dynamic config file into that folder, and have Traefik watch it with provider.file instead. Then you could just copy the certificates and reference them by path instead of inlining them. Be aware that the host folder needs to exist when running with Docker Swarm.
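
A rough sketch of that shared-folder variant, under assumptions: the function name publish_certs, the certs/ subfolder and the certs.yml file name are all made up, and the shared folder is assumed to be mounted at the same path in every Traefik container so that the referenced paths resolve there too.

```shell
# publish_certs LIVE_DIR SHARED_DIR
# Copies each cert/key pair into SHARED_DIR/certs and writes SHARED_DIR/certs.yml
# with path references instead of inline PEM data.
publish_certs() {
  pc_live=$1
  pc_shared=$2   # assumption: same mount path inside the Traefik containers
  pc_tmp="$pc_shared/certs.yml.tmp.$$"
  pc_count=0

  mkdir -p "$pc_shared/certs"
  printf 'tls:\n  certificates:\n' > "$pc_tmp"
  for pc_dir in "$pc_live"/*/; do
    # Skip directories without a complete, non-empty cert/key pair.
    [ -s "${pc_dir}fullchain.pem" ] && [ -s "${pc_dir}privkey.pem" ] || continue
    pc_name=$(basename "$pc_dir")
    # -L follows the symlinks certbot keeps in live/
    cp -L "${pc_dir}fullchain.pem" "$pc_shared/certs/$pc_name.crt"
    cp -L "${pc_dir}privkey.pem" "$pc_shared/certs/$pc_name.key"
    printf '    - certFile: %s\n      keyFile: %s\n' \
      "$pc_shared/certs/$pc_name.crt" "$pc_shared/certs/$pc_name.key" >> "$pc_tmp"
    pc_count=$((pc_count + 1))
  done

  if [ "$pc_count" -eq 0 ]; then
    rm -f "$pc_tmp"
    return 1
  fi
  # Rename last, so the watched file is replaced in one step.
  mv "$pc_tmp" "$pc_shared/certs.yml"
}
```

Traefik's file provider would then point its filename at the generated certs.yml with watch: true, the same way the dashboard config is loaded in the static configuration.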
