Unable to renew ACME Cert via Traefik edge router -> Status Pending

Hi there, since yesterday Traefik seems to be unable to renew ACME certificates for internal use. I have been running this config for several months without any issues, and I haven't changed anything. The only new thing is a second device in my LAN that also requests ACME certificates via the same DNS challenge (a new pfSense with HAProxy).

Any suggestions what I'm doing wrong?

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. crt.sh | man-owns.eu), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is: man-owns.eu

I ran this command: DNS-01 challenge performed by Traefik's built-in ACME implementation

Here are all the ACME-relevant Traefik command-line arguments:

      - --certificatesresolvers.myresolver.acme.dnschallenge=true
      - --certificatesresolvers.myresolver.acme.dnschallenge.resolvers=1.1.1.1:53
      - --certificatesresolvers.myresolver.acme.dnschallenge.provider=cloudflare
      - --certificatesresolvers.myresolver.acme.caserver=https://acme-v02.api.letsencrypt.org/directory
      - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
      - --entrypoints.web.address=:80 # <== Defining an entrypoint for port :80 named web
      - --entrypoints.websecure.address=:443 # <== Defining an entrypoint for https on port :443 (not really needed)
      - --entrypoints.websecure.http.tls=true
      - --entrypoints.websecure.http.tls.certresolver=myresolver
      - --entrypoints.websecure.http.tls.domains[0].main=man-owns.eu
      - --entrypoints.websecure.http.tls.domains[0].sans=*.man-owns.eu

The Cloudflare tokens are set further up as environment variables (not shown here).

It produced this output:
https://acme-v02.api.letsencrypt.org/acme/chall-v3/113065919666/e0WiVA

Traefik log output:

time="2022-05-27T08:44:29Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/113065919666"
time="2022-05-27T08:42:27Z" level=debug msg="legolog: [INFO] [man-owns.eu, *.man-owns.eu] acme: Obtaining bundled SAN certificate"
time="2022-05-27T08:44:29Z" level=debug msg="legolog: [INFO] Deactivating auth: https://acme-v02.api.letsencrypt.org/acme/authz-v3/113065919676"
time="2022-05-27T08:42:28Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/113065919666"
time="2022-05-27T08:42:28Z" level=debug msg="legolog: [INFO] [man-owns.eu] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/113065919676"
time="2022-05-27T08:42:28Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] acme: use dns-01 solver"
time="2022-05-27T08:42:28Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Could not find solver for: tls-alpn-01"
time="2022-05-27T08:42:28Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Could not find solver for: http-01"
time="2022-05-27T08:42:28Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: use dns-01 solver"
time="2022-05-27T08:42:28Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] acme: Preparing to solve DNS-01"
time="2022-05-27T08:42:58Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Preparing to solve DNS-01"
time="2022-05-27T08:43:28Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] acme: Cleaning DNS-01 challenge"
time="2022-05-27T08:43:58Z" level=debug msg="legolog: [WARN] [*.man-owns.eu] acme: cleaning up failed: cloudflare: could not find the start of authority for _acme-challenge.man-owns.eu.: read udp 172.29.0.3:50099->1.1.1.1:53: i/o timeout "
time="2022-05-27T08:43:58Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Cleaning DNS-01 challenge"
time="2022-05-27T08:44:28Z" level=debug msg="legolog: [WARN] [man-owns.eu] acme: cleaning up failed: cloudflare: could not find the start of authority for _acme-challenge.man-owns.eu.: read udp 172.29.0.3:60525->1.1.1.1:53: i/o timeout "
time="2022-05-27T08:44:28Z" level=debug msg="legolog: [INFO] retry due to: acme: error: 400 :: POST :: https://acme-v02.api.letsencrypt.org/acme/authz-v3/113065919666 :: urn:ietf:params:acme:error:badNonce :: JWS has an invalid anti-replay nonce: \"0002iZaA0ZB-HTt9E2xw0i8ziTS6Tdn3ITlcvxv4vROAX3U\""

My web server is (include version): Traefik (tested versions from 2.3 up to the latest 2.7)

The operating system my web server runs on is (include version): QTS 5.0.0 Docker environment. Traefik runs from the official Docker image.

My hosting provider, if applicable, is: Self-hosted on my QNAP TVS-672X at home

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
My Docker UI is Portainer.

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): I don't know what Traefik uses in the background
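
The warnings in the first log ("could not find the start of authority for _acme-challenge.man-owns.eu. ... i/o timeout") are raised in the Cloudflare provider step, as the `cloudflare:` prefix shows. As far as I understand lego (the library behind the `legolog:` lines), it works out which DNS zone the `_acme-challenge` record belongs to by asking the configured resolver for SOA records while walking up from the full name; with `dnschallenge.resolvers=1.1.1.1:53` those lookups go to 1.1.1.1, so if they time out the provider presumably never learns the zone. The following is a simplified Python model of that walk, not lego's actual code (its exact lookup order and stopping rules may differ); `has_soa` is a hypothetical callback standing in for a real SOA query.

```python
from typing import Callable, List, Optional


def candidate_zones(fqdn: str) -> List[str]:
    """List the name and every parent domain, most specific first."""
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels))]


def find_zone(fqdn: str, has_soa: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that owns an SOA record, or None.

    has_soa is a hypothetical lookup callback. If it raises (for example a
    timeout because UDP replies never arrive), the exception propagates and
    the search aborts, which is the situation the log warnings describe.
    """
    for name in candidate_zones(fqdn):
        if has_soa(name):
            return name
    return None


if __name__ == "__main__":
    # Pretend that only the zone apex has an SOA record.
    print(candidate_zones("_acme-challenge.man-owns.eu."))
    # → ['_acme-challenge.man-owns.eu.', 'man-owns.eu.', 'eu.']
    print(find_zone("_acme-challenge.man-owns.eu.", lambda n: n == "man-owns.eu."))
    # → man-owns.eu.
```

In this model an A lookup never enters the picture, which is why a resolver that answers A queries but not SOA queries can still break the DNS-01 flow.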

Looks like something is blocking outgoing (or the corresponding incoming) packets for UDP port 53 (DNS) to at least 1.1.1.1.
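
To confirm this from inside the container independently of Traefik, one can send a single DNS query over UDP and check whether any reply arrives before a timeout. The sketch below does that with only the Python standard library by hand-building an RFC 1035 query; `build_query` and `udp_probe` are hypothetical helper names, and the 3-second default timeout is an arbitrary choice.

```python
import random
import socket
import struct

TYPE_A, TYPE_SOA = 1, 6  # RFC 1035 QTYPE values


def build_query(name: str, qtype: int, txid: int) -> bytes:
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def udp_probe(server: str, name: str, qtype: int,
              timeout: float = 3.0, port: int = 53) -> str:
    """Send one query over UDP and report 'timeout' or the reply's RCODE."""
    txid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype, txid), (server, port))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
    # A valid reply echoes our transaction ID in its first two bytes.
    if len(reply) < 12 or struct.unpack("!H", reply[:2])[0] != txid:
        return "unexpected reply"
    return f"reply, rcode={reply[3] & 0x0F}"
```

From inside the Traefik container, comparing `udp_probe("1.1.1.1", "man-owns.eu", TYPE_A)` with `udp_probe("1.1.1.1", "man-owns.eu", TYPE_SOA)` would show whether only one of the two query types times out.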

Hi @Osiris

Thanks for the reply.
Inside the Traefik Docker container, nslookup seems to work properly:

/ # nslookup google.com 1.1.1.1
Server:         1.1.1.1
Address:        1.1.1.1:53

Non-authoritative answer:
Name:   google.com
Address: 142.250.185.238

Non-authoritative answer:
Name:   google.com
Address: 2a00:1450:4001:82b::200e

@Osiris
Ha! You are right. Really strange behavior. After sniffing around, I found that there are requests going to the 1.1.1.1 resolver, but no answers anywhere...

[Image: capture from the pfSense LAN interface]

The really strange thing is that an nslookup for an A record works fine, but requesting the SOA record runs into a timeout, and I can't see any response packets on the wire. It's reproducible with a manual nslookup. I have no idea how or why my pfSense firewall would block that (or whether the root cause is something completely different). There are also no corresponding entries in the firewall log.
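
For context on why this is puzzling: on the wire, an A query and an SOA query for the same name differ only in the two-byte QTYPE field of the question (1 for A, 6 for SOA), so a filter that looks only at addresses and ports treats both queries identically. The sketch below builds both queries with a hypothetical `build_query` helper (hand-rolled RFC 1035 encoding, same transaction ID for both) and lists the byte offsets where they differ.

```python
import struct
from typing import List

TYPE_A, TYPE_SOA = 1, 6  # RFC 1035 QTYPE values


def build_query(name: str, qtype: int, txid: int) -> bytes:
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def differing_offsets(a: bytes, b: bytes) -> List[int]:
    """Byte offsets at which two equal-length messages differ."""
    if len(a) != len(b):
        raise ValueError("messages must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


a_query = build_query("man-owns.eu", TYPE_A, txid=0x1234)
soa_query = build_query("man-owns.eu", TYPE_SOA, txid=0x1234)
print(len(a_query), differing_offsets(a_query, soa_query))  # → 29 [26]
```

Offset 26 is the low byte of QTYPE; everything else, including the packet size, is the same. The replies, on the other hand, do differ in content and size.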

TL;DR

I removed the DNS resolver override (--certificatesresolvers.myresolver.acme.dnschallenge.resolvers=1.1.1.1:53) and let the container use its default DNS resolver, 127.0.0.11:53 (the local DNSMasq from QNAP's Docker environment).

Now it works again :smiley:

time="2022-05-27T17:51:35Z" level=debug msg="Filtering disabled container" providerName=docker container=telegraf-influxdb-cf2b441d3e91096b57ece9e800f77714a2fae1de56c264726aade650beac32bb
time="2022-05-27T17:51:37Z" level=debug msg="legolog: [INFO] [man-owns.eu, *.man-owns.eu] acme: Obtaining bundled SAN certificate"
time="2022-05-27T17:51:38Z" level=debug msg="legolog: [INFO] [man-owns.eu] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/2553497964"
time="2022-05-27T17:51:38Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/2553497954"
time="2022-05-27T17:51:38Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Could not find solver for: http-01"
time="2022-05-27T17:51:38Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Could not find solver for: tls-alpn-01"
time="2022-05-27T17:51:38Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] acme: use dns-01 solver"
time="2022-05-27T17:51:38Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: use dns-01 solver"
time="2022-05-27T17:51:38Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] acme: Preparing to solve DNS-01"
time="2022-05-27T17:51:40Z" level=debug msg="legolog: [INFO] cloudflare: new record for man-owns.eu, ID 74dc5aff5d6d459efa3ace306002f23c"
time="2022-05-27T17:51:40Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Preparing to solve DNS-01"
time="2022-05-27T17:51:41Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] acme: Trying to solve DNS-01"
time="2022-05-27T17:51:41Z" level=debug msg="legolog: [INFO] cloudflare: new record for man-owns.eu, ID a6a3d810b60eb73e398f4b4dace78919"
time="2022-05-27T17:51:41Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] acme: Checking DNS record propagation using [127.0.0.11:53]"
time="2022-05-27T17:51:43Z" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]"
time="2022-05-27T17:51:54Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] The server validated our request"
time="2022-05-27T17:51:54Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Trying to solve DNS-01"
time="2022-05-27T17:51:54Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Checking DNS record propagation using [127.0.0.11:53]"
time="2022-05-27T17:51:56Z" level=debug msg="legolog: [INFO] Wait for propagation [timeout: 2m0s, interval: 2s]"
time="2022-05-27T17:52:16Z" level=debug msg="legolog: [INFO] [man-owns.eu] The server validated our request"
time="2022-05-27T17:52:16Z" level=debug msg="legolog: [INFO] [*.man-owns.eu] acme: Cleaning DNS-01 challenge"
time="2022-05-27T17:52:16Z" level=debug msg="legolog: [INFO] [man-owns.eu] acme: Cleaning DNS-01 challenge"
time="2022-05-27T17:52:17Z" level=debug msg="legolog: [INFO] [man-owns.eu, *.man-owns.eu] acme: Validations succeeded; requesting certificates"
time="2022-05-27T17:52:19Z" level=debug msg="legolog: [INFO] [man-owns.eu] Server responded with a certificate."
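
With the override removed, the log above shows the propagation check going to 127.0.0.11:53 ("Checking DNS record propagation using [127.0.0.11:53]"), i.e. the nameserver the container itself is configured with. A quick way to see that nameserver from inside a container is to read /etc/resolv.conf. The sketch below is a hypothetical parser for its nameserver lines only; it ignores search, options and every other directive.

```python
from typing import List


def nameservers(resolv_conf: str, port: int = 53) -> List[str]:
    """Return "host:port" for every nameserver line in resolv.conf text."""
    servers = []
    for line in resolv_conf.splitlines():
        parts = line.split()
        # Skip blank lines and comment lines (resolv.conf uses '#' or ';').
        if not parts or parts[0].startswith(("#", ";")):
            continue
        if parts[0] == "nameserver" and len(parts) >= 2:
            host = parts[1]
            # Bracket IPv6 addresses so the port suffix stays unambiguous.
            if ":" in host:
                host = f"[{host}]"
            servers.append(f"{host}:{port}")
    return servers


if __name__ == "__main__":
    try:
        with open("/etc/resolv.conf", encoding="utf-8") as fh:
            print(nameservers(fh.read()))
    except OSError as exc:
        print(f"could not read /etc/resolv.conf: {exc}")
```

In a container attached to a user-defined Docker network this would typically print ['127.0.0.11:53'], Docker's embedded DNS address, which matches the address in the log.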

Thanks for the hint

Perhaps the SOA record is too long for UDP 53 and you are not allowing TCP 53.
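
One way to test this: a DNS server that cannot fit an answer into a UDP reply sets the TC (truncated) flag, and the client is expected to retry over TCP, where each message is prefixed with a two-byte length (RFC 1035, section 4.2.2). Below is a minimal standard-library sketch of both pieces; `build_query`, `is_truncated`, `frame`, `recv_exact` and `tcp_query` are hypothetical helper names, and whether truncation is what actually happens here is exactly what it would help check.

```python
import socket
import struct

TYPE_SOA = 6  # RFC 1035 QTYPE for SOA


def build_query(name: str, qtype: int, txid: int) -> bytes:
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def is_truncated(reply: bytes) -> bool:
    """True if the TC bit (0x0200 in the flags word) is set."""
    flags = struct.unpack("!H", reply[2:4])[0]
    return bool(flags & 0x0200)


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its two-byte length for TCP transport."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def tcp_query(server: str, name: str, qtype: int,
              timeout: float = 3.0, port: int = 53) -> bytes:
    """Send one query over TCP and return the raw reply message."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # Arbitrary fixed transaction ID; fine for a one-off probe.
        sock.sendall(frame(build_query(name, qtype, txid=0x4242)))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

If `tcp_query("1.1.1.1", "man-owns.eu", TYPE_SOA)` returns an answer while the UDP query times out, TCP 53 is evidently open on the path. Conversely, a truncated UDP reply would arrive with the TC bit set rather than not at all, so a plain timeout points elsewhere.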

Maybe, but my Docker host is in a "privileged" VLAN where there should be no internet restrictions. :thinking:

A new thesis:
Is it possible that public resolvers don't allow SOA requests over unencrypted UDP port 53?! The only difference I can see is that my pfSense talks to the 1.1.1.1 upstream server via TCP 853 (DNS over TLS), and that works. Also, when I sniff on the pfSense WAN port in promiscuous mode, I can't see any response to the SOA request sent from the unencrypted nslookup client.
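
That thesis can be tested directly by sending the same SOA query to 1.1.1.1 over DNS over TLS on port 853 (RFC 7858), which wraps the same two-byte length framing as DNS over TCP in a TLS session. The following is a minimal standard-library sketch; the TLS server name `cloudflare-dns.com` is an assumption about the name on Cloudflare's certificate, and all helper names are hypothetical.

```python
import socket
import ssl
import struct

TYPE_SOA = 6  # RFC 1035 QTYPE for SOA


def build_query(name: str, qtype: int, txid: int) -> bytes:
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its two-byte length (same as DNS over TCP)."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def dot_query(server: str, name: str, qtype: int,
              tls_name: str = "cloudflare-dns.com",  # assumed certificate name
              timeout: float = 5.0, port: int = 853) -> bytes:
    """Send one query over DNS over TLS and return the raw reply message."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((server, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=tls_name) as tls:
            tls.sendall(frame(build_query(name, qtype, txid=0x5353)))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            return recv_exact(tls, length)


def answer_count(reply: bytes) -> int:
    """ANCOUNT field from the reply header."""
    return struct.unpack("!H", reply[6:8])[0]
```

A non-zero `answer_count(dot_query("1.1.1.1", "man-owns.eu", TYPE_SOA))` from the Docker host, compared with a plain UDP query from the same host, would show whether SOA lookups fail only on the unencrypted path.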