K3os w/Traefik: Timeout during connect (likely firewall problem), Let's Debug is OK

My domain is: snow2.alt.kye.dev, snow.alt.kye.dev

I'm using the Traefik ACME provider with http/tls mode. I have a wildcard setup for the subdomain *.alt.kye.dev (using this for testing). I'm getting this error:

"Unable to obtain ACME certificate for domains \"snow2.alt.kye.dev\": unable to generate a certificate for the domains [snow2.alt.kye.dev]: error: one or more domains had a problem:\n[snow2.alt.kye.dev] acme: error: 400 :: urn:ietf:params:acme:error:connection :: 50.39.98.126: Timeout during connect (likely firewall problem)\n" providerName=le.acme routerName=snow-tracker-snow-snow2-alt-kye-dev@kubernetes rule="Host(`snow2.alt.kye.dev`) && PathPrefix(`/`)"

However, I have hit the cert limit for one of these domains, so it seems certs are getting minted but not returned. I am still able to hit these domains, and I get served the default Traefik cert.

Let's Debug shows no issues. I am able to hit Traefik from the local network and from a public network. I can SSH into the server and successfully curl out.
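
For reference, "Timeout during connect" means the CA's validation server could not complete a TCP connection to the host. With tlschallenge (TLS-ALPN-01) the validation connection goes to port 443; with the HTTP challenge it goes to port 80. Below is a minimal Python sketch for checking both ports, meant to be run from a machine outside the local network. The function names and the 5-second timeout are illustrative assumptions, not anything Traefik or Let's Encrypt provides.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False


def check_ports(host, ports=(80, 443), timeout=5.0):
    """Map each port to whether a TCP connection to it succeeded."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling check_ports("snow2.alt.kye.dev") from an outside host should report both ports as reachable if the firewall and port forwarding are correct. A successful connect from one vantage point does not prove that Let's Encrypt's servers can get through, so treat a pass as a hint rather than a guarantee.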

It seems like all the network stuff is properly configured... maybe this is an issue with the Traefik config? I'm using the latest version of K3os with its bundled Traefik, and adding the following HelmChartConfig:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    logs:
      general:
        level: INFO
    ports:
      websecure:
        tls:
          enabled: true
    ingressClass:
      enabled: true
      isDefaultClass: true
    ingressRoute:
      dashboard:
        enabled: false
    globalArguments:
      - "--global.checknewversion"
      - "--global.sendanonymoususage=false"
    additionalArguments:
      - "--certificatesresolvers.le.acme.email=tim@kye.dev"
      - "--certificatesresolvers.le.acme.storage=/data/acme.json"
      - "--certificatesresolvers.le.acme.tlschallenge=true"
      - "--certificatesresolvers.le.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory"

# Swap between these for testing
# - "--certificatesresolvers.le.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory"
# - "--certificatesresolvers.le.acme.caServer=https://acme-v02.api.letsencrypt.org/directory"

Complete Stack
K3os: v0.21.5-k3s2r1
Traefik: Bundled with K3os
Running in a VM on TrueNAS Scale 22.12.1

I don't know your environment well, so I can't help much.

However, Let's Debug reports OK in the summary while the detailed results show problems. I'm not sure why it says OK for a 5xx-class result, but it does.

Note the 502 Bad Gateway for the initial Let's Debug test and a similar 502 for the Let's Encrypt Staging test.

The result you showed was a timeout, which is something different, so the situation has clearly changed. I'm just pointing this out in case it gives you a clue to resolving the problem.

There seems to be an issue with any request to the /.well-known/acme-challenge/ path.

"Other" paths get the 308 redirection:

curl -Ii snow2.alt.kye.dev/.well-known/NOT-acme-challenge/Test_File-1234
HTTP/1.1 308 Permanent Redirect
Location: https://snow2.alt.kye.dev/.well-known/NOT-acme-challenge/Test_File-1234
Date: Sun, 02 Apr 2023 17:39:46 GMT
Content-Length: 18
Content-Type: text/plain; charset=utf-8

The ACME challenge path takes a very long time to return 404:

curl -Ii snow2.alt.kye.dev/.well-known/acme-challenge/Test_File-1234
HTTP/1.1 404 Not Found
Date: Sun, 02 Apr 2023 17:40:41 GMT

But Let's Encrypt probably won't wait as long as I did (~36 seconds), so the validation times out.
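
To put a number on that delay, a request to the challenge path can be timed against a fixed budget. This is a minimal Python sketch: timed_head is a hypothetical helper, and the 10-second default is an assumption about how long a validator is willing to wait, not a figure taken from this thread.

```python
import http.client
import time

# Assumed budget for one validation request; adjust as needed.
ASSUMED_BUDGET_SECONDS = 10.0


def timed_head(host, path, port=80, timeout=ASSUMED_BUDGET_SECONDS):
    """Send one HEAD request without following redirects.

    Returns (status, elapsed_seconds). status is None if the connection
    fails or no response arrives within the timeout.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
    except OSError:
        # Timeouts, refused connections, and DNS errors all end up here.
        status = None
    finally:
        conn.close()
    return status, time.monotonic() - start
```

For example, timed_head("snow2.alt.kye.dev", "/.well-known/acme-challenge/Test_File-1234") returning a status of None would mean the server did not answer within the assumed budget, while a quick 404 for a made-up token is the healthy case.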

I'm actively tinkering with this, so results may be inconsistent. My current config is working via curl with the -k flag (ignore cert errors), but still not working normally.

Traefik

additionalArguments:
  # Prod / Staging Example:
  - --certificatesresolvers.staging.acme.tlschallenge=true
  - --certificatesresolvers.staging.acme.email=tim@kye.dev
  - --certificatesresolvers.staging.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory
  # - --certificatesresolvers.staging.acme.httpchallenge.entryPoint=web
  - --certificatesresolvers.staging.acme.storage=/certs/acme-staging.json
  - --certificatesresolvers.prod.acme.tlschallenge=true
  - --certificatesresolvers.prod.acme.email=tim@kye.dev
  - --certificatesresolvers.prod.acme.caServer=https://acme-v02.api.letsencrypt.org/directory
  # - --certificatesresolvers.prod.acme.httpchallenge.entryPoint=web
  - --certificatesresolvers.prod.acme.storage=/certs/acme-production.json

ports:
  web:
    redirectTo: websecure
  websecure:
    tls:
      enabled: true
      # certResolver: staging
      certResolver: prod

Ingress

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: snow-tracker
  namespace: snow-tracker
spec:
  entryPoints:
    - web
    - websecure
  routes:
  - match: Host(`snow2.alt.kye.dev`)
    kind: Rule
    services:
    - name: snow-tracker
      port: 80
  tls:
    certResolver: staging
    domains:
    - main: snow2.alt.kye.dev

I've tried using the prod resolver, but Traefik is no longer producing any logs.

I switched to cert-manager and things are working.
