Kubernetes cert-manager Challenge Failing with ACME Unauthorized Error

I am managing a Kubernetes cluster with ArgoCD, which includes an ingress-nginx controller. We are trying to obtain a certificate for HTTPS on a domain using cert-manager, but we are encountering a series of errors in the process.

The challenge fails with the following error:


Failed 10s cert-manager-challenges Accepting challenge authorization failed: acme: authorization error for example.org: 403 urn:ietf:params:acme:error:unauthorized: 2001:8d8:100f:f000::200: Invalid response from http://example.org/.well-known/acme-challenge/ZzU4jDSzvVCHwPHwPMsUleJDwf-K3URomZwuhQgNZOo: 204. 
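For reference, the failing Challenge and the temporary solver resources cert-manager creates for it can be inspected like this (resource names are placeholders; the label in the last command is the one cert-manager applies to its HTTP-01 solver objects):

# List all cert-manager Challenge resources and their state
kubectl get challenges -A

# Show the full status and failure reason for one challenge
# (substitute the name/namespace reported by the listing above)
kubectl describe challenge <challenge-name> -n example

# Find the temporary solver pod and ingress created for the challenge
kubectl get pods,ingress -n example -l acme.cert-manager.io/http01-solver=true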

The challenge spawns a cm-acme-http-solver pod with the following logs:


I0712 10:37:33.303185       1 solver.go:39] "cert-manager/acmesolver: starting listener" expected_domain="example.org" expected_token="NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY" expected_key="NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY.2DhKIbjC5D1b_jnc9Katl9vWzWPu9HPi-bGtAm8wLnw" listen_port=8089
I0712 10:37:42.095947       1 solver.go:64] "cert-manager/acmesolver: validating request" host="example.org" path="/.well-known/acme-challenge/NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY" base_path="/.well-known/acme-challenge" token="NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY"
I0712 10:37:42.095987       1 solver.go:72] "cert-manager/acmesolver: comparing host" host="example.org" path="/.well-known/acme-challenge/NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY" base_path="/.well-known/acme-challenge" token="NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY" expected_host="example.org"
I0712 10:37:42.096005       1 solver.go:79] "cert-manager/acmesolver: comparing token" host="example.org" path="/.well-known/acme-challenge/NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY" base_path="/.well-known/acme-challenge" token="NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY" expected_token="NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY"
I0712 10:37:42.096030       1 solver.go:87] "cert-manager/acmesolver: got successful challenge request, writing key" host="example.org" path="/.well-known/acme-challenge/NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY" base_path="/.well-known/acme-challenge" token="NF6dyNYknCNUVSRzun5JuUzN8cNF86aflve-WInfvKY"
...
Error: http: Server closed

Usage:
  acmesolver [flags]

Flags:
      --domain string     the domain name to verify
  -h, --help              help for acmesolver
      --key string        the challenge key to respond with
      --listen-port int   the port number to listen on for connections (default 8089)
      --token string      the challenge token to verify against

E0712 10:37:53.692260       1 main.go:39] "cert-manager: error executing command" err="http: Server closed"

Additionally, the cert-manager-webhook logs show these errors:


W0712 11:33:35.571985       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0712 11:33:35.657545       1 webhook.go:128] "cert-manager: using dynamic certificate generating using CA stored in Secret resource" secret_namespace="cert-manager" secret_name="cert-manager-webhook-ca"
I0712 11:33:35.657833       1 server.go:133] "cert-manager/webhook: listening for insecure healthz connections" address=":6080"
I0712 11:33:35.657899       1 server.go:197] "cert-manager/webhook: listening for secure connections" address=":10250"
I0712 11:33:36.662043       1 dynamic_source.go:255] "cert-manager/webhook: Updated cert-manager webhook TLS certificate" DNSNames=["cert-manager-webhook","cert-manager-webhook.cert-manager","cert-manager-webhook.cert-manager.svc"]
I0712 11:33:54.925304       1 logs.go:59] http: TLS handshake error from 10.221.113.195:52358: remote error: tls: bad certificate
I0712 11:33:59.817534       1 logs.go:59] http: TLS handshake error from 10.216.164.3:46654: EOF
...

Our ClusterIssuer configuration is as follows:


apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-example
spec:
  acme:
    email: email@example.com
    preferredChain: ""
    privateKeySecretRef:
      name: lets-encrypt-ionos-issuer-account-key
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - selector:
        dnsZones:
          - example.org
      http01:
        ingress:
          ingressClassName: nginx
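Before debugging the challenge itself, it can help to confirm that the issuer registered its ACME account successfully; a quick check using the names above:

# The issuer should report Ready=True once the ACME account is registered
kubectl get clusterissuer letsencrypt-example
kubectl describe clusterissuer letsencrypt-example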

And the Ingress configuration:


apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-example
  name: example
  namespace: example
spec:
  ingressClassName: nginx
  rules:
    - host: example.org
      http:
        paths:
          - backend:
              service:
                name: example
                port:
                  number: 8080
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - example.org
      secretName: example-dev-tls
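Given that ingress, cert-manager's ingress-shim creates a Certificate named after the secretName, and the whole chain can be walked like this (a sketch using the names above):

# Certificate created from the ingress annotation
kubectl describe certificate example-dev-tls -n example

# The intermediate ACME objects it spawned
kubectl get certificaterequest,order,challenge -n example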

Has anyone experienced similar issues or have any insights on how to resolve these errors? Any help would be greatly appreciated.

Hello @yaniaici, welcome to the Let's Encrypt community. :slightly_smiling_face:

The Let's Encrypt documentation on challenge types states: "The HTTP-01 challenge can only be done on port 80."

Best Practice - Keep Port 80 Open

I suspect it may be a firewall issue.
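A quick way to test that suspicion (run from a machine outside your network, since HTTP-01 validation always starts on port 80):

# Check basic TCP reachability over both address families
nc -4 -vz example.org 80
nc -6 -vz example.org 80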

This portion should all be in strikeout, but not all of the elements support that, so I have tried to preserve the history without obscuring the other important information.

Both IPv4 addresses shown are filtered on ports 80 & 443.

$ nmap -4 -Pn -p80,443 10.221.113.195
Starting Nmap 7.80 ( https://nmap.org ) at 2024-07-12 22:34 UTC
Nmap scan report for 10.221.113.195
Host is up.

PORT    STATE    SERVICE
80/tcp  filtered http
443/tcp filtered https

Nmap done: 1 IP address (1 host up) scanned in 3.10 seconds
$ nmap -4 -Pn -p80,443 10.216.164.3
Starting Nmap 7.80 ( https://nmap.org ) at 2024-07-12 22:35 UTC
Nmap scan report for 10.216.164.3
Host is up.

PORT    STATE    SERVICE
80/tcp  filtered http
443/tcp filtered https

Nmap done: 1 IP address (1 host up) scanned in 3.10 seconds

But the HTTP status returned is 204, as seen in the challenge error above.

Consider using the online tool Let's Debug.

Yet on the IPv6 address, both ports 80 & 443 are open.

>nmap -6 -Pn -p80,443 2001:8d8:100f:f000::200
Starting Nmap 7.94 ( https://nmap.org ) at 2024-07-12 22:39 UTC
Nmap scan report for 2001-08d8-100f-f000-0000-0000-0000-0200.elastic-ssl.ui-r.com (2001:8d8:100f:f000::200)
Host is up (0.15s latency).

PORT    STATE SERVICE
80/tcp  open  http
443/tcp open  https

Nmap done: 1 IP address (1 host up) scanned in 0.75 seconds

Attempt IPv6 HTTP

>curl -6 -i http://2001-08d8-100f-f000-0000-0000-0000-0200.elastic-ssl.ui-r.com/.well-known/acme-challenge/sometestfile
HTTP/1.1 204
Connection: keep-alive
Keep-Alive: timeout=15
Server: nginx
Date: Fri, 12 Jul 2024 22:51:49 GMT
X-Content-Type-Options: nosniff
X-XSS-Protection: 0
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: 0
X-Frame-Options: DENY

Attempt IPv6 HTTP on port 443

>curl -6 -i http://2001-08d8-100f-f000-0000-0000-0000-0200.elastic-ssl.ui-r.com:443/.well-known/acme-challenge/sometestfile
HTTP/1.1 400 Bad Request
Server: nginx
Date: Fri, 12 Jul 2024 22:51:56 GMT
Content-Type: text/html
Content-Length: 248
Connection: close

<html>
<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<center>The plain HTTP request was sent to HTTPS port</center>
<hr><center>nginx</center>
</body>
</html>

Attempt IPv6 HTTPS on port 443

>curl -6 -i https://2001-08d8-100f-f000-0000-0000-0000-0200.elastic-ssl.ui-r.com:443/.well-known/acme-challenge/sometestfile
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 2001-08d8-100f-f000-0000-0000-0000-0200.elastic-ssl.ui-r.com:443
1 Like

The overall problem is that Let's Encrypt's ACME server is getting a 204 response when it fetches the challenge file. What causes that can be many things.

The 204 response is really odd. It has only come up a few times here before, and was due to DNS misconfigurations or global routing issues. I suggest searching for "204" here and seeing if any of those situations or fixes apply (IIRC, in one case the IPv4 and IPv6 records pointed at different servers).
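One quick way to check for that kind of split (a sketch, assuming dig is available; note that Let's Encrypt prefers IPv6 when an AAAA record exists):

# Compare what the domain resolves to over IPv4 and IPv6;
# both should point at the same server/load balancer
dig +short A example.org
dig +short AAAA example.org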

If that isn't the cause, the next likely culprits are a firewall or a misconfiguration of the proxy/routing within your system. A simple way to troubleshoot that is to host a /.well-known/acme-challenge/test.txt file in the container that is supposed to serve the challenge and ensure you can access that file. The ACME server will request the file from your server on port 80, so you need to ensure your machine routes the acme-challenge path to the right container/port.
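A minimal sketch of that test, assuming a deployment named example in namespace example and an nginx-style webroot (all names and paths here are placeholders):

# Drop a test file into the container behind the ingress
# (the webroot path depends on your image)
kubectl exec -n example deploy/example -- sh -c \
  'mkdir -p /webroot/.well-known/acme-challenge && echo ok > /webroot/.well-known/acme-challenge/test.txt'

# Fetch it from outside the cluster over plain HTTP on port 80,
# over both address families; anything other than a 200 with "ok"
# means routing is off for that path
curl -4 -i http://example.org/.well-known/acme-challenge/test.txt
curl -6 -i http://example.org/.well-known/acme-challenge/test.txt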

2 Likes

It should be noted that 10.x.x.x is class A private IPv4 address space that most organizations use for their internal network routing.

I suspect that there's an IPv6-IPv4 disconnect here in terms of routing.
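One way to check for that disconnect (a sketch; the namespace and service name depend on how ingress-nginx was installed):

# Show the external address(es) the ingress controller actually exposes;
# compare these against the domain's A and AAAA records
kubectl get svc -n ingress-nginx ingress-nginx-controller -o wide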

I have direct experience in a corporate setting of configuring cert-manager in an Azure Kubernetes Service (AKS) environment utilizing ingress-nginx with CircleCI driving the pipeline.

3 Likes

Thanks @griffin; I am still not thinking fully clearly. :slightly_smiling_face:
(mistyping too)

2 Likes

No worries. We've all done it. :slightly_smiling_face:

3 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.