K8s Waiting for HTTP-01 challenge propagation: wrong status code '400', expected '200'

I have a brand new on-prem k8s 1.30.2 cluster with HA enabled via kube-vip: 3 master and 3 worker nodes.

I installed cert-manager 1.14.7 via the plain manifest file.

I have a ClusterIssuer pointing at the Let's Encrypt staging environment. The deployment, service, and ingress are created, and I can access the application via the domain over HTTP.

Once the certificate request is generated by cert-manager, I can see that orders and challenges are created, but they remain in a pending state forever.
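For reference, I have been inspecting them with commands roughly like these (the challenge name is just a placeholder):

# list the ACME orders and challenges cert-manager has created
kubectl get orders,challenges -A
# show why a particular challenge is stuck
kubectl describe challenge <challenge-name> -n default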

When I run cmctl status certificate, I see:

Conditions:
    Approved: True, Reason: cert-manager.io, Message: Certificate request has been approved by cert-manager.io
  Ready: False, Reason: Pending, Message: Waiting on certificate issuance from order default/letsencrypt-staging-1-1540553651: "pending"
  Events:
    Type    Reason              Age   From                                                Message
    ----    ------              ----  ----                                                -------
    Normal  WaitingForApproval  82s   cert-manager-certificaterequests-issuer-selfsigned  Not signing CertificateRequest until it is Approved
    Normal  cert-manager.io     82s   cert-manager-certificaterequests-approver           Certificate request has been approved by cert-manager.io
    Normal  OrderCreated        82s   cert-manager-certificaterequests-issuer-acme        Created Order resource default/letsencrypt-staging-1-1540553651
    Normal  OrderPending        82s   cert-manager-certificaterequests-issuer-acme        Waiting on certificate issuance from order default/letsencrypt-staging-1-1540553651: ""
Order:
  Name: letsencrypt-staging-1-1540553651
  State: pending, Reason: 
  Authorizations:
    URL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/13354115553, Identifier: mydomain.com, Initial State: pending, Wildcard: false
Challenges:
- Name: frontend-1-1704429219-393895271, Type: HTTP-01, Token: sT0ajJcPwoTOEFTTJWz7DT1661EIs8xmXBnJ3GaXaZ4, Key: sT0ajJcPwoTOEFTTJWz7DT1661EIs8xmXBnJ3GaXaZ4.YYKJ8tF-wcsUB4dD6zbSv6uD-42JdJyJTlSLSHTQkEs, State: pending, Reason: Waiting for HTTP-01 challenge propagation: wrong status code '400', expected '200', Processing: true, Presented: true

I have tried uninstalling cert-manager and reinstalling it (earlier I was on cert-manager 1.15.1 and then reinstalled with 1.14.7).

I can reach the domain from the internet, and domain.com/.well-known/acme-challenge/ responds just fine, with the token and expected_key displayed properly.
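For example, a check from a machine outside the network, roughly like this (domain and token are placeholders), returns the expected key:

# external check of the challenge URL
curl -i http://mydomain.com/.well-known/acme-challenge/<token>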

I am completely clueless here. What can be done?


I'd check the logs for those 400 codes.


Where exactly should I look for it? I'm kind of new to this Let's Encrypt thing.


Those logs would be from the web server.


Welcome to the Let's Encrypt Community! :slightly_smiling_face:

Do you have outbound access configured properly for your k8s cluster? It appears that it's cert-manager that can't reach the token for its precheck, not Let's Encrypt.
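One way to reproduce what cert-manager's self check is doing is to run the same request from inside the cluster, roughly like this (image, domain, and token here are only examples):

# throwaway pod inside the cluster fetching the challenge URL,
# similar to cert-manager's own self check
kubectl run self-check --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -i http://mydomain.com/.well-known/acme-challenge/<token>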


Thanks @griffin.

I don't think there is anything that needs to be configured for outbound access. It just works by default.

Looks like cert-manager creates 1 pod, 1 service, and 1 ingress for getting the SSL cert. These are all removed once the cert is issued by Let's Encrypt.
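If it helps, they can be listed by label; this is what works with my cert-manager version (adjust the namespace and label if yours differs):

# list the temporary HTTP-01 solver pod, service and ingress
kubectl get pods,svc,ingress -n default -l acme.cert-manager.io/http01-solver=true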

I can see the below in the logs for the cert-manager solver pod:

I0729 11:28:27.038034       1 solver.go:51] "starting listener" logger="cert-manager.acmesolver" expected_domain="domain.gov" expected_token="qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ" expected_key="qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ.s6oXfLamoYacTpjjrD9iaZQPOlP9FeuwKmJuht-B99I" listen_port=8089
I0729 11:31:09.937740       1 solver.go:76] "validating request" logger="cert-manager.acmesolver" host="domain.gov" path="/.well-known/acme-challenge/qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ" base_path="/.well-known/acme-challenge" token="qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ"
I0729 11:31:09.937784       1 solver.go:84] "comparing host" logger="cert-manager.acmesolver" host="domain.gov" path="/.well-known/acme-challenge/qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ" base_path="/.well-known/acme-challenge" token="qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ" expected_host="domain.gov"
I0729 11:31:09.937805       1 solver.go:91] "comparing token" logger="cert-manager.acmesolver" host="domain.gov" path="/.well-known/acme-challenge/qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ" base_path="/.well-known/acme-challenge" token="qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ" expected_token="qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ"
I0729 11:31:09.937819       1 solver.go:99] "got successful challenge request, writing key" logger="cert-manager.acmesolver" host="domain.gov" path="/.well-known/acme-challenge/qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ" base_path="/.well-known/acme-challenge" token="qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ"

Also, I have checked the nginx controller pod logs (using ingress-nginx here), and they don't show much other than:

W0729 15:17:50.340741       7 controller.go:1435] Error getting SSL certificate "default/frontend": local SSL certificate default/frontend was not found. Using default certificate
I0729 15:17:50.359490       7 main.go:107] "successfully validated configuration, accepting" ingress="default/cm-acme-http-solver-c4pxw"
I0729 15:17:50.387569       7 store.go:440] "Found valid IngressClass" ingress="default/cm-acme-http-solver-c4pxw" ingressclass="nginx"
W0729 15:17:50.387634       7 controller.go:1435] Error getting SSL certificate "default/frontend": local SSL certificate default/frontend was not found. Using default certificate
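For the record, this is roughly how I pulled those lines (deployment name and namespace are the ingress-nginx defaults; adjust for your install):

# filter the controller logs for challenge traffic and 400s
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=500 | grep acme-challenge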

So after waiting for about 15 minutes... I could see "Timeout exceeded while awaiting headers" on the challenge when I describe it using kubectl:

Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://mydomain.gov.bt/.well-known/acme-challenge/qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ': Get "http://mydomain.gov.bt/.well-known/acme-challenge/qYXoPFzeHDWlLMlDKUhM40vEL04H9eAO8uf3mdbOHuQ": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

But it was back to wrong status code '400', expected '200' again after a few minutes.


I wouldn't be so sure. :slightly_smiling_face:

Here's the evidence (emphasis mine):

If cert-manager can't reach the challenge file it is serving, it won't bother telling Let's Encrypt to try to do so, even if Let's Encrypt would succeed. This is one of those frustrating things about working with k8s clusters (and corporate networking in general).

Essentially the trouble you're seeing here is that cert-manager can't find an outbound route back to its own challenge file. It doesn't need to leave the local network. It just needs to connect.
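A quick way to compare what name resolution looks like from inside the cluster versus the outside, roughly (the domain is a placeholder and busybox is just an example image):

# what the outside world resolves
dig +short mydomain.com
# what pods inside the cluster resolve (via CoreDNS and its upstreams)
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- nslookup mydomain.com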


That is expected since your ingress controller is being configured to serve HTTPS and doesn't have a valid certificate to serve yet. Using a default/"snakeoil" certificate is better than crashing. :wink:


@griffin

So it was DNS. LOL. The client was using a separate DNS for LAN machines from the one set for the domain.

From the internet, website.gov was pointing to the public IP x.x.x.70. The same domain on the internal LAN DNS was resolving to pvt.ip.84.96. Even after removing the domain from the internal DNS, things were not working.

Eventually I ended up restarting the pods for CoreDNS, cert-manager, MetalLB, and nginx-ingress, and then it was working.
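For anyone hitting the same thing, the restarts were basically rollout restarts along these lines (namespaces and names are the defaults from my installs; yours may differ):

kubectl -n kube-system rollout restart deployment coredns
kubectl -n cert-manager rollout restart deployment cert-manager cert-manager-webhook cert-manager-cainjector
kubectl -n metallb-system rollout restart deployment controller
kubectl -n metallb-system rollout restart daemonset speaker
kubectl -n ingress-nginx rollout restart deployment ingress-nginx-controller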

I really appreciate your valuable inputs. Thanks a ton!


Happy to help :slightly_smiling_face: and glad it's working. :partying_face:

