I have a brand-new on-prem Kubernetes 1.30.2 cluster. It has HA enabled via kube-vip, with 3 master and 3 worker nodes.
I installed cert-manager 1.14.7 from the plain manifest file.
I have a ClusterIssuer set to letse-staging. The Deployment, Service and Ingress are created, and I can access the application via its domain over HTTP.
Once cert-manager generates the certificate request, I can see that orders and challenges are created, but they remain in the pending state forever.
Output of cmctl status certificate:
Conditions:
Approved: True, Reason: cert-manager.io, Message: Certificate request has been approved by cert-manager.io
Ready: False, Reason: Pending, Message: Waiting on certificate issuance from order default/letsencrypt-staging-1-1540553651: "pending"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitingForApproval 82s cert-manager-certificaterequests-issuer-selfsigned Not signing CertificateRequest until it is Approved
Normal cert-manager.io 82s cert-manager-certificaterequests-approver Certificate request has been approved by cert-manager.io
Normal OrderCreated 82s cert-manager-certificaterequests-issuer-acme Created Order resource default/letsencrypt-staging-1-1540553651
Normal OrderPending 82s cert-manager-certificaterequests-issuer-acme Waiting on certificate issuance from order default/letsencrypt-staging-1-1540553651: ""
Order:
Name: letsencrypt-staging-1-1540553651
State: pending, Reason:
Authorizations:
URL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/13354115553, Identifier: mydomain.com, Initial State: pending, Wildcard: false
Challenges:
- Name: frontend-1-1704429219-393895271, Type: HTTP-01, Token: sT0ajJcPwoTOEFTTJWz7DT1661EIs8xmXBnJ3GaXaZ4, Key: sT0ajJcPwoTOEFTTJWz7DT1661EIs8xmXBnJ3GaXaZ4.YYKJ8tF-wcsUB4dD6zbSv6uD-42JdJyJTlSLSHTQkEs, State: pending, Reason: Waiting for HTTP-01 challenge propagation: wrong status code '400', expected '200', Processing: true, Presented: true
I have tried uninstalling and reinstalling cert-manager (earlier I was on cert-manager 1.15.1, then reinstalled 1.14.7).
I can reach the domain from the internet, and domain.com/.well-known/acme-challenge/ responds fine, with the token and expected key displayed properly.
Do you have outbound access configured properly for your k8s cluster? It appears that it is cert-manager, not Let's Encrypt, that can't reach the token during its precheck.
If cert-manager can't reach the challenge file it is serving, it won't bother telling Let's Encrypt to try to do so, even if Let's Encrypt would succeed. This is one of those frustrating things about working with k8s clusters (and corporate networking in general).
Essentially the trouble you're seeing here is that cert-manager can't find an outbound route back to its own challenge file. It doesn't need to leave the local network. It just needs to connect.
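To see what the precheck sees, you can reproduce it by hand from a pod inside the cluster. Below is a minimal Python sketch of that idea, not cert-manager's actual implementation: it fetches the HTTP-01 challenge URL and compares the body with the expected key. The function names (challenge_url, http_fetch, self_check) are made up for illustration, and the domain, token and key must be taken from your own Challenge resource.

```python
import urllib.error
import urllib.request


def challenge_url(domain, token):
    # Standard HTTP-01 challenge path served under the domain being validated.
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def http_fetch(url, timeout=5):
    # Returns (status, body). HTTP error responses such as 400 are returned
    # as a status code instead of being raised.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode(errors="replace").strip()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode(errors="replace").strip()
    except urllib.error.URLError as err:
        # Connection-level failure reported by urllib (DNS, refused, ...).
        return None, str(err.reason)


def self_check(domain, token, expected_key, fetch=http_fetch):
    # Hypothetical helper: approximates the precheck by requiring a 200
    # response whose body equals the expected key.
    status, body = fetch(challenge_url(domain, token))
    if status != 200:
        return False, f"wrong status code '{status}', expected '200'"
    if body != expected_key:
        return False, "status 200 but body does not match the expected key"
    return True, "ok"
```

Calling self_check("mydomain.com", token, key) from inside the cluster and again from a machine on the internet shows whether the two vantage points get different answers. If only the in-cluster call fails, the problem is on the local network or DNS side, not with Let's Encrypt.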
That is expected since your ingress controller is being configured to serve HTTPS and doesn't have a valid certificate to serve yet. Using a default/"snakeoil" certificate is better than crashing.
So it was DNS. LOL. The client was using a separate DNS for LAN machines from the DNS set for the domain.
On the internet, website.gov pointed to the public IP x.x.x.70. On the internal LAN DNS, the same domain resolved to pvt.ip.84.96. Even after removing the domain from the internal DNS, things were still not working.
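For anyone hitting the same thing, here is a rough Python sketch of how one might spot this kind of split DNS from a machine or pod on the LAN. It resolves the name with the local resolver and compares the result with the addresses seen from outside, which you have to supply yourself (for example from an external lookup). The function names and the sample addresses used in the note below are hypothetical, not the real ones from this setup.

```python
import ipaddress
import socket


def resolve(hostname):
    # Addresses returned by the resolver this machine or pod is configured with.
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def describe(addresses):
    # Label each address as private or public using the stdlib ipaddress module.
    return {
        addr: "private" if ipaddress.ip_address(addr).is_private else "public"
        for addr in addresses
    }


def looks_split_horizon(local_addresses, public_addresses):
    # True when the LAN view and the internet view of the name disagree.
    return set(local_addresses) != set(public_addresses)
```

For example, looks_split_horizon(resolve("website.gov"), ["198.18.0.70"]) returning True, together with describe() labelling the local answer as private, would point to an internal DNS override like the one described above. If a stale answer keeps showing up after the internal record is removed, caching along the way is a likely suspect.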
Eventually I ended up restarting the pods for CoreDNS, cert-manager, MetalLB and nginx-ingress, and then it worked.
I really appreciate your valuable inputs. Thanks a ton.