Multi-level subdomain support on AWS EKS

We have been using Let's Encrypt certificates for a long time, and they worked fine until recently.
We have created roughly 100+ URLs (single-level and multi-level subdomains).
In the last few days, issuance suddenly stopped working when we added new URLs.

My domain is: Oneenterprise.com

I ran this command:

kubectl get ingress -A

The output lists the ingress entries.
When we browse the URLs, the application works, but no SSL certificate has been issued or assigned.

It produced this output:

root@CSX-Irfan:/home/irfan# kubectl get ingress -A
NAMESPACE         NAME                     CLASS         HOSTS                                      ADDRESS                                                                  PORTS     AGE
cloudcontroller   oe-cloud-controller      nginx-lc000   cloudcontroller.am.dev.oneenterprise.com   ae3d4e78f646f4bf78aaaf198fefa38e-899486150.us-east-1.elb.amazonaws.com   80, 443   6d21h
cloudcontroller   oe-cloud-controller-qa   nginx-lc000   cloudcontroller.am.qa.oneenterprise.com    ae3d4e78f646f4bf78aaaf198fefa38e-899486150.us-east-1.elb.amazonaws.com   80, 443   6d21h

My web server is (include version):
The environment is an AWS EKS cluster, version 1.28.
We use the NGINX ingress controller, and all entries are added using YAML files.

The operating system my web server runs on is (include version):
We are using Ubuntu.

My hosting provider, if applicable, is:
Let's Encrypt certificates, issued using cert-manager installed from the Bitnami Helm package.

I can login to a root shell on my machine (yes or no, or I don't know): Yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
No, we don't use a control panel. Instead, we use the kubectl command-line tool to manage the certificates and secrets.

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):

certbot 1.21.0

In the public cert logs I see many certs of subdomains of oneenterprise.com issued every day including today. Do you think all of them are failing or just some?

Can you give more details about the problem? Do you have a log or a more detailed error message?

Those are two different ACME Clients. Can you explain why you would be using both? If you are using cert-manager you normally would not be using Certbot also.

Thanks @MikeMcQ

Here are more logs for this particular URL: https://magboeh10.am.dev.oneenterprise.com/

failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
W0307 14:49:45.383350 1 reflector.go:539] k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
E0307 14:49:45.383390 1 reflector.go:147] k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: Failed to watch *v1.Challenge: failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
W0307 14:50:32.643856 1 reflector.go:539] k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
E0307 14:50:32.643909 1 reflector.go:147] k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: Failed to watch *v1.Challenge: failed to list *v1.Challenge: the server could not find the requested resource (get challenges.acme.cert-manager.io)
E0307 14:50:45.720043 1 controller.go:167] "re-queuing item due to error processing" err="the server could not find the requested resource (post challenges.acme.cert-manager.io)" logger="cert-manager.orders" key="lc001/zim.oneenterprise.com-tls-1-977685139"
E0307 14:51:06.607069 1 controller.go:167] "re-queuing item due to error processing" err="the server could not find the requested resource (post challenges.acme.cert-manager.io)" logger="cert-manager.orders" key="lc001/magboeh10.am.dev.oneenterprise.com-tls-1-2010739064"
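
Every line in this excerpt names the same API resource. As a rough sketch only (the function name `missing_resources` is made up for illustration, and it assumes the log format shown above), the distinct resource names could be pulled out of a cert-manager log like this:

```shell
#!/bin/sh
# Hypothetical helper: list the API resources that the log reports as
# "could not find the requested resource". Reads log lines on stdin.
missing_resources() {
  # Keep only the "... requested resource (<verb> <resource>)" fragment,
  # then strip everything except the resource name and de-duplicate.
  grep -o 'could not find the requested resource ([a-z]* [a-z.-]*)' \
    | sed 's/.*([a-z]* \([a-z.-]*\))/\1/' \
    | sort -u
}

# Example using one line copied from the excerpt above.
printf '%s\n' 'E0307 14:50:45.720043 1 controller.go:167] "re-queuing item due to error processing" err="the server could not find the requested resource (post challenges.acme.cert-manager.io)" logger="cert-manager.orders" key="lc001/zim.oneenterprise.com-tls-1-977685139"' \
  | missing_resources
```

If the name printed is a cert-manager CRD, as it is here, one plausible next check on a live cluster is `kubectl get crd challenges.acme.cert-manager.io`. A NotFound result would suggest the CRD is not installed, though that is an assumption about this cluster and not something the thread confirms.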

I'm able to browse the application, but without HTTPS.
Also, we are using cert-manager, not Certbot. Sorry for the confusion.

I also followed the links below and tried their suggestions, but had no luck.

Please suggest.

Those errors look like cert-manager trying to validate the ACME challenge before it makes the cert request to the Let's Encrypt server. If the error came from the LE server connecting to your domain, the error messages would be very different. This points to a configuration problem in your setup. I am not a K8s expert, so I will not be able to help you debug that. Maybe someone else here will, or you could try a different forum.

I don't see that you ever got a cert for magboeh10.am.dev.oneenterprise.com

I see you got a cert for your papad subdomain today. That looks to be using AWS ELB too. Are you sure magboeh10.am.dev is configured the same way as your other ones that are working?

Here is a debug resource for cert-manager
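
cert-manager issues a certificate through a chain of resources (Certificate, CertificateRequest, Order, Challenge), so a common way to debug is to inspect each one in turn. The sketch below only prints the kubectl commands and does not run them. The helper name `cert_debug_cmds` is hypothetical, and the namespace `lc001` is taken from the log keys earlier in the thread.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the kubectl commands for walking
# cert-manager's resource chain in one namespace. Nothing is executed
# against a cluster; copy the lines you want to run.
cert_debug_cmds() {
  ns="$1"
  for kind in certificate certificaterequest order challenge; do
    printf 'kubectl get %s -n %s\n' "$kind" "$ns"
  done
  # The logs complain about this resource type, so also check its CRD.
  printf 'kubectl get crd challenges.acme.cert-manager.io\n'
}

cert_debug_cmds lc001
```

Running `kubectl describe` on whichever resource is stuck usually shows the most recent events for it.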
