DNS secondary validation fails

My domain is: asp-test.siatel.com

I ran this command: cert-manager deployed in the cluster with ingress configured for http01

It produced this output:
"During secondary validation: DNS problem: server failure at resolver looking up A for asp-test.siatel.com; DNS problem: server failure at resolver looking up AAAA for asp-test.siatel.com"

My web server is (include version):
rke2-ingress-nginx:4.10.502

The operating system my web server runs on is (include version):
Debian 12 / RKE2 Kubernetes v1.31.5+rke2r1, running on all 3 nodes of a self-hosted cluster.

My hosting provider, if applicable, is:
Presumably SFR (I can't be sure; that is what online lookup tools indicate)

I can login to a root shell on my machine (yes or no, or I don't know):
Yes. I also have SSH access to the DNS server that resolves all my cluster-hosted domains.

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):
cert-manager v1.17.2

Let'sDebug.net result:
Let's Debug

I have several domains of this type (e.g. grh-dev.siatel.com, cy-dev.siatel.com, etc.) in the same cluster that have successfully passed secondary validation and obtained their certificates. (I do have a separate issue with cert-manager: more often than not it kills the resolver pod/ingress too soon for the Let's Encrypt flow to run to completion. The permissions hack lets me hold it off just long enough for Let's Encrypt to complete its validation.)

Any help or suggestions on how to solve this are welcome.

Secondary validation (checking your challenge response from more than one vantage point) does rely on all of your nameservers presenting the same response. However, there has been a recent uptick in reports of failed secondary validation on this forum, and Let's Encrypt is looking into it. In the meantime, I would suggest retrying periodically.
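If you want to rule out your own nameservers while waiting, here is a rough sketch of a consistency check. It asks each authoritative nameserver directly and flags any whose answer differs from the majority. It assumes `dig` is installed and that `siatel.com` is the zone apex; the function names are made up for illustration, and this is not how Let's Encrypt itself performs validation.

```python
import subprocess
from collections import Counter


def dig_short(args: list[str]) -> frozenset[str]:
    """Run `dig +short` with the given arguments and return the answer lines."""
    result = subprocess.run(
        ["dig", "+short", "+time=5", "+tries=1", *args],
        capture_output=True, text=True, check=False,
    )
    lines = (line.strip() for line in result.stdout.splitlines())
    # Lines starting with ';' are dig diagnostics (e.g. timeouts), not records.
    return frozenset(line for line in lines if line and not line.startswith(";"))


def find_disagreements(answers: dict[str, frozenset[str]]) -> dict[str, frozenset[str]]:
    """Return the nameservers whose answer differs from the most common answer."""
    if not answers:
        return {}
    majority, _count = Counter(answers.values()).most_common(1)[0]
    return {ns: answer for ns, answer in answers.items() if answer != majority}


def check_host(zone: str, hostname: str, rtype: str = "A") -> dict[str, frozenset[str]]:
    """Query every NS of `zone` for `hostname` and report the odd ones out.

    Note: with +short an empty answer can mean "no such record" or a failed
    lookup (SERVFAIL, timeout); this sketch does not tell them apart.
    """
    nameservers = sorted(ns.rstrip(".") for ns in dig_short(["NS", zone]))
    answers = {ns: dig_short([f"@{ns}", hostname, rtype]) for ns in nameservers}
    return find_disagreements(answers)
```

For example, `check_host("siatel.com", "asp-test.siatel.com")` returns an empty dict when every nameserver gives the same A answer; run it again with `rtype="AAAA"` to cover the second lookup from the error message. A nameserver that shows up in the result is worth a closer look with a full `dig` (without `+short`) to see the actual response code.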


@vcitiriga The problem has been fully repaired now.

There was an unintended component upgrade during a recent migration that caused the problem. Let's Encrypt has automated monitoring of its validation servers, but the failure rate from this specific issue wasn't high enough to be detected. They have increased the sensitivity of those monitors so that such failures are noticed sooner, and they have improved the upgrade mechanism to avoid a repeat.

Sorry for the disruption. Please let us know if you see any further trouble.


Many thanks, secondary validation no longer fails on account of DNS issues. It still fails, though, when cert-manager pulls the pods/ingress/challenge out from under its feet before the secondary validation has run to completion, but that's a different story, which I have taken up with cert-manager (Challenge and resolver pod/ingress killed too soon · Issue #7659 · cert-manager/cert-manager · GitHub). Still no word on that as of this post; any suggestions are welcome.
