Hi, we manage thousands of LE certificates, and since the 24th we have been getting a lot of "time limit exceeded" errors related to the DNS-01 challenge. HTTP challenges seem to work fine.
I have verified with online propagation checkers that the _acme-challenge token is propagating correctly. However, even after a few minutes, the LE validation still gives me that error. I suspect that we are blocking some of your IPs, but it is very hard for me to tell whether a given IP belongs to Let's Encrypt or not. How can I check that I don't have any Let's Encrypt IPs blocked?
Also, are the servers you use for the DNS-01 challenge the same as those used for the HTTP challenge?
Error log sample:
last error: NS ns4.cdmondns-01.org. did not return the expected TXT record [fqdn: domain.tld., value: XXXXX]: v=spf1 include:_spf.google.com ~all\n[domain.tld] time limit exceeded: last error: NS ns4.cdmondns-01.org. did not return the expected TXT record [fqdn: domain.tld., value: XXXXX]: v=spf1 include:_spf.google.com ~all\n\
Hello, sorry, we found the problem: it was related to a cache in the DNS balancers, and disabling this cache has stopped the errors. The curious thing is that no matter how many times we checked the new DNS records from locations worldwide, the results were correct. It's as if this cache in the balancers acts per source IP, affecting Let's Encrypt's servers but not everyone else, which brings me back to a point similar to my earlier one: there is no way to define specific rules for Let's Encrypt, right? Thank you anyway, and sorry for the inconvenience.
Let's Encrypt is not involved in this error message at all. It's lego doing a preflight DNS query check before it triggers the challenge at Let's Encrypt.
You can disable (or otherwise configure) the preflight check using lego's CLI options; see the lego documentation.
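To reproduce this kind of check by hand, you can ask each authoritative nameserver directly for the TXT record instead of relying on public propagation checkers. Below is a minimal Python sketch of that idea, using only the standard library. It is not lego's actual implementation: the function names are made up for illustration, it speaks plain UDP over IPv4 without EDNS or TCP fallback, and it only looks at TXT answers.

```python
import random
import socket
import struct

TXT = 16  # DNS record type code for TXT
IN = 1    # DNS class code for Internet

def encode_name(fqdn: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels."""
    out = b""
    for label in fqdn.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_txt_query(fqdn: str, query_id: int) -> bytes:
    """Build a non-recursive TXT query, as sent to an authoritative server."""
    # Header: ID, flags (0 = standard query, no recursion), 1 question.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    return header + encode_name(fqdn) + struct.pack("!HH", TXT, IN)

def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if length & 0xC0 == 0xC0:  # compression pointer, always 2 bytes
            return pos + 2
        pos += 1 + length

def parse_txt_answers(msg: bytes) -> list[str]:
    """Extract the TXT values from the answer section of a DNS response."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    pos = 12
    for _ in range(qdcount):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    values = []
    for _ in range(ancount):
        pos = skip_name(msg, pos)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10
        rdata = msg[pos:pos + rdlength]
        pos += rdlength
        if rtype == TXT:
            # RDATA holds one or more length-prefixed strings; join them.
            chunks, i = [], 0
            while i < len(rdata):
                n = rdata[i]
                chunks.append(rdata[i + 1:i + 1 + n])
                i += 1 + n
            values.append(b"".join(chunks).decode("ascii", "replace"))
    return values

def query_txt(nameserver_ip: str, fqdn: str, timeout: float = 3.0) -> list[str]:
    """Send one TXT query over UDP to a single nameserver IP."""
    query_id = random.randrange(65536)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_txt_query(fqdn, query_id), (nameserver_ip, 53))
        response, _ = sock.recvfrom(4096)
    if struct.unpack("!H", response[:2])[0] != query_id:
        raise ValueError("response ID does not match query ID")
    return parse_txt_answers(response)

def check_nameservers(nameserver_ips, domain, expected):
    """Map each nameserver IP to True if it serves the expected TXT value."""
    fqdn = f"_acme-challenge.{domain.rstrip('.')}."
    results = {}
    for ip in nameserver_ips:
        try:
            results[ip] = expected in query_txt(ip, fqdn)
        except (OSError, ValueError):
            results[ip] = False  # timeout, network error, or mismatched reply
    return results
```

A call such as `check_nameservers(["192.0.2.53"], "domain.tld", "XXXXX")` (placeholder IP and token) returns a dict keyed by nameserver IP. Keep in mind that it only tells you what the servers answer to the machine running the script, so a cache that behaves differently per source IP can still give another client a different answer.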
OK, but I have limited the preflight to 2 NS servers, and errors are reported for queries to all of our NS servers (we have 6 different NS entry points). So the "NS ns4.cdmondns-01.org. did not return the expected TXT record" message does not come directly from Let's Encrypt?
I will keep investigating to determine accurately whether the wait occurs within the preflight or during the Let's Encrypt waiting period. At the moment, I cannot say anything definitive either way. Thank you very much for the help regardless.
Nope. That's the lego preflight. Let's Encrypt's error messages will never refer to a specific resolver. They run their own resolvers with a very short cache TTL (1-60 seconds).
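With thousands of certificates, it can help to count which nameservers show up in these preflight messages, for example to see whether one balancer is stale more often than the others. The following is a small Python sketch; the regular expression is derived only from the error sample posted in this thread, and the function name is made up for illustration.

```python
import re

# Wording taken from the lego error sample posted earlier in this thread.
PREFLIGHT_RE = re.compile(r"NS (\S+) did not return the expected TXT record")

def failing_nameservers(log_text: str) -> dict[str, int]:
    """Count how often each nameserver appears in preflight failure messages."""
    counts: dict[str, int] = {}
    for nameserver in PREFLIGHT_RE.findall(log_text):
        counts[nameserver] = counts.get(nameserver, 0) + 1
    return counts
```

Feeding it the log text of a failed run returns a dict keyed by nameserver name. Lines that do not name a nameserver in this way are simply ignored, so errors coming from Let's Encrypt itself are not counted.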