We provide a service that orders certificates from LE.
We use the dns-01 challenge.
We use the Lego library.
A customer creates a config with their DNS provider credentials, and we handle everything else for them.
We add the TXT record through the DNS provider's API, poll at intervals until the record is visible, and then ask LE to validate the challenge.
This normally works; we have many customers.
Recently, orders started failing for one specific customer.
In our log, I can see that the challenge was set and found by our service, but when we call LE it returns the error "acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: NXDOMAIN looking up TXT for _acme-challenge.logs-epfp01-00465.qradar.ibmcloud.com - check that a DNS record exists for this domain".
We increased the wait between setting the challenge and requesting validation to 10 minutes, in case of slow propagation. The customer reports that 7 of 15 orders still fail.
Please help us understand what the issue is.
The customer's domain is: qradar.ibmcloud.com
The hosting provider is: Softlayer
We always clean up the TXT record after an order (whether it succeeded or failed).
Since we don't have any order running right now, you won't find any TXT record.
During an order, our code (Lego) checks that the TXT record exists and only then calls LE.
I think that is the opposite problem: Lego not finding a TXT record that does exist.
In our case, Lego finds the record immediately after our 10-minute delay, and LE doesn't find it after that.
I'm not sure.
As I said, we use Lego, and it performs the DNS lookup for the added TXT records.
I see this code:
const defaultResolvConf = "/etc/resolv.conf"

var defaultNameservers = []string{
	"google-public-dns-a.google.com:53",
	"google-public-dns-b.google.com:53",
}

// recursiveNameservers are used to pre-check DNS propagation.
var recursiveNameservers = getNameservers(defaultResolvConf, defaultNameservers)
"/etc/resolv.conf"
Our server runs in a Docker container whose base image is registry.access.redhat.com/ubi8/go-toolset:1.17.12-11 (FROM line in our Dockerfile).
I'm not sure what this file contains in this image.
The code that computes recursiveNameservers belongs to Lego; unfortunately, I can't add logging there.
If we check the TXT records manually during an order, we can see them (and the customer checked and saw them too). What do you mean by the "wrong" zone? How can we check it?