TXT record is not found

We provide a service that orders certificates from Let's Encrypt (LE).
We use the dns-01 challenge.
We use the Lego library.
A customer creates a config with their DNS provider credentials, and we handle everything for them.
We add the TXT record through the DNS provider API, check that the record is there (polling at intervals), and then ask LE to validate the challenge.
Usually everything works, and we have a lot of customers.
Recently, orders started failing for one specific customer.
In our log, I can see that the challenge record was set and found by our service, but when we call LE it returns the error "acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: NXDOMAIN looking up TXT for _acme-challenge.logs-epfp01-00465.qradar.ibmcloud.com - check that a DNS record exists for this domain".
We increased the wait between setting the challenge and validation to 10 minutes in case of slow propagation. The customer complains that 7 of 15 orders still fail.
Please help us understand what the issue is.

The customer's domain is: qradar.ibmcloud.com
The hosting provider is: Softlayer
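
For reference, the record that this flow has to publish can be sketched as follows. This is a minimal illustration of the dns-01 rules in RFC 8555, not Lego's actual code; the function name challengeRecord and the key authorization string used in main are made up for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// challengeRecord returns the TXT record name and value for a dns-01 challenge.
// keyAuth is the key authorization ("<token>.<account key thumbprint>").
// Hypothetical helper for illustration only.
func challengeRecord(domain, keyAuth string) (fqdn, value string) {
	// A wildcard name is validated at its base name.
	domain = strings.TrimPrefix(domain, "*.")
	fqdn = "_acme-challenge." + strings.TrimSuffix(domain, ".") + "."
	// The value is the unpadded base64url encoding of SHA-256(keyAuth).
	sum := sha256.Sum256([]byte(keyAuth))
	value = base64.RawURLEncoding.EncodeToString(sum[:])
	return fqdn, value
}

func main() {
	// The key authorization below is a placeholder, not a real one.
	fqdn, value := challengeRecord("logs-epfp01-00465.qradar.ibmcloud.com", "example-token.example-thumbprint")
	fmt.Printf("%s IN TXT %q\n", fqdn, value)
}
```

The name printed here is the one that appears in the LE error message, so it is the exact name to look up while an order is in progress.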

Hello @TatyanaBol, welcome to the Let's Encrypt community. :slightly_smiling_face:

The online tool https://dnsspy.io/ does not show any DNS records for _acme-challenge.qradar.ibmcloud.com given qradar.ibmcloud.com as the input; see the results (toward the bottom) in the DNS Spy report for qradar.ibmcloud.com.

Also, using nslookup, I do not find any DNS TXT record for _acme-challenge.qradar.ibmcloud.com:

$ nslookup
> qradar.ibmcloud.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
*** Can't find qradar.ibmcloud.com: No answer
> set q=soa
> qradar.ibmcloud.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
qradar.ibmcloud.com
        origin = ns1.softlayer.com
        mail addr = root.qradar.ibmcloud.com
        serial = 2023010309
        refresh = 7200
        retry = 600
        expire = 1728000
        minimum = 900

Authoritative answers can be found from:
> server ns1.softlayer.com.
Default server: ns1.softlayer.com.
Address: 67.228.254.4#53
> qradar.ibmcloud.com
Server:         ns1.softlayer.com.
Address:        67.228.254.4#53

qradar.ibmcloud.com
        origin = ns1.softlayer.com
        mail addr = root.qradar.ibmcloud.com
        serial = 2023010309
        refresh = 7200
        retry = 600
        expire = 1728000
        minimum = 900
> set q=a
> qradar.ibmcloud.com
Server:         ns1.softlayer.com.
Address:        67.228.254.4#53

Name:   qradar.ibmcloud.com
Address: 172.16.0.1
> set q=aaaa
> qradar.ibmcloud.com
Server:         ns1.softlayer.com.
Address:        67.228.254.4#53

*** Can't find qradar.ibmcloud.com: No answer
> set q=cname
> qradar.ibmcloud.com
Server:         ns1.softlayer.com.
Address:        67.228.254.4#53

*** Can't find qradar.ibmcloud.com: No answer
> set q=txt
> qradar.ibmcloud.com
Server:         ns1.softlayer.com.
Address:        67.228.254.4#53

*** Can't find qradar.ibmcloud.com: No answer
> _acme-challenge.qradar.ibmcloud.com
Server:         ns1.softlayer.com.
Address:        67.228.254.4#53

** server can't find _acme-challenge.qradar.ibmcloud.com: NXDOMAIN
> exit

We always clean up the TXT record after an order (whether it succeeded or failed).
Since we don't have any order running right now, you can't find any TXT record.
During an order, our code (Lego) checks that the TXT record exists and only then calls LE.

However, the minimum TTL in the DNS SOA record is 900 s, which is 15 minutes.

What TTL do you recommend?
We usually use a TTL of 2 minutes.
We can try setting it to 15 minutes; would that help?

@TatyanaBol Kindly wait for more knowledgeable Let's Encrypt community volunteers to assist.

I doubt the TTL time is of much concern.
This step might be of more interest:

Do you poll both authoritative DNS servers?
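
One way to answer this independently of Lego is to ask each authoritative nameserver directly and compare the answers. Below is a rough sketch using only the Go standard library. The helper names (resolverFor, missingOn, queryAll) and the expected value in main are hypothetical, and this is not how Lego implements its propagation check.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"strings"
	"time"
)

// resolverFor returns a resolver that sends every query to one
// nameserver (host:port) instead of the system resolver.
func resolverFor(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

// missingOn lists, in sorted order, the nameservers whose answer
// does not contain the expected TXT value.
func missingOn(answers map[string][]string, want string) []string {
	var missing []string
	for ns, values := range answers {
		found := false
		for _, v := range values {
			if v == want {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, ns)
		}
	}
	sort.Strings(missing)
	return missing
}

// queryAll asks every nameserver listed for zone for the TXT records at fqdn.
func queryAll(ctx context.Context, zone, fqdn string) (map[string][]string, error) {
	nss, err := net.DefaultResolver.LookupNS(ctx, zone)
	if err != nil {
		return nil, err
	}
	answers := make(map[string][]string)
	for _, ns := range nss {
		host := strings.TrimSuffix(ns.Host, ".")
		// A lookup error (including NXDOMAIN) is recorded as "no values".
		values, _ := resolverFor(net.JoinHostPort(host, "53")).LookupTXT(ctx, fqdn)
		answers[host] = values
	}
	return answers, nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	answers, err := queryAll(ctx, "qradar.ibmcloud.com",
		"_acme-challenge.logs-epfp01-00465.qradar.ibmcloud.com.")
	if err != nil {
		fmt.Println("NS lookup failed:", err)
		return
	}
	// "expected-txt-value" is a placeholder for the value of the running order.
	fmt.Println("missing on:", missingOn(answers, "expected-txt-value"))
}
```

If the list is non-empty for only some of the nameservers during an order, that would be consistent with a part of the orders failing.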

Supplemental information from nslookup: the authoritative DNS name servers are:

> set q=ns
> qradar.ibmcloud.com
Server:         ns1.softlayer.com.
Address:        67.228.254.4#53

qradar.ibmcloud.com     nameserver = ns2.softlayer.com.
qradar.ibmcloud.com     nameserver = ns1.softlayer.com.

It's not our code; it's Lego's.
I think this is the relevant code: lego/nameserver.go at master · go-acme/lego · GitHub

I do see several GitHub issues: Issues · go-acme/lego · GitHub
This one might be related: Dynu dns-challenge fails creating a certificates for subdomains · Issue #1672 · go-acme/lego · GitHub

Thank you for your help!
Why do you think this issue is relevant for our case?

DNS-Challenge fails.

This issue also looks possibly of interest: acme-dns: CNAME and DNS Zones cause dns-01 challenge to fail due to bad propagation check · Issue #1710 · go-acme/lego · GitHub
I am thinking of it because of ". . . dns-01 challenge to fail due to bad propagation check".

I am essentially grasping at straws, just checking out remote possibilities.

Also, what version of Lego are you using?

I think that issue is the opposite problem: there, Lego doesn't find a TXT record even though it exists.
In our case, Lego finds the record immediately after our 10-minute delay, and LE doesn't find it after that.

We use Lego 4.8.

What are the DNS servers being used by your server?
Do they return the proper authoritative DNS server list?

I'm not sure.
As I said, we use Lego, and it performs the DNS lookups for the added TXT records.
I see this code:

const defaultResolvConf = "/etc/resolv.conf"
var defaultNameservers = []string{
	"google-public-dns-a.google.com:53",
	"google-public-dns-b.google.com:53",
}
// recursiveNameservers are used to pre-check DNS propagation.
var recursiveNameservers = getNameservers(defaultResolvConf, defaultNameservers)

Are there any logs to show which nameservers are being used by lego?

This seems to include more than those two entries:

No, we don't have logs about the nameservers.

Please show:
cat /etc/resolv.conf

OR
alter the code [debug] and show the value of: var recursiveNameservers
[after it has been set]
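
If changing Lego is not an option, the same information can be approximated from your own code. The sketch below only mirrors the fallback idea visible in the quoted snippet (use the entries of /etc/resolv.conf if there are any, otherwise the defaults). The functions parseResolvConf and effectiveNameservers are hypothetical stand-ins, not Lego functions, and Lego's real parsing may differ in detail.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
)

// Fallback list copied from the Lego snippet quoted in this thread.
var defaultNameservers = []string{
	"google-public-dns-a.google.com:53",
	"google-public-dns-b.google.com:53",
}

// parseResolvConf extracts the "nameserver" entries from resolv.conf
// content and returns them as host:port strings.
func parseResolvConf(content string) []string {
	var servers []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, net.JoinHostPort(fields[1], "53"))
		}
	}
	return servers
}

// effectiveNameservers returns the entries found in the file at path,
// or defaults when the file is unreadable or lists no nameserver.
func effectiveNameservers(path string, defaults []string) []string {
	data, err := os.ReadFile(path)
	if err != nil {
		return defaults
	}
	if servers := parseResolvConf(string(data)); len(servers) > 0 {
		return servers
	}
	return defaults
}

func main() {
	fmt.Println("recursive nameservers:",
		effectiveNameservers("/etc/resolv.conf", defaultNameservers))
}
```

Running this inside the same container as the service shows which recursive resolvers the pre-check would most likely go through.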

My guess is that there may be some false positives,
due to TXT records being inserted in the "wrong" zone.
But that's pure guesswork...

You could also try:

var recursiveNameservers = getNameservers(defaultNameservers)

  1. "/etc/resolv.conf"
    Our server runs in a Docker container whose base image is
    FROM registry.access.redhat.com/ubi8/go-toolset:1.17.12-11
    I'm not sure what this file contains in this image.

  2. The code that calculates recursiveNameservers belongs to Lego; unfortunately, I can't add logs there.

  3. If we check the TXT records manually during an order, we can see them (and the customer checked and saw them too). What do you mean by the "wrong" zone? How can we check it?
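
One possible reading of the "wrong" zone is that the record ends up in a zone that is not the one actually serving the challenge name, for example a parent zone when a more specific delegation exists. A rough way to check this is to walk up the name and find the closest ancestor that has NS records. The sketch below uses only the Go standard library; ancestors and closestDelegation are hypothetical helpers, the NS-based walk is a heuristic, and Lego's own zone detection may work differently.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// ancestors lists fqdn and each of its parents, most specific first,
// stopping before the top-level domain.
func ancestors(fqdn string) []string {
	labels := strings.Split(strings.TrimSuffix(fqdn, "."), ".")
	var names []string
	for i := 0; i < len(labels)-1; i++ {
		names = append(names, strings.Join(labels[i:], "."))
	}
	return names
}

// closestDelegation returns the most specific ancestor of fqdn that
// has NS records, together with those nameservers.
func closestDelegation(ctx context.Context, fqdn string) (string, []string, error) {
	for _, name := range ancestors(fqdn) {
		nss, err := net.DefaultResolver.LookupNS(ctx, name)
		if err != nil || len(nss) == 0 {
			continue // no NS records at this name; try its parent
		}
		hosts := make([]string, 0, len(nss))
		for _, ns := range nss {
			hosts = append(hosts, ns.Host)
		}
		return name, hosts, nil
	}
	return "", nil, fmt.Errorf("no NS records found for any ancestor of %s", fqdn)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	zone, hosts, err := closestDelegation(ctx,
		"_acme-challenge.logs-epfp01-00465.qradar.ibmcloud.com")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("closest delegated name:", zone, "served by", hosts)
}
```

If the name reported here differs from the zone the TXT record was created in through the provider API, that mismatch would be worth investigating.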