Cert order fails with error "During secondary validation: Incorrect TXT record"

Hello,

While ordering certificates from the Let’s Encrypt staging environment, we intermittently get an error status when polling the DNS challenge from Let’s Encrypt.

Error: During secondary validation: Incorrect TXT record. The TXT record that Let’s Encrypt finds is the one from a dry-run challenge that we run ourselves just before the Let’s Encrypt challenge.

Does the error “During secondary validation” indicate that an initial LE validation succeeded and a second validation from a different LE server failed?
How can we mitigate such failures?

My domain is: 1922018.dev.e2e.certificate-manager.test.cloud.ibm.com

I ran this command: Not using a command; we use a web app with a Node.js ACME client

It produced this output:
POST "https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/39582182/XxGGYg". Status is: 200. Response body is {"type":"dns-01","status":"invalid","error":{"type":"urn:ietf:params:acme:error:unauthorized","detail":"During secondary validation: Incorrect TXT record \"_iBvYMTpy0Mtcbh38V3MxvVEoTzNs_dXjswJ6ZTpIBA\" found at _acme-challenge.1922018.dev.e2e.certificate-manager.test.cloud.ibm.com","status":403},"url":"https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/39582182/XxGGYg","token":"tnXonkeMkRSFtGl90g5uYgpQTajtYvoJYBsfhuG2pc4","validationRecord":[{"hostname":"1922018.dev.e2e.certificate-manager.test.cloud.ibm.com"}]}
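
For reference, here is a minimal sketch of how a client could tell this failure apart from other invalid challenges when polling. The helper name classifyChallenge is hypothetical, and matching on the text of error.detail is an assumption based only on the response above; the ACME error type itself does not distinguish primary from secondary validation, so treat this as a debugging aid rather than a stable contract.

```javascript
// Hypothetical helper: inspect a polled ACME challenge object and report
// whether it failed during Let's Encrypt's secondary validation.
// Assumption: the detail string starts with the prefix seen in this post.
const SECONDARY_PREFIX = 'During secondary validation:';

function classifyChallenge(challenge) {
  if (challenge.status !== 'invalid') {
    return { failed: false, secondary: false, detail: null };
  }
  const detail = (challenge.error && challenge.error.detail) || '';
  return {
    failed: true,
    secondary: detail.startsWith(SECONDARY_PREFIX),
    detail,
  };
}

// Trimmed copy of the response body from this post.
const sample = {
  type: 'dns-01',
  status: 'invalid',
  error: {
    type: 'urn:ietf:params:acme:error:unauthorized',
    detail:
      'During secondary validation: Incorrect TXT record ' +
      '"_iBvYMTpy0Mtcbh38V3MxvVEoTzNs_dXjswJ6ZTpIBA" found at ' +
      '_acme-challenge.1922018.dev.e2e.certificate-manager.test.cloud.ibm.com',
    status: 403,
  },
};

console.log(classifyChallenge(sample).secondary); // true
```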

My web server is (include version): N/A

The operating system my web server runs on is (include version): N/A

My hosting provider, if applicable, is: softlayer.com

I can login to a root shell on my machine (yes or no, or I don’t know): N/A

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): N/A

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): N/A

Hi,

I think this is because of Let’s Encrypt’s multi-VA validation (validation from multiple vantage points).
One of the Let’s Encrypt validation servers may have received a stale DNS response when it tried to validate your record (possibly from a different nameserver than the one you checked).

The best mitigation is probably to wait longer before submitting the order for validation, to give DNS propagation more time.

Thank you

Thank you stevenzhu. After setting the TXT challenge record, we first verify ourselves that a queryTxt against the authoritative nameservers’ IPs resolves the TXT record. Then we wait another 20 seconds before asking Let’s Encrypt to do the validation. However, it still fails occasionally.
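
To make the check concrete, here is a rough sketch of that kind of pre-validation using Node's built-in dns module. The function names queryAuthoritative and allServersHave are made up for illustration, and the sketch assumes `zone` is the zone apex (where the NS records live) and that the nameservers are reachable over IPv4. A matching answer from every IP is necessary, but it is not proof that every server behind each IP has the record.

```javascript
const dns = require('dns').promises;

// Pure helper: true only if every queried nameserver returned `expected`.
// `answers` maps a nameserver IP to the TXT strings it returned.
function allServersHave(answers, expected) {
  const lists = Object.values(answers);
  return lists.length > 0 && lists.every((txts) => txts.includes(expected));
}

// Ask each authoritative nameserver of `zone` directly for the TXT record.
async function queryAuthoritative(zone, recordName) {
  const answers = {};
  for (const nsName of await dns.resolveNs(zone)) {
    for (const ip of await dns.resolve4(nsName)) {
      const resolver = new dns.Resolver();
      resolver.setServers([ip]);
      try {
        // resolveTxt returns one array of chunks per record; join the chunks.
        const records = await resolver.resolveTxt(recordName);
        answers[ip] = records.map((chunks) => chunks.join(''));
      } catch (err) {
        // Treat NXDOMAIN, empty answers and timeouts as "not there yet".
        answers[ip] = [];
      }
    }
  }
  return answers;
}
```

Usage would look something like `allServersHave(await queryAuthoritative('example.com', '_acme-challenge.example.com'), expectedValue)`, where expectedValue is the TXT value the client just published.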

Honestly, I haven’t experienced this error myself, but I know some people set the sleep timer to more than 1 minute to avoid this issue.
I hope @cpu can explain more from Let’s Encrypt’s perspective.

Hi @ArikS,

Yes, in this case the primary validation request succeeded but the requests from US east and US west failed, finding the wrong key authorization.

Do you host the authoritative nameservers yourself or are you using a third party provider? Are your authoritative nameservers 1:1 with the IP addresses or are you perhaps using a setup with IP load balancing or anycast routing? Is there any additional caching in front of the authoritative zones within your infrastructure? What TTL do you set on the TXT records?

Hi @cpu,

Do you host the authoritative nameservers yourself or are you using a third party provider? - We do not host the nameservers ourselves; we use a third-party provider. We do a queryNS to obtain the nameservers’ IPs and then use them when resolving the TXT token.

Are your authoritative nameservers 1:1 with the IP addresses or are you perhaps using a setup with IP load balancing or anycast routing? Is there any additional caching in front of the authoritative zones within your infrastructure? - It’s a third party, so I don’t have visibility into this information.

What TTL do you set on the TXT records? - 2 minutes

Can you explain more what you mean by a queryNS? Is that an API operation with your provider?

Understood. Often in this case the provider will have an API that can be used to ask it “Have all of your nameservers synchronized the zone information?” and you would want to use that in this case (vs. trying to query them over DNS yourself). Is something like that available?

We set a max TTL of 60 in our resolvers so in this case that would be the operating TTL. I recommend you try setting a TTL of 0 (or at least, as close as allowed with your provider) and see if that helps.

No, our offering is a web service. Our users can order certificates for any DNS provider as long as it offers an API for adding TXT records.

queryNS means that our Node.js app performs a dns.resolveNs request to resolve the nameserver records for the host name.

Not available in our case as we are not targeting a single provider.

Thanks @cpu, we will try reducing the TTL and see if it helps.

To emphasize what @cpu said: In our experience, many DNS providers operate a fleet of authoritative servers that all answer to the same IP address (i.e. anycast). In that situation, even if you get a response from one authoritative server that has the correct TXT record, there may be another authoritative server in another region that doesn’t have that TXT record yet. Notably, Route53 does offer an API that tells you when a record is available in all regions, but that appears to be rare among DNS providers.

This means that increasing your sleep time (20 seconds) is probably necessary. Most client integrations that use the DNS challenge with providers other than Route53 have a sleep time closer to 10 minutes.

Also it’s worth noting that changing the TTL on your records only affects how long the Let’s Encrypt nameserver caches an incorrect response, which is different from the sleep time. And if Let’s Encrypt requests _acme-challenge.example.com and gets an empty response, the TTL governing that empty response will be the one indicated in the SOA record, not the TTL that you set when adding TXT records (because at the time Let’s Encrypt received the response, there was no TXT record). The long and the short of it is: You are better off changing your sleep time than your TTL.
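
If a long fixed sleep is too costly, one heuristic middle ground is to keep polling and only proceed once the record has been seen several times in a row. The sketch below is only an illustration of that idea, not something Let’s Encrypt recommends: the name waitForStable and all of its option names and defaults are made up, and a streak of good answers still cannot prove that every anycast region has synchronized, so a longer fixed sleep remains the conservative choice.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `check()` until it has returned true `needed` times in a row.
// Resolves true on success, or false after `maxAttempts` polls.
// `check` is any async function that reports whether the TXT record is visible.
async function waitForStable(check, options = {}) {
  const {
    needed = 3,          // consecutive successes required (assumed default)
    maxAttempts = 60,    // give up after this many polls (assumed default)
    intervalMs = 10000,  // delay between polls (assumed default)
    sleep = defaultSleep // injectable so the loop can be tested without waiting
  } = options;

  let streak = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Any failed check resets the streak to zero.
    streak = (await check()) ? streak + 1 : 0;
    if (streak >= needed) return true;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return false;
}
```

The client would then ask Let’s Encrypt to validate only if waitForStable resolves true, and fail the order (or fall back to a longer sleep) otherwise.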

I wrote up a longer discussion of the issues in the post “During secondary validation: Incorrect TXT record”. Thanks for posting about this, @ArikS! Hopefully the extra explanation will help a bunch of people.

Thanks @jsha and @cpu for the clarifications. A sleep time closer to 10 minutes would be a huge setback for our users’ experience. Our customers use our service to provision new web resources protected with Let’s Encrypt certificates on demand. Today it takes about a minute for a resource to be provisioned and ready for operation, with the end user waiting to get started.

The ACME RFC describes Retrying Challenges. Can we send a retry request to Let’s Encrypt in this case (after an additional sleep)?

@jsha, you also mentioned that acme-dns fixes this problem. Can you explain how it fixes this problem?

I can try to explain: acme-dns runs a small DNS server that only serves validation requests (you’ll need to point the _acme-challenge subdomain at the generated domain with a CNAME). Since a single server is the only one serving the validation token, there are no differences between query results and no synchronization delays between servers.
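
As a rough illustration of the delegation part, this sketch checks that the _acme-challenge name is a CNAME to the acme-dns subdomain. The function names and the target used in the usage note are placeholders I made up; the real target is whatever your acme-dns instance generated for you at registration.

```javascript
const dns = require('dns').promises;

// Compare DNS names case-insensitively and ignore a trailing dot.
function normalize(name) {
  return name.toLowerCase().replace(/\.$/, '');
}

// Pure helper: does any of the returned CNAME targets match `target`?
function pointsTo(cnames, target) {
  return cnames.some((cname) => normalize(cname) === normalize(target));
}

// Check that _acme-challenge.<domain> is delegated to `target` via CNAME.
async function isDelegated(domain, target) {
  try {
    const cnames = await dns.resolveCname(`_acme-challenge.${domain}`);
    return pointsTo(cnames, target);
  } catch (err) {
    // No CNAME found (or the lookup failed): treat as not delegated.
    return false;
  }
}
```

For example, `isDelegated('example.com', 'abc123.auth.example.org')` would resolve true once the CNAME is in place, with abc123.auth.example.org standing in for your generated subdomain.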

@stevenzhu had a great summary of how it works! You might also be interested to read our blog post about ACME onboarding with DNS-01: https://letsencrypt.org/2019/10/09/onboarding-your-customers-with-lets-encrypt-and-acme.html. And the acme-dns documentation itself also describes the issue: https://github.com/joohoi/acme-dns#acme-dns.

We don’t yet offer challenge retries, sorry!
