During secondary validation: Incorrect TXT record

If you run into the error “During secondary validation: Incorrect TXT record” for your DNS-01 validations, here’s some information that may help. The short version is: Your ACME client might need to wait longer between when it configures your DNS records and when it tells Let’s Encrypt to validate the result.

We recently launched multiple viewpoint validation to improve security. This may cause some DNS-01 configurations that previously failed a small portion of the time to fail a larger portion of the time.

When you add a TXT record at your DNS provider, it takes some time before your DNS provider has copied that record to all of their servers. Sometimes this takes a minute, sometimes 10 minutes, occasionally longer, depending on the provider. If your ACME client submits a validation request before the TXT record for the DNS-01 challenge is present on all of your DNS provider’s servers, Let’s Encrypt may get an NXDOMAIN response, which will cause validation to fail. Whether it does depends on which of your provider’s servers Let’s Encrypt’s query happens to reach: one that already has the TXT record, or one that doesn’t yet.

Now that we do multiple lookups, from multiple vantage points, it’s more likely that we’ll hit one of your DNS provider’s servers that hasn’t updated yet.
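To see why more vantage points turn an occasional failure into a frequent one, here is a rough back-of-the-envelope model. It is not a description of how Let’s Encrypt actually schedules or counts its lookups. It assumes each lookup independently lands on a server that already has the TXT record with probability `p`, and (hypothetically) that the primary lookup must succeed plus at least `required_remote` of `remotes` remote lookups.

```python
from math import comb


def success_probability(p: float, remotes: int, required_remote: int) -> float:
    """Chance that validation succeeds under a simplified model.

    Assumptions (illustrative only, not Let's Encrypt's real policy):
    - each lookup independently reaches an already-updated server with probability p;
    - the primary lookup must succeed;
    - at least `required_remote` of `remotes` remote lookups must succeed.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if not 0 <= required_remote <= remotes:
        raise ValueError("required_remote must be between 0 and remotes")
    # Binomial tail: probability that at least `required_remote` remotes succeed.
    remote_ok = sum(
        comb(remotes, i) * p**i * (1 - p) ** (remotes - i)
        for i in range(required_remote, remotes + 1)
    )
    # The primary lookup is independent of the remote ones in this model.
    return p * remote_ok


if __name__ == "__main__":
    # Suppose 90% of the provider's servers already have the record.
    print(round(success_probability(0.9, 0, 0), 4))  # 0.9    (single viewpoint)
    print(round(success_probability(0.9, 3, 2), 4))  # 0.8748
    print(round(success_probability(0.9, 3, 3), 4))  # 0.6561 (all four must succeed)
```

Under these assumptions, a record that 90% of servers have fails about 10% of the time with a single lookup, but about 34% of the time if four lookups all have to see it.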

Unfortunately, most DNS providers don’t provide an API that lets you check whether all of their servers are updated (Route53 is a notable exception). So your ACME client needs to sleep for a fairly generous amount of time between adding the TXT record and requesting validation.
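In client code, the fix is mostly about ordering and patience. The sketch below assumes a hypothetical `DnsProvider` interface and a caller-supplied `request_validation` callback standing in for the ACME client’s “tell the CA to validate” step; neither is a real library API. The `sleep` parameter is injectable only so the flow can be tested without actually waiting.

```python
import time
from typing import Callable, Protocol


class DnsProvider(Protocol):
    """Hypothetical DNS provider interface; real provider APIs differ."""

    def add_txt_record(self, name: str, value: str) -> None: ...


def publish_then_validate(
    provider: DnsProvider,
    domain: str,
    txt_value: str,
    request_validation: Callable[[], None],
    propagation_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Add the DNS-01 TXT record, wait, and only then ask the CA to validate."""
    # DNS-01 challenge records live at _acme-challenge.<domain>.
    provider.add_txt_record(f"_acme-challenge.{domain}", txt_value)
    # Give the provider time to copy the record to all of its servers.
    # Most providers offer no way to confirm this, so we simply wait.
    sleep(propagation_seconds)
    request_validation()
```

The important property is that nothing tells Let’s Encrypt to validate until the full wait has elapsed; shortening `propagation_seconds` is exactly what makes the failure in this thread more likely.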

Some integrators query their DNS providers’ authoritative name servers to see when the TXT record is fully propagated. This may work in some cases, but it’s not reliable in the general case because of anycast. If your authoritative name server is, say, 198.51.100.12, and you query that IP address, your query might get routed to a different physical server depending on whether you are on the West Coast of the US, or northern Europe, or Asia. So getting a good response from your local 198.51.100.12 doesn’t mean that everyone in the world will see the same thing from their local 198.51.100.12.
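A toy model makes the anycast problem concrete. The routing table, region names, and server names below are invented for illustration; real anycast routing is decided by the network and cannot be enumerated like this, which is precisely why a check from one location proves little.

```python
from typing import Dict, Set, Tuple

# Hypothetical routing table: the same anycast IP reaches a different
# physical server depending on where the query comes from.
ANYCAST_ROUTES: Dict[Tuple[str, str], str] = {
    ("198.51.100.12", "us-west"): "server-a",
    ("198.51.100.12", "eu-north"): "server-b",
    ("198.51.100.12", "asia"): "server-c",
}


def sees_record(ip: str, region: str, updated: Set[str]) -> bool:
    """Would a query to `ip` from `region` find the new TXT record?"""
    return ANYCAST_ROUTES[(ip, region)] in updated


def visible_everywhere(ip: str, updated: Set[str]) -> bool:
    """True only if every region's instance of `ip` has the record."""
    return all(
        sees_record(ip, region, updated)
        for (route_ip, region) in ANYCAST_ROUTES
        if route_ip == ip
    )


if __name__ == "__main__":
    updated = {"server-a"}  # only the US West instance has the record so far
    print(sees_record("198.51.100.12", "us-west", updated))  # True: the local check passes
    print(visible_everywhere("198.51.100.12", updated))      # False
```

A client polling from the US West only ever gets the answer of `sees_record` for its own region; `visible_everywhere` is what multi-perspective validation effectively needs, and it cannot be observed from a single location.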

How long should your ACME client wait? It depends on your DNS provider. Certbot has some examples for various providers, with many defaulting to 30 seconds and some going as high as 1200 seconds (20 minutes). If you’re using Certbot, you can adjust the wait time with --dns-<plugin>-propagation-seconds (see the output of certbot --help all for details).
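If you drive Certbot from a script, the flag pattern above can be assembled as below. The plugin name, domain, wait value, and credentials path are placeholders, and whether a given plugin needs a credentials option varies, so check certbot --help all for what your installed plugin actually accepts.

```python
import shlex
from typing import List, Optional


def certbot_dns_command(
    plugin: str,
    domain: str,
    propagation_seconds: int,
    credentials: Optional[str] = None,
) -> List[str]:
    """Build a certbot argv that sets a DNS plugin's propagation wait.

    `plugin` is the DNS plugin name as it appears in certbot's flags
    (e.g. "cloudflare"). Many plugins also take a credentials file;
    others read credentials from elsewhere, so it is optional here.
    """
    if propagation_seconds < 0:
        raise ValueError("propagation_seconds must be non-negative")
    argv = ["certbot", "certonly", f"--dns-{plugin}"]
    if credentials is not None:
        argv += [f"--dns-{plugin}-credentials", credentials]
    # How long certbot waits after creating the TXT record before
    # asking Let's Encrypt to validate it.
    argv += [f"--dns-{plugin}-propagation-seconds", str(propagation_seconds)]
    argv += ["-d", domain]
    return argv


if __name__ == "__main__":
    cmd = certbot_dns_command(
        "cloudflare", "example.com", 120, "/etc/letsencrypt/cloudflare.ini"
    )
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting issues; shlex.join is only used here to show the equivalent command line.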

One last note, on TTLs: TTLs are a different concept than the propagation time I describe above. Adjusting the TTL on your records affects how long recursive resolvers (like the ones Let’s Encrypt runs) can cache a result from your DNS. Our recursive resolvers are configured with a maximum TTL of 60 seconds, so it’s pretty rare to see issues that you could fix by adjusting your TTL. Also, the TTL for records that don’t yet exist is not a property of those records (they don’t exist yet), but is a property of the SOA record for your domain. Short version: adjusting TTL probably won’t help, but adjusting your client’s sleep time probably will.
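The arithmetic behind “adjusting TTL probably won’t help” can be written out. For records that exist, the 60-second resolver cap from this post bounds how long a stale answer can be cached. For records that don’t exist yet, RFC 2308 sets the negative-caching time to the smaller of the SOA record’s own TTL and its MINIMUM field. Applying the same 60-second cap to negative answers is an assumption here; the post only mentions a maximum TTL.

```python
# Maximum TTL this post says Let's Encrypt's recursive resolvers apply.
LE_MAX_TTL = 60


def positive_cache_seconds(record_ttl: int, resolver_max_ttl: int = LE_MAX_TTL) -> int:
    """Upper bound on how long a capped resolver caches an existing record."""
    return min(record_ttl, resolver_max_ttl)


def negative_cache_seconds(
    soa_ttl: int, soa_minimum: int, resolver_max_ttl: int = LE_MAX_TTL
) -> int:
    """Upper bound on how long a "no such record" answer is cached.

    Per RFC 2308, the negative TTL is min(SOA TTL, SOA MINIMUM field).
    Assumption: the resolver's cap also applies to negative answers.
    """
    return min(soa_ttl, soa_minimum, resolver_max_ttl)
```

With a record TTL of 3600, positive_cache_seconds gives 60; with an SOA TTL of 3600 and MINIMUM of 300, negative_cache_seconds also gives 60. Either way the cache window is short compared with the multi-minute propagation delays described above.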

PS: If you want to spend some extra time and effort, you may find that acme-dns fixes the above problem, and possibly other problems you may have with DNS validation.
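For a sense of what acme-dns changes: per its README, you register once with an acme-dns server (which hands back a username, password, and subdomain), point _acme-challenge.<your domain> at that subdomain with a CNAME, and from then on your client only updates the acme-dns server over a small HTTP API. Because that server answers for the subdomain itself, there is no provider-side copying step to wait for. The sketch below builds, but does not send, an update request; the endpoint, JSON fields, and header names follow the acme-dns README and should be checked against the version you run, and the example URL and credentials are hypothetical.

```python
import json
import urllib.request


def build_acme_dns_update(
    base_url: str, username: str, password: str, subdomain: str, txt: str
) -> urllib.request.Request:
    """Build, but do not send, an acme-dns /update request.

    Endpoint path, JSON fields, and X-Api-User / X-Api-Key headers follow
    the acme-dns README; verify them against the version you deploy.
    """
    # A DNS-01 value is an unpadded base64url SHA-256 digest: 43 characters.
    if len(txt) != 43:
        raise ValueError("DNS-01 TXT values are 43 characters long")
    body = json.dumps({"subdomain": subdomain, "txt": txt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/update",
        data=body,
        headers={
            "X-Api-User": username,
            "X-Api-Key": password,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be urllib.request.urlopen(req); keeping construction separate makes the request easy to inspect in tests.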

Thanks for the information. Users of Certify The Web are starting to see this issue now. Some of our DNS provider integrations have a configurable propagation delay, but regrettably not all do; that will obviously get fixed.

I also wondered what triggers the email ‘Action required: New feature and your Let’s Encrypt integration’. Does it go out as soon as you try to renew and hit a validation error? I’m seeing a spike in support requests and am trying to understand why each person is seeing this. Is it possible that some DNS providers actively block one of the IP ranges Let’s Encrypt is using?

As an aside, please don’t make any more major changes to the API behaviour for a few months if you can help it. My users are starting to get ‘Let’s Encrypt’ fatigue; after all, it’s supposed to be ‘set it up and forget about it’, and this (plus the V1-to-V2 migration) is contributing to more than one person’s high blood pressure (mine at least!).

Yes, people should keep their software up to date, but they don’t.

This is one big batch email we sent to everyone who, according to our logs from the last two (and a bit) months, had a validation succeed from our primary perspective but fail from one of our secondary perspectives.

There’s a separate email going out on an ongoing basis for ACMEv1 deprecation. Every two weeks we are sending an email to everyone who issued an ACMEv1 certificate in the prior two weeks. The subject is “Update your client software to continue using Let’s Encrypt”.

I’m very sorry to contribute to your high support burden and high blood pressure! We definitely try to minimize disruption as much as we can, and in this specific instance we had a tough choice to make: contact everyone who might be affected by multi-perspective validation, even though some would have no problems, or keep support burdens low by not emailing people and letting them ask for help as they ran into problems. We opted for the notification path, and I still think it’s the right choice, but it’s really valuable to hear from maintainers about the impacts it has had, so thank you.

I do think it’s worth reiterating that since the beginning, we have emphasized that part of “set it and forget it” is keeping your software up to date. Auto-updating software is, in my opinion, the next step in ensuring that people’s sites continue to work smoothly without much intervention.

And by the way, I want to say a big thank you for your work! There are not many Windows ACME clients out there, particularly ones that integrate directly with IIS, so your work is extremely valuable. I hope the knowledge that you are helping lots of people can help somewhat during these stressful times.

Thanks Jacob, much appreciated. I have a motivational advantage in that approx 2-3% of my users are paying customers (some people want support, some just want the software to continue to exist) which enables me to spend a lot of time developing (and especially supporting) it at the expense of other things. Of course, many/most other maintainers don’t have that.

I fully understand the reasoning behind the changes. I think the real reason this is a sudden issue for me is that my users spend zero time using staging, so I need more automated staging tests.

As another aside, could LE offer a paid service for isolated, managed cloud acme-dns instances, say $5 a month? Writing and maintaining DNS provider integrations is a royal pain (and they’re basically dangerous). It would be a revenue stream, and frankly, I was going to do it myself, but I’m just too busy 🙂

We’ve talked about ways we could introduce something similar to acme-dns to make things easier, but ran into some opposition within the PKI community, and different opinions on to what extent that would be compliant with the various requirements. We may pick up the process again at some point, but if you’d like to offer that service of course feel free!
