I was working on an Automated Java client to issue LetsEncrypt Certificates. I am using AWS Route53 as my DNS Provider.
My code adds a TXT record to AWS Route53. Once the record has been added and I have verified that it is accessible and resolving, the code waits 10 seconds and proceeds with verification, but I get an NXDOMAIN error from LetsEncrypt.
I also ran nslookup on the record and got a successful response:
nslookup -query=txt _acme-challenge.www.shieldblaze.com 8.8.8.8
Server: dns.google
Address: 8.8.8.8
Non-authoritative answer:
_acme-challenge.www.shieldblaze.com text =
"21WU0OsnfSMObo0MeZgZY0GhthOqFmru8_DVZg_DaXI"
Then I decided to change the wait time from 10 seconds to 90 seconds. After that, it worked. I ran the test a few times and confirmed that LetsEncrypt was taking more time than other DNS clients to see the TXT records.
Can anyone from LetsEncrypt confirm this DNS update delay?
How do you verify that Route 53’s DNS servers are serving the new records? If you just send some DNS queries, it’s possible that some of their servers have the new records and some don’t.
Edit: Let’s Encrypt’s resolvers cache with a very low maximum TTL. If you try to validate twice, very quickly, it’s possible that a cached NXDOMAIN from the first failed validation attempt would also cause the second to fail. But you’re probably not moving that quickly.
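One way to check each authoritative server individually, instead of relying on a single query through a recursive resolver, is sketched below. It uses the JDK's built-in JNDI DNS provider. The class and method names are hypothetical, and the list of name servers to query is assumed to come from your hosted zone's delegation set.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import javax.naming.Context;
import javax.naming.NameNotFoundException;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper, not part of any ACME or AWS library.
final class AuthoritativeTxtCheck {

    /** Asks one specific name server for the TXT values of recordName. */
    static List<String> queryTxt(String nameServer, String recordName) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns://" + nameServer);
        DirContext ctx = new InitialDirContext(env);
        List<String> values = new ArrayList<>();
        try {
            Attribute txt = ctx.getAttributes(recordName, new String[] {"TXT"}).get("TXT");
            if (txt != null) {
                NamingEnumeration<?> all = txt.getAll();
                while (all.hasMore()) {
                    values.add(stripQuotes(String.valueOf(all.next())));
                }
            }
        } catch (NameNotFoundException e) {
            // This server answered NXDOMAIN: leave the list empty.
        } finally {
            ctx.close();
        }
        return values;
    }

    /** True only if every queried server returned the expected token. */
    static boolean servedByAll(Map<String, List<String>> answersByServer, String expected) {
        if (answersByServer.isEmpty()) {
            return false;
        }
        for (List<String> values : answersByServer.values()) {
            if (!values.contains(expected)) {
                return false;
            }
        }
        return true;
    }

    /** Defensive: removes one pair of surrounding double quotes, if present. */
    static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }
}
```

Calling `queryTxt` once per name server and passing the collected map to `servedByAll` would show whether some servers still lack the new record.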
My code only starts the wait (10 seconds earlier, 90 seconds now) after the Route53 change status becomes INSYNC (successful record propagation). I even confirmed record propagation with tools like DNSMap. Everything was normal, but LetsEncrypt was taking more time to update than expected.
I tested multiple subdomains, so a cached NXDOMAIN from a previous failed attempt shouldn’t be the cause.
Ah! That’s strange. INSYNC is supposed to mean it’s in sync and you shouldn’t have to wait any additional time at all…
I’m sorry I can’t provide much help.
This isn’t guaranteed, but FYI, I believe Let’s Encrypt’s DNS resolvers currently cache for no longer than 60 seconds. If you wait at least that long between validation attempts, there should be no possible problems caused by caching on Let’s Encrypt’s side.
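If you want your client to enforce that spacing, a minimal sketch could look like this. The 60-second figure is only the unguaranteed value mentioned above, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: how long to wait before retrying a validation.
final class ValidationSpacing {

    // Assumption: resolver caching does not exceed 60 seconds (not guaranteed).
    static final Duration ASSUMED_MAX_CACHE = Duration.ofSeconds(60);

    /** Time still to wait so that at least ASSUMED_MAX_CACHE separates two attempts. */
    static Duration remainingWait(Instant lastAttempt, Instant now) {
        Duration elapsed = Duration.between(lastAttempt, now);
        Duration remaining = ASSUMED_MAX_CACHE.minus(elapsed);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

The client would sleep for `remainingWait(lastAttempt, Instant.now())` before asking for validation again on the same name.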
Maybe there’s a bug in your code and it isn’t waiting as long as it’s supposed to, or there’s a bug with Route 53 and it’s returning INSYNC when it’s not in sync, or there’s a problem at Let’s Encrypt.
I’m low on ideas.
Can you post your code, or logs from your client?
Do you have Route 53 configured to log DNS queries to CloudWatch?
You could temporarily lower your zone’s negative TTL.
For what it’s worth, I’m not having issues with Route 53 and the staging environment.
The client’s strategy is to poll GetChange using a randomized exponential backoff and have Let’s Encrypt perform the validation as soon as it goes INSYNC.
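For comparison with your Java client, here is a rough sketch of that kind of polling loop. It is not that client's actual code. The `Supplier<String>` is a hypothetical stand-in for a call to Route 53 GetChange that returns the change status, and the "full jitter" formula is just one common way to randomize an exponential backoff.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Hypothetical poller; statusSource stands in for a Route 53 GetChange call.
final class InsyncPoller {

    /** Random delay in [0, min(capMs, baseMs * 2^attempt)] milliseconds. */
    static long backoffDelayMs(int attempt, long baseMs, long capMs) {
        // Limit the shift so small base values cannot overflow a long.
        long exponential = baseMs << Math.min(attempt, 20);
        long ceiling = Math.min(capMs, exponential);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }

    /** Polls until the status is INSYNC or maxAttempts polls have been made. */
    static boolean waitForInsync(Supplier<String> statusSource, int maxAttempts,
                                 long baseMs, long capMs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if ("INSYNC".equals(statusSource.get())) {
                return true;
            }
            try {
                Thread.sleep(backoffDelayMs(attempt, baseMs, capMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }
}
```

A call such as `waitForInsync(statusSource, 10, 1000, 30000)` would poll up to ten times with delays capped at 30 seconds. Validation would then be requested as soon as it returns `true`.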