LE Using Cached DNS lookups during DV process

During the issuance process I ran into an issue verifying control of one of my domains. It turns out I had DNS configured wrong: it was pointing at an individual Application Server instead of the load-balancer. The site was operational, so I hadn't caught that this site wasn't being distributed by the load-balancer across my network.

I was able to complete the process of issuing a SAN cert for the rest of the domains by leaving that domain out. I modified the DNS records and waited for them to propagate across my host's network of DNS servers (I host @Linode). After a few hours I tried to issue a new cert for the domain that had failed earlier, only to see it still failing. I checked the domain's DNS settings and they were correct on my authoritative DNS servers. I also checked using external tools like NWTools.com's DNS query tool, which was returning the correct value for the A record. I tried again 24 hours later and am still getting the error.

Seeing:

Failed authorization procedure. www.copy.mx (simpleHttp): unauthorized :: The client lacks sufficient authorization :: Invalid response from http://www.copy.mx/.well-known/acme-challenge/p67dvPAKbViesuIjoNMuIcIgDT5RT1Uybi1xLo6Who0 [OLD IP ADDRESS]: 404, copy.mx (dvsni): unauthorized :: The client lacks sufficient authorization :: Correct zName not found for TLS SNI challenge

IMPORTANT NOTES:

  • The following 'unauthorized' errors were reported by the server:

Domains: copy.mx, www.copy.mx
Error: The client lacks sufficient authorization

To fix these errors, please make sure that your domain name was
entered correctly and the DNS A record(s) for that domain contains
the right IP address.

So, even though the DNS records had propagated through the internet and traffic for this domain was now hitting my load-balancer, LetsEncrypt's server was still using OLD DNS information (cached?) to attempt the Domain Verification.

I switched to the Application server that LetsEncrypt was trying to verify the domain at and grabbed the python client to attempt to verify the domain there. The domain verified perfectly and a certificate was issued.

I do not know if there is a (security) reason why we would cache DNS lookups. Is it meant to intentionally slow down changes to domains when issuing certs?

Did you manage to resolve this issue?
I am dealing with exactly the same bug.

I started the verification procedure with a misconfigured DNS A record, and it seems that even though DNS is now updated, LetsEncrypt’s server is still using old DNS data.

I was able to complete the registration at the other IP address as I still controlled it. Not sure if the incorrect value is still cached; I’ll find out in January when I go to renew that cert.

Update. Domain verification completed. It seems that LetsEncrypt’s server caches dns data for 24 hours.

Depending on your TTL, that might mean that the total time will be even longer, as it is quite likely that the name server from which letsencrypt is getting its records had also cached it.

I’m currently roughly 26 hours in, and will report back tomorrow to see what it did.

We’ve decreased the max TTL we use to a few minutes, so this should no longer be an issue.

What is the problem with actually obeying the TTL? Why do you need to violate standards?

Edit: This is the same symptom as when you tried to do SMTP by hand and messed up the whole DNS part of it. Just stick to what the Internet agreed on and don’t try to do your own thing. The Internet is smarter.

@TCM: The issue is that folks would have the wrong entry in their DNS, would fix it, and then would be stuck until the TTL expired.

It definitely makes sense for certain types of services to override the minimum and/or maximum TTL. The former provides a good mitigation against DNS rebind attacks, while the latter improves UX for services that rely on DNS for verification purposes. As long as you know why you’re doing it, there’s nothing wrong with that.

This makes no sense. How does an additional TTL help here? You are actually increasing their problem by increasing their effective TTL. Edit: I misunderstood here that the TTL now has a maximum.

You need to stop trying to accommodate every mistake someone makes in completely unrelated scopes. If someone doesn’t know how to properly set up DNS or migrate a record, it’s not your place to help or fix it. All you do is confuse the people who do know their stuff, as evidenced by this thread.

No, it’s just wrong.

You're misunderstanding: They're setting the maximum TTL to a few minutes, so that upstream changes in DNS are picked up faster. Let's say the first response has a TTL of 24 hours - now you'll have to wait 24 hours till Let's Encrypt picks up the new IP address. With a maximum TTL of a few minutes, you only need to wait a few minutes until Let's Encrypt sees the new address instead. This is a nice UX improvement without any negative side effects.
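
To make the "maximum TTL" idea concrete, here is a minimal sketch of a cache that honors a record's TTL but never keeps an answer longer than a fixed cap. This is only an illustration, not Let's Encrypt's actual resolver code: the class name, the injectable clock, and the 300-second cap are all assumptions (the thread only says "a few minutes").

```python
import time

# Assumed cap for illustration; the thread only says "a few minutes".
MAX_TTL_SECONDS = 300


class CappedTtlCache:
    """Toy DNS-style cache that never keeps an answer longer than max_ttl."""

    def __init__(self, max_ttl=MAX_TTL_SECONDS, clock=time.monotonic):
        self.max_ttl = max_ttl
        self.clock = clock  # injectable so the behavior can be tested without waiting
        self._entries = {}  # name -> (address, expires_at)

    def put(self, name, address, record_ttl):
        # Honor the record's own TTL, but clamp it to the resolver-side maximum.
        effective_ttl = min(record_ttl, self.max_ttl)
        self._entries[name] = (address, self.clock() + effective_ttl)
        return effective_ttl

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so the caller has to re-query upstream.
            del self._entries[name]
            return None
        return address
```

With a cap like this, a record published with a 24-hour TTL is held for at most the cap, while a record with a shorter TTL is still held only for its own TTL.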

Can't argue with that. :confused:

That’s not what the OP is describing. He describes LE using the old record way past the expiration of the TTL.

OP ran into this issue almost 2 months ago. @jsha stated this was (probably recently?) changed, likely because of issues like this. As @joostrijneveld stated, DNS propagation of records with a TTL of 24 hours might actually take more than 24 hours, since other DNS servers between Let’s Encrypt and OP’s DNS might have cached the record too. Setting a maximum TTL makes even more sense in this context.

I understand my mistake and that the issue was fixed by (again) messing with the TTL.

I’ll stand by it: Don’t mess with DNS or assume you know better.

If you teach admins that it’s OK to mess up your TTLs during a migration, they won’t learn it. You learn best when you get bitten in the ass and everyone who has to deal with DNS professionally has to know about the TTL game during a migration.

It’s not LE’s place to be the nurse here. There are standards and they are there for a reason.

It seems as if the caching problem still persists. I updated my DNS entry more than 59 hours ago (on Feb 14 01:29, now it’s Feb 16 13:10 in my time zone) but LE still uses the old entry. Some more details:

My domain is lenaschimmel.de, the old IP is 83.169.21.157 and the new one is 176.28.22.249

About 10 hours after updating, I confirmed that the update had propagated across several DNS servers across the globe using https://dnschecker.org/#A/lenaschimmel.de

The TTL of my DNS is configured to be 60 minutes.

I use ./letsencrypt-auto certonly --webroot -w /var/www/html -d lenaschimmel.de -d www.lenaschimmel.de and get the result:

Failed authorization procedure. www.lenaschimmel.de (http-01): urn:acme:error:unauthorized :: The client lacks sufficient authorization :: Invalid response from http://www.lenaschimmel.de/.well-known/acme-challenge/cCxLuqyFcq6wi33_SrXigXj_ZsH7QowDCvgW0UPPV4w [83.169.21.157]: 404

IMPORTANT NOTES:
 - The following errors were reported by the server:

   Domain: www.lenaschimmel.de
   Type:   unauthorized
   Detail: Invalid response from http://www.lenaschimmel.de/.well-
   known/acme-challenge/<challenge removed from this post>
   [83.169.21.157]: 404

   To fix these errors, please make sure that your domain name was
   entered correctly and the DNS A record(s) for that domain
   contain(s) the right IP address.

I conclude that the LE server still tries to connect to the old IP.

Please note that I retried this approximately every 10 hours since the DNS update. Could it be that some DNS cache only purges the cached IP if it is not requested for more than 24 hours?

It could possibly also be due to your NS setup.

If I check what your authoritative nameservers are, I get a response of ns.namespace4you.de and hostmaster.lenaschimmel.de. Asking them what your domain IP address is, ns.namespace4you.de gives the updated 176.28.22.249 whilst your other authoritative nameserver doesn’t respond with the 176.28.22.249 address. Whilst being unresponsive should mean it’s ignored and your other authoritative nameserver used, I’m just wondering if it’s just a coincidence that this is on 83.169.21.157. It may be worth either setting that DNS to respond, or removing it as an authoritative nameserver.

hostmaster is not a nameserver. You are looking at the SOA.

The problem is that there’s a disparity between the www A record and the domain’s A record. www is definitely still pointing to 83.169.21.157. Only the domain itself is pointing to 176.28.22.249.
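
A quick way to catch this kind of disparity before requesting a cert is to resolve every name on the certificate and compare against the IP you expect. Below is a rough sketch using only Python's standard library. The helper names `resolve_a` and `find_mismatches` are made up for this example, and it asks your local resolver, which may have its own cache, so it shows what your machine sees rather than what Let's Encrypt sees.

```python
import socket


def resolve_a(name):
    """Return the sorted IPv4 addresses the local resolver gives for name."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def find_mismatches(names, expected_ip, resolve=resolve_a):
    """Map each name whose A records are not exactly [expected_ip] to its answers.

    resolve is injectable so the check can be exercised without network access.
    """
    mismatches = {}
    for name in names:
        addresses = resolve(name)
        if addresses != [expected_ip]:
            mismatches[name] = addresses
    return mismatches
```

Calling `find_mismatches(["lenaschimmel.de", "www.lenaschimmel.de"], "176.28.22.249")` would list any name that still resolves to something other than the new address. An empty result means every name matched from where you ran it.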

From our observation, these "few minutes" TTLs are still a problem.

  1. issuing cert for c1.abc.mydomain.com - that works nicely
  2. issuing cert for c2.abc.mydomain.com like 10 seconds later - that fails because letsencrypt claims there are no acme records in abc.mydomain.com zone while they are, freshly added

A few minutes later, 2) works.

An issuing service should be a reliable operation and should not depend on caching. Unfortunately it isn't, which is bad.
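
Until that changes, a client-side workaround is to retry after waiting out the cache, since the second request did succeed a few minutes later. The sketch below is only an assumption about how one might script that: `issue` is a hypothetical callable wrapping whatever client command you run, and the attempt count and 180-second wait are guesses, not documented values.

```python
import time


def issue_with_retry(issue, attempts=4, wait_seconds=180, sleep=time.sleep):
    """Call issue() until it returns True, waiting between failed attempts.

    issue is a hypothetical callable that returns True on success and False
    on a failed authorization. Returns the 1-based number of the attempt
    that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if issue():
            return attempt
        if attempt < attempts:
            # Give the CA-side resolver cache time to expire before retrying.
            sleep(wait_seconds)
    return None
```

This does not fix the underlying dependence on caching. It only automates the "try again a few minutes later" step described above.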