DNS-01, TTL and deactivating authorizations

TCM · November 15, 2016, 5:13pm

Hi,

Here’s a curiosity I stumbled upon today while doing the following:

1. Request a challenge
2. Trigger DNS-01 challenge
3. Provide DNS record with 10s TTL
4. Request certificate
5. Deactivate authorization
6. Before the TTL expires, request a new challenge and trigger it
7. The server complains that the DNS data doesn’t match, because it has the old record cached
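The last step is what a caching resolver that honors the record's TTL would produce. Below is a minimal model, assuming a single cache in front of the validator and a fake clock. The class name, the `example.com` record name and the TXT strings are hypothetical, and this is not Boulder's or Unbound's code.

```python
import time
from typing import Callable, Dict, List, Tuple

# upstream(name) -> (TXT strings, TTL in seconds), as the authoritative server would answer
Upstream = Callable[[str], Tuple[List[str], int]]


class TtlCache:
    """Toy caching resolver that honors each answer's TTL (illustration only)."""

    def __init__(self, upstream: Upstream, clock: Callable[[], float] = time.monotonic):
        self.upstream = upstream
        self.clock = clock
        self.entries: Dict[str, Tuple[List[str], float]] = {}  # name -> (values, expiry time)
        self.upstream_queries = 0  # what tcpdump on the authoritative side would count

    def lookup(self, name: str) -> List[str]:
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # cache hit: no query reaches the authoritative server
        values, ttl = self.upstream(name)
        self.upstream_queries += 1
        if ttl > 0:  # RFC 1035: a zero TTL means "use for this transaction only"
            self.entries[name] = (values, now + ttl)
        return values


# Replay the steps with a 10 s TTL and a fake clock.
name = "_acme-challenge.example.com"
zone = {name: ["digest-for-challenge-1"]}
now = [0.0]
resolver = TtlCache(lambda n: (zone.get(n, []), 10), clock=lambda: now[0])

first = resolver.lookup(name)            # fetched from the authoritative server
zone[name] = ["digest-for-challenge-2"]  # new challenge, new TXT value
now[0] = 5.0
second = resolver.lookup(name)           # still inside the TTL: stale value returned
now[0] = 11.0
third = resolver.lookup(name)            # TTL expired: new value fetched
```

In this model the second lookup never leaves the resolver, which matches TCM's observation that tcpdump shows no query the second time.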

Is this intentional? Shouldn’t the cache be cleared when deactivating an authorization? Should DNS-01 requests be cached at all? What is the purpose of caching a one-time challenge-response?

I have lowered my TTL to 1s now (I’m hesitant to use 0s).

Any comments?

serverco · November 15, 2016, 5:30pm

Interesting. I can’t reproduce that. It works for me, although I’m using a 300s TTL, which I would have thought would only make things worse.

My steps were:

1. Request a challenge
2. Trigger DNS-01 challenge
3. Provide DNS TXT record with 300s TTL
4. Request certificate (and obtain it)
5. Deactivate authorization
6. Immediately request a challenge
7. Trigger DNS-01 challenge
8. Provide DNS TXT record with 300s TTL
9. Request certificate (and obtain it)

Silly question: are you sure it’s being cached on the LE side, and that your DNS servers simply aren’t serving the correct response yet?

TCM · November 15, 2016, 5:56pm

I’m using my own Perl code to answer the DNS queries, and I have confirmed (also with tcpdump) that no query is made the second time.

However, this is only against the staging server. Are you using that, too? Are you using the same domain name each time?

serverco · November 15, 2016, 6:03pm

Yes, staging server and the same domain names (actually two: mytestdomain and www.mytestdomain). I have managed to get a couple of failures, though most attempts work fine.

TCM · November 15, 2016, 6:07pm

Can you reliably reproduce it if you stay under 10s? Maybe the server caps the TTL at some maximum. I have never succeeded with a 10s TTL and two triggers in a row. It fails with

"error" => {
  "detail" => "Correct value not found for DNS challenge",
  "status" => 403,
  "type" => "urn:acme:error:unauthorized"
},

every time.
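For context, this error means the validator did not find the expected value among the TXT strings it obtained (if any) for `_acme-challenge.<domain>`. In ACME's DNS-01 challenge (as later standardized in RFC 8555 §8.4), the expected value is base64url(SHA-256(keyAuthorization)) without padding, where keyAuthorization is the challenge token, a ".", and the base64url JWK thumbprint of the account key. A cached record from the previous challenge carries the previous token's digest, so it fails this comparison. The token and thumbprint strings below are placeholders.

```python
import base64
import hashlib
from typing import List


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Value expected in the _acme-challenge TXT record for a DNS-01 challenge."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url encoding with the trailing "=" padding removed
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def txt_matches(observed: List[str], token: str, account_thumbprint: str) -> bool:
    """True if any TXT string the validator saw equals the expected value."""
    return dns01_txt_value(token, account_thumbprint) in observed


value = dns01_txt_value("example-token", "example-thumbprint")
```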

serverco · November 15, 2016, 6:21pm

I can get the error if I go for very short timescales. It doesn’t seem to be the TTL, though: I get the same error message if I just re-request within a very short period (less than 10 seconds).

If I’m slower (as in the log below), I always succeed:

2016-11-15 18:16:29 Registering account
2016-11-15 18:16:31 Verify each domain
2016-11-15 18:16:31 Verifying mytestdomain.com
2016-11-15 18:16:34 Verifying www.mytestdomain.com
2016-11-15 18:16:37 checking DNS at mimi.ns.cloudflare.com for www.mytestdomain.com. Attempt 1/100 gave wrong result,  waiting 5 secs before checking again
2016-11-15 18:16:44 Verified mytestdomain.com
2016-11-15 18:16:48 Verified www.mytestdomain.com
2016-11-15 18:16:49 Verification completed, obtaining certificate.
2016-11-15 18:16:51 Certificate saved in /home/andy/.getssl/mytestdomain.com/mytestdomain.com.crt
2016-11-15 18:16:52 The intermediate CA cert is in /home/andy/.getssl/mytestdomain.com/chain.crt
2016-11-15 18:16:53 deactivating domain mytestdomain.com
2016-11-15 18:16:55 deactivating domain www.mytestdomain.com
getssl: mytestdomain.com - certificate obtained but certificate on server is different from the new certificate

$ getssl mytestdomain.com -f
2016-11-15 18:17:42 Registering account
2016-11-15 18:17:44 Verify each domain
2016-11-15 18:17:44 Verifying mytestdomain.com
2016-11-15 18:17:46 Verifying www.mytestdomain.com
2016-11-15 18:17:49 checking DNS at mimi.ns.cloudflare.com for mytestdomain.com. Attempt 1/100 gave wrong result,  waiting 5 secs before checking again
2016-11-15 18:17:54 checking DNS at mimi.ns.cloudflare.com for mytestdomain.com. Attempt 2/100 gave wrong result,  waiting 5 secs before checking again
2016-11-15 18:18:02 Verified mytestdomain.com
2016-11-15 18:18:06 Verified www.mytestdomain.com
2016-11-15 18:18:08 Verification completed, obtaining certificate.
2016-11-15 18:18:10 Certificate saved in /home/andy/.getssl/mytestdomain.com/mytestdomain.com.crt
2016-11-15 18:18:10 The intermediate CA cert is in /home/andy/.getssl/mytestdomain.com/chain.crt
2016-11-15 18:18:11 deactivating domain mytestdomain.com
2016-11-15 18:18:14 deactivating domain www.mytestdomain.com
getssl: mytestdomain.com - certificate obtained but certificate on server is different from the new certificate

Mine is slightly slower anyway, since I have to wait for the Cloudflare DNS servers to update (hence the 5-second pauses). I’m using a TTL of 300 seconds, though, which is why I don’t think it’s TTL-related.

If I try to complete the second request within 10 seconds (and the Cloudflare servers respond quickly with the correct result), then I do get the same error as you.
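The log above shows getssl querying Cloudflare's authoritative nameserver (`mimi.ns.cloudflare.com`) up to 100 times, 5 seconds apart, until the new TXT value appears, and only then asking the CA to validate. Here is a minimal sketch of that polling loop, assuming a hypothetical `lookup` callable that queries the authoritative server directly; it is not getssl's actual implementation. Its limitation is worth noting: it can only confirm what the authoritative server serves, not what a cache on the CA side still holds, which fits failures appearing only on re-requests within about 10 seconds.

```python
import time
from typing import Callable, List


def wait_for_txt(lookup: Callable[[], List[str]], expected: str,
                 attempts: int = 100, delay: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until `expected` appears in the TXT answer; return the successful attempt number.

    `lookup` is a hypothetical callable that asks the zone's authoritative
    nameserver directly, bypassing any local cache.
    """
    for attempt in range(1, attempts + 1):
        if expected in lookup():
            return attempt
        sleep(delay)  # e.g. "waiting 5 secs before checking again"
    raise TimeoutError(f"{expected!r} not visible after {attempts} attempts")


# Fake authoritative server that takes a couple of polls to pick up the new record.
answers = iter([[], ["old-digest"], ["new-digest"]])
waits: List[float] = []
succeeded_on = wait_for_txt(lambda: next(answers), "new-digest", sleep=waits.append)
```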

pfg · November 15, 2016, 6:25pm

I was under the impression that Let’s Encrypt’s Unbound instance doesn’t do any caching, but it’s possible there’s a short minimum TTL (maybe 60s?), which is often used as a defense-in-depth measure against DNS rebinding attacks (Boulder generally pins IPs once they’re resolved, but there’s always the chance you forget that somewhere).
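A minimum or maximum TTL is usually implemented as a clamp on the record's TTL; Unbound, for instance, has `cache-min-ttl` (default 0) and `cache-max-ttl` (default 86400) settings. The sketch below only shows that arithmetic. The 60s floor is a guess, and the values Let’s Encrypt actually used are not known from this thread.

```python
def effective_ttl(record_ttl: int, min_ttl: int = 0, max_ttl: int = 86400) -> int:
    """Seconds a clamping resolver keeps an answer: the record's TTL forced into [min_ttl, max_ttl]."""
    if min_ttl > max_ttl:
        raise ValueError("min_ttl must not exceed max_ttl")
    return max(min_ttl, min(record_ttl, max_ttl))


# With a hypothetical 60 s floor, 0 s, 1 s and 10 s records are all held for 60 s.
floored = [effective_ttl(t, min_ttl=60) for t in (0, 1, 10, 300)]   # [60, 60, 60, 300]
# With a hypothetical 300 s ceiling, long TTLs are cut down to 300 s.
capped = [effective_ttl(t, max_ttl=300) for t in (10, 300, 3600)]   # [10, 300, 300]
```

Under a floor like this, a 0s or 1s TTL would be cached just as long as a 10s one, so whether a TTL of 0 really means "do not cache" depends on the resolver's configuration.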

Because of #2326, it’s hard to say whether Boulder is actually getting the wrong TXT record or no record at all, but the fact that tcpdump shows no queries suggests the former (due to caching).

TCM · November 15, 2016, 6:29pm

With a 1s TTL, it reliably succeeds, so the resolver on LE’s end must be obeying it.

pfg · November 15, 2016, 6:38pm

That’s interesting. The only explanation I can think of (if we assume these observations are correct) would be a maximum TTL of something like 10 seconds. Not sure why that value would be used though.

pfg · November 15, 2016, 6:45pm

Found this comment suggesting the max TTL is 5 minutes:

Slightly confused about why things worked with a 300s TTL in that case. Maybe there are multiple resolvers that don’t share their cache and you got lucky, or maybe the value has been lowered since.
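The multiple-resolver explanation can be made concrete with a toy model. If the validator's lookups are spread over several resolvers that keep separate caches, a stale entry only causes a failure when a query lands on the resolver that cached the old value. The round-robin selection, the pool size of two and the record names are assumptions for illustration, not a description of Let’s Encrypt’s setup.

```python
from itertools import cycle
from typing import Callable, Dict, List, Tuple

Upstream = Callable[[str], Tuple[List[str], int]]


class Cache:
    """One resolver's private TTL cache (toy model)."""

    def __init__(self, upstream: Upstream, clock: Callable[[], float]):
        self.upstream, self.clock = upstream, clock
        self.entries: Dict[str, Tuple[List[str], float]] = {}

    def lookup(self, name: str) -> List[str]:
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]
        values, ttl = self.upstream(name)
        if ttl > 0:
            self.entries[name] = (values, now + ttl)
        return values


class ResolverPool:
    """Hypothetical pool of resolvers that do not share a cache; queries rotate round-robin."""

    def __init__(self, size: int, upstream: Upstream, clock: Callable[[], float]):
        self._next = cycle([Cache(upstream, clock) for _ in range(size)])

    def lookup(self, name: str) -> List[str]:
        return next(self._next).lookup(name)


name = "_acme-challenge.example.com"
zone = {name: ["token-A"]}
now = [0.0]
pool = ResolverPool(2, lambda n: (zone.get(n, []), 300), lambda: now[0])

r1 = pool.lookup(name)                  # resolver 0 caches token-A for 300 s
zone[name] = ["token-B"]; now[0] = 20.0
r2 = pool.lookup(name)                  # resolver 1 is cold: sees token-B (success)
zone[name] = ["token-C"]; now[0] = 40.0
r3 = pool.lookup(name)                  # back on resolver 0: stale token-A (failure)
```

In this model a 300s TTL can succeed most of the time and fail occasionally, which would be consistent with the "couple of fails" reported above; it does not show that this is what actually happened.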

TCM · November 15, 2016, 7:01pm

I was puzzled about why it works with a much higher TTL; I suspect different resolvers are involved. Anyway, it doesn’t really change anything, since I can reliably reproduce the “issue”.

My main point is: if a new challenge is created for a name that previously had one, all caches for that name should be flushed. This would eliminate any race condition regardless of the TTL used.
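This proposal amounts to invalidating the cached `_acme-challenge` answer whenever a new challenge is issued for the name. Below is a minimal sketch with a hypothetical `on_new_challenge` hook and a toy TTL cache. In a real deployment, every resolver the validator might use would need flushing; Unbound, for example, offers `unbound-control flush_type <name> TXT`.

```python
from typing import Callable, Dict, List, Tuple

Upstream = Callable[[str], Tuple[List[str], int]]


class FlushableCache:
    """Toy TTL cache with a per-name flush (illustration only)."""

    def __init__(self, upstream: Upstream, clock: Callable[[], float]):
        self.upstream, self.clock = upstream, clock
        self.entries: Dict[str, Tuple[List[str], float]] = {}

    def lookup(self, name: str) -> List[str]:
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]
        values, ttl = self.upstream(name)
        if ttl > 0:
            self.entries[name] = (values, now + ttl)
        return values

    def flush(self, name: str) -> None:
        self.entries.pop(name, None)  # drop the cached answer, if any


def on_new_challenge(cache: FlushableCache, domain: str) -> None:
    """Hypothetical hook run when a new DNS-01 challenge is created for `domain`."""
    cache.flush(f"_acme-challenge.{domain}")


name = "_acme-challenge.example.com"
zone = {name: ["old-digest"]}
now = [0.0]
cache = FlushableCache(lambda n: (zone.get(n, []), 300), lambda: now[0])

before = cache.lookup(name)             # caches old-digest for 300 s
zone[name] = ["new-digest"]
on_new_challenge(cache, "example.com")
now[0] = 2.0
after = cache.lookup(name)              # entry was flushed, so the new value is fetched
```

Because the flush is tied to challenge creation rather than to time, the record's TTL no longer matters for back-to-back challenges in this model.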

It isn’t really a problem anyway, because I can just use a very low TTL, which LE seems to honor. On that note, is it safe to use a TTL of 0? The standard (RFC 1035) says 0 should mean “never cache”, but is this something that’s on your radar, and is it safe to rely on? Or should I use 1s?

Edit: While I don’t expect to handle multiple challenges for the same name in quick succession once I switch to production, I think it should all work correctly in any case.

system · December 15, 2016, 7:01pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
