DNS-01, TTL and deactivating authorizations


#1

Hi,

Here’s a curiosity I stumbled upon today when doing the following:

  1. request a challenge
  2. trigger DNS-01 challenge
  3. provide DNS record with 10s TTL
  4. request certificate
  5. deactivate authorization
  6. before TTL expires, request a new challenge and trigger it
  7. server will complain that the DNS data doesn’t match, because it has the old record cached
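The sequence above can be modelled with a toy caching resolver. This is only a sketch of the suspected behaviour, not the actual code of the validation server or its resolver; `CachingResolver`, the `zone` dictionary and the injected clock are names made up for illustration.

```python
class CachingResolver:
    """Toy resolver that caches TXT answers until their TTL expires."""

    def __init__(self, authoritative, clock):
        self.authoritative = authoritative  # name -> (txt_value, ttl_seconds)
        self.clock = clock                  # callable returning the current time in seconds
        self.cache = {}                     # name -> (txt_value, expires_at)

    def lookup_txt(self, name):
        now = self.clock()
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]                # served from cache, no upstream query
        value, ttl = self.authoritative[name]
        self.cache[name] = (value, now + ttl)
        return value


now = [0.0]  # fake clock so the example runs instantly
zone = {"_acme-challenge.example.com": ("token-A", 10)}
resolver = CachingResolver(zone, clock=lambda: now[0])

# Steps 2-3: first validation, answer is cached for 10s.
first = resolver.lookup_txt("_acme-challenge.example.com")

# Step 6: a new challenge 4 seconds later, record replaced before the TTL expires.
now[0] = 4.0
zone["_acme-challenge.example.com"] = ("token-B", 10)
stale = resolver.lookup_txt("_acme-challenge.example.com")

# Once the 10s TTL has passed, the resolver asks upstream again.
now[0] = 11.0
fresh = resolver.lookup_txt("_acme-challenge.example.com")

print(first, stale, fresh)  # token-A token-A token-B
```

The second lookup never reaches the authoritative server, which would match a failure in step 7 with no query visible on the wire.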

Is this intentional? Shouldn’t the cache be cleared when deactivating an authorization? Should DNS-01 requests be cached at all? What is the purpose of caching a one-time challenge-response?

I have lowered my TTL to 1s now (I’m hesitant to use 0s).

Any comments?


#2

Interesting, I can’t reproduce that. It works for me, although I’m using a 300s TTL, which I would have thought only made things worse.

Mine was

  1. request a challenge
  2. trigger DNS-01 challenge
  3. provide DNS TXT record with 300s TTL
  4. request certificate (and obtain it)
  5. deactivate authorization
  6. immediately request a challenge
  7. trigger DNS-01 challenge
  8. provide DNS TXT record with 300s TTL
  9. request certificate (and obtain it)
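For reference, the TXT record in step 8 cannot be the same as the one in step 3, because every new challenge carries its own token. As far as I understand the ACME DNS challenge, the record value is the unpadded base64url encoding of the SHA-256 digest of the key authorization (token, a dot, then the account key thumbprint). A minimal sketch, with placeholder inputs and function names of my own choosing:

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """TXT value for _acme-challenge.<domain>: base64url(SHA-256(key authorization))."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


# Placeholder inputs, not real ACME values.
value = dns01_txt_value("example-token", "example-thumbprint")
print(len(value))  # 43: a 32-byte digest encodes to 43 base64url characters
```

So if a resolver still holds the answer from the first run, it will compare the old digest against the new challenge and reject it.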

Silly question: are you sure it’s being cached on the LE side, and that it isn’t just your DNS servers not providing the correct response yet?


#3

I’m using my own piece of Perl code to answer the requests and I have confirmed (with tcpdump also) that no request is being made the second time.

However, this is only against the staging server. Are you using that, too? Are you using the same domain name each time?


#4

Yes, staging server and same domain names (actually two: mytestdomain and www.mytestdomain). I have managed to get a couple of failures, though most runs work fine.


#5

Can you reproduce it reliably if you stay under 10s? Maybe the server uses a max TTL. I have never had any success with a 10s TTL and two triggers in quick succession. It reliably fails with

"error" => {
  "detail" => "Correct value not found for DNS challenge",
  "status" => 403,
  "type" => "urn:acme:error:unauthorized"
},

every time.


#6

I can get the error if I go for very short timescales. It doesn’t seem to be the TTL though: I get the same error message if I just re-request within a very short period (less than 10 seconds).

If I’m slower (as in the two runs below), I’m always successful:

2016-11-15 18:16:29 Registering account
2016-11-15 18:16:31 Verify each domain
2016-11-15 18:16:31 Verifying mytestdomain.com
2016-11-15 18:16:34 Verifying www.mytestdomain.com
2016-11-15 18:16:37 checking DNS at mimi.ns.cloudflare.com for www.mytestdomain.com. Attempt 1/100 gave wrong result,  waiting 5 secs before checking again
2016-11-15 18:16:44 Verified mytestdomain.com
2016-11-15 18:16:48 Verified www.mytestdomain.com
2016-11-15 18:16:49 Verification completed, obtaining certificate.
2016-11-15 18:16:51 Certificate saved in /home/andy/.getssl/mytestdomain.com/mytestdomain.com.crt
2016-11-15 18:16:52 The intermediate CA cert is in /home/andy/.getssl/mytestdomain.com/chain.crt
2016-11-15 18:16:53 deactivating domain mytestdomain.com
2016-11-15 18:16:55 deactivating domain www.mytestdomain.com
getssl: mytestdomain.com - certificate obtained but certificate on server is different from the new certificate

$ getssl mytestdomain.com -f
2016-11-15 18:17:42 Registering account
2016-11-15 18:17:44 Verify each domain
2016-11-15 18:17:44 Verifying mytestdomain.com
2016-11-15 18:17:46 Verifying www.mytestdomain.com
2016-11-15 18:17:49 checking DNS at mimi.ns.cloudflare.com for mytestdomain.com. Attempt 1/100 gave wrong result,  waiting 5 secs before checking again
2016-11-15 18:17:54 checking DNS at mimi.ns.cloudflare.com for mytestdomain.com. Attempt 2/100 gave wrong result,  waiting 5 secs before checking again
2016-11-15 18:18:02 Verified mytestdomain.com
2016-11-15 18:18:06 Verified www.mytestdomain.com
2016-11-15 18:18:08 Verification completed, obtaining certificate.
2016-11-15 18:18:10 Certificate saved in /home/andy/.getssl/mytestdomain.com/mytestdomain.com.crt
2016-11-15 18:18:10 The intermediate CA cert is in /home/andy/.getssl/mytestdomain.com/chain.crt
2016-11-15 18:18:11 deactivating domain mytestdomain.com
2016-11-15 18:18:14 deactivating domain www.mytestdomain.com
getssl: mytestdomain.com - certificate obtained but certificate on server is different from the new certificate

Mine is slightly slower anyway, since I have to wait for the Cloudflare DNS servers to update (hence the 5-second pauses). I’m using a TTL of 300 seconds though, which is why I don’t think it’s TTL related.

If I try to complete the second request within 10 seconds (and the Cloudflare servers have responded quickly with the correct result), then I do get the same error as you.
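The "Attempt 1/100 gave wrong result, waiting 5 secs" lines come from a wait-until-published loop. Roughly, the idea looks like the sketch below. This is not getssl's actual code; `wait_for_txt` and the injected `lookup` callable are hypothetical, and a real client would plug in a query against the authoritative name server.

```python
import time


def wait_for_txt(lookup, name, expected, attempts=100, delay=5.0, sleep=time.sleep):
    """Poll lookup(name) until it returns expected.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, attempts + 1):
        if lookup(name) == expected:
            return attempt
        if attempt < attempts:
            sleep(delay)  # give the DNS provider time to publish the record
    return None


# Fake lookup standing in for a query to the authoritative name server.
answers = iter(["old-value", "old-value", "new-value"])
attempt = wait_for_txt(
    lambda name: next(answers),
    "_acme-challenge.example.com",
    "new-value",
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
print(attempt)  # 3
```

Note that this only confirms the authoritative server has the new record. It does nothing about an answer that might still be cached by the resolver on the validating side.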


#7

I was under the impression that Let’s Encrypt’s unbound instance doesn’t do any caching, but it’s possible there’s a short minimum TTL (maybe 60s?). That is often used as a defense-in-depth measure against rebinding attacks (boulder generally pins IPs once they’re resolved, but there’s always the chance you forget that somewhere).

Due to #2326 it’s hard to say whether the issue here is boulder actually getting the wrong TXT record or no record at all, but the fact that tcpdump doesn’t show any requests suggests it’s the former (due to caching).


#8

With a 1s TTL, it reliably succeeds, so the resolver on LE’s end must be obeying it.


#9

That’s interesting. The only explanation I can think of (if we assume these observations are correct) would be a maximum TTL of something like 10 seconds. Not sure why that value would be used though.
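To make the hypothesis concrete: a resolver that clamps TTLs would cache each answer for the record's TTL limited to a configured range. Unbound has `cache-max-ttl` and `cache-min-ttl` settings for this, but the 10-second ceiling below is purely my guess from the observations in this thread, not a known Let's Encrypt setting, and `effective_ttl` is just an illustrative helper.

```python
def effective_ttl(record_ttl, max_ttl=10, min_ttl=0):
    """Seconds a resolver would cache an answer after clamping the record's TTL."""
    return max(min_ttl, min(record_ttl, max_ttl))


for ttl in (1, 10, 300):
    print(ttl, effective_ttl(ttl))
# 1 1
# 10 10
# 300 10
```

Under that assumption a 300s record would be dropped after 10 seconds just like a 10s record, which would fit the report that re-requesting within about 10 seconds fails regardless of the TTL, while a 1s TTL works.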


#10

Found this comment suggesting the max TTL is 5 minutes:

Slightly confused about why things worked with a 300s TTL in that case. Maybe there are multiple resolvers that don’t share their cache and you got lucky, or maybe the value has been lowered since.


#11

I was puzzled as to why it works with a much higher TTL, but I suspect this is because of different resolvers. Anyway, it doesn’t really change anything, since I can reliably reproduce the “issue”.

My main point is, if a new challenge comes into existence for a name that previously had a challenge, it should flush all caches for that name. This would solve any race conditions regardless of the TTL used.
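What I have in mind is something like the following. It is a sketch of the proposal only, with invented names (`ChallengeCache`, `new_challenge`); I don't know how the server's resolver cache is actually structured or whether it can be flushed per name.

```python
class ChallengeCache:
    """Toy TXT cache that can drop the entry for a name on demand."""

    def __init__(self):
        self.entries = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl, now):
        self.entries[name] = (value, now + ttl)

    def get(self, name, now):
        entry = self.entries.get(name)
        if entry and now < entry[1]:
            return entry[0]
        return None  # miss: the caller has to query the authoritative server

    def flush(self, name):
        self.entries.pop(name, None)


def new_challenge(cache, domain):
    """Hypothetical hook: creating a challenge invalidates cached answers for the name."""
    cache.flush(f"_acme-challenge.{domain}")


cache = ChallengeCache()
cache.put("_acme-challenge.example.com", "token-A", ttl=10, now=0)
before = cache.get("_acme-challenge.example.com", now=4)  # still cached
new_challenge(cache, "example.com")
after = cache.get("_acme-challenge.example.com", now=4)   # flushed, forces a fresh query
print(before, after)  # token-A None
```

With the flush in place, the outcome no longer depends on whether the old TTL has run out.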

It isn’t really an issue anyway because I can just use a very low TTL which LE seems to obey. On that note, is it safe to use a TTL of 0? The standards say 0 should mean “never cache”, but is this something that’s on your radar and is it safe to rely on? Or should I use 1s?

Edit: While I don’t expect to handle multiple challenges for the same name in a short time when switching to production, I think it all should work correctly in any case.

