DNS-01 problem with dehydrated

I have the same problem.

I use DNSSEC.
I wrote a hook for dehydrated with debugging notes.
In the example below, you can see:

  1. the tokens provided by Let’s Encrypt, to be used in the TXT record;
  2. the record added to the DNS, with the original token;
  3. the test on our master DNS, returning the record above;
  4. the propagation of the record to both Cloudflare and Google;
  5. Let’s Encrypt responding that the record is not correct!

[example.com]

token 1 = CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM
token 2 = zFD6--UX3XYoGMpppLocbvxbYGCTo7SqoCqcptmfi-8

+ Adding the following to the zone definition of [example.com]:
_acme-challenge.[example.com]. 300 IN TXT "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"
+ Updating the zone…
+ Signing the zone…
+ Checking the RR on the live DNS… OK
"CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"

192.168.1.6 (master): "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"
1.1.1.1 (Cloudflare): "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"
8.8.8.8 (Google): "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"

[*.example.com]

token 1 = Rflnf-GHKaZWuclGLf92LL8jkMKgpSvLxFIwGcUun1g
token 2 = 6aJSDNn-GBlqUOXjdm8NZSxL6PKFT3pRTOhCRRi4Lp0

+ Adding the following to the zone definition of [example.com]:
_acme-challenge.[example.com]. 300 IN TXT "Rflnf-GHKaZWuclGLf92LL8jkMKgpSvLxFIwGcUun1g"
+ Updating the zone…
+ Signing the zone…

+ Checking the RR on the live DNS… OK
"CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"
"Rflnf-GHKaZWuclGLf92LL8jkMKgpSvLxFIwGcUun1g"

192.168.1.6 (master): "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"
"Rflnf-GHKaZWuclGLf92LL8jkMKgpSvLxFIwGcUun1g"
1.1.1.1 (Cloudflare): "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"
8.8.8.8 (Google): "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"

Let’s Encrypt

+ Responding to challenge for [example.com] authorization…
+ ERROR: invalid challenge for *.[example.com]

CA server response:
{
  "type": "dns-01",
  "status": "invalid",
  "error": {
    "type": "urn:ietf:params:acme:error:unauthorized",
    "detail": "Incorrect TXT record \"CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM\" (and 1 more) found at _acme-challenge.[example.com]",
    "status": 403
  },
  "url": "https://acme-v02.api.letsencrypt.org/acme/chall-v3/3342861489/8qsaTQ",
  "token": "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"
}

Summary

I have production domains with expired certificates, and cannot renew.


I moved your post to a new thread as it’s a separate issue.

So, from the look of things, you are taking the token from the challenge resource, and using it as the value of your TXT record.

This is not how the token is used.

For the DNS-01 challenge (https://tools.ietf.org/html/rfc8555#section-8.4), you:

  1. Take the challenge token
  2. Derive the key authorization value using (1)
  3. Take the SHA-256 digest of the value from (2)
  4. Take the base64url encoding of the value from (3)
  5. Set your TXT record to the value from (4)
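
The five steps above can be sketched in a few lines of Python. This is only an illustration of the RFC 8555 computation, not dehydrated's own code; the function names are made up, and the thumbprint in the usage example is a placeholder rather than a real account key thumbprint.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as ACME expects."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_key_thumbprint: str) -> str:
    """Return the TXT record value for a dns-01 challenge.

    account_key_thumbprint is the base64url-encoded SHA-256 JWK thumbprint
    of the ACME account key (normally computed by the client).
    """
    # Step 2: key authorization = token, a dot, then the key thumbprint.
    key_authorization = f"{token}.{account_key_thumbprint}"
    # Step 3: SHA-256 digest of the key authorization.
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # Step 4: base64url encoding of the digest; step 5 publishes this value.
    return b64url(digest)


if __name__ == "__main__":
    token = "CaxlSTmwudKMcVH9R_-X0DTJWYdVRV0b7dPZiGGtAeM"
    # Placeholder thumbprint for illustration only (assumption).
    thumbprint = b64url(hashlib.sha256(b"placeholder account key").digest())
    print(dns01_txt_value(token, thumbprint))
```

The result is always 43 characters long and differs from the raw token, which is why publishing the token itself is rejected as an incorrect TXT record.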

Generally, when you use an ACME client like Certbot or dehydrated, the client will give you the final value you need, saving you the trouble of steps 1-4.

Looking at https://github.com/dehydrated-io/dehydrated/blob/master/docs/dns-verification.md ,

$3 is a “challenge token” (which is not needed for dns-01), and
$4 is a token which needs to be inserted in a TXT record for the domain.

It sounds like you are using $3, but need to be using $4.
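
As a rough sketch of what that looks like in a hook, the Python below picks the fourth argument (not the third) as the TXT value. It assumes the operands after `deploy_challenge` arrive in groups of three (domain, challenge token, TXT value), with more than one group only when hook chaining is enabled; the function name and the printed TTL are illustrative, not part of dehydrated.

```python
import sys
from typing import List, Tuple


def parse_deploy_args(args: List[str]) -> List[Tuple[str, str]]:
    """Turn deploy_challenge operands into (record_name, txt_value) pairs.

    args is everything after the operation name, i.e. $2, $3, $4, ...
    Assumption: operands come in groups of domain, challenge token, TXT value.
    """
    if not args or len(args) % 3 != 0:
        raise ValueError("expected groups of: domain, challenge token, TXT value")
    pairs = []
    for i in range(0, len(args), 3):
        # The middle operand ($3) is the raw challenge token: unused for dns-01.
        domain, _challenge_token, txt_value = args[i:i + 3]
        pairs.append((f"_acme-challenge.{domain}.", txt_value))
    return pairs


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] == "deploy_challenge":
    for name, value in parse_deploy_args(sys.argv[2:]):
        # Print the record that would be added; a real hook would update the zone.
        print(f'{name} 30 IN TXT "{value}"')
```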


You are right, my fault.

This is the new log for the wildcard case. The TXT value is now $4. The TTL is down to 30 seconds, and the DNS tests run 10 seconds after the TTL. Google is slow to pick the record up, but Cloudflare is spot on. As you can see, Let’s Encrypt is also slow. It would be useful to have a Let’s Encrypt diagnostic page showing the full server-side log.

Processing example.com with alternative names: *.example.com
+ Signing domains…
+ Generating private key…
+ Generating signing request…
+ Requesting new certificate order from CA…
+ Received 2 authorizations URLs from the CA
+ Handling authorization for example.com
+ Handling authorization for example.com
+ 2 pending challenge(s)
+ Deploying challenge tokens…

fqdn = example.com
token 1 = YdIkxG-2QznRkDUw7t_l-TMHX97ACkZdgXyiX3WCFMc
token 2 = BXbg0mcLMzdubtX3FoC-OhsEqYCIcu-d2J0f9q4pQqI
+ Adding the following to the zone definition of example.com:
_acme-challenge.example.com. 30 IN TXT "BXbg0mcLMzdubtX3FoC-OhsEqYCIcu-d2J0f9q4pQqI"
+ Updating the zone…
+ Signing the zone…
+ Checking the RR on the live DNS… OK
"BXbg0mcLMzdubtX3FoC-OhsEqYCIcu-d2J0f9q4pQqI"
+ sleeping 30sec, to allow the CA to pick it up…

192.168.1.6 (master): "BXbg0mcLMzdubtX3FoC-OhsEqYCIcu-d2J0f9q4pQqI"
1.1.1.1 (Cloudflare): "BXbg0mcLMzdubtX3FoC-OhsEqYCIcu-d2J0f9q4pQqI"
8.8.8.8 (google):

fqdn = example.com
token 1 = NZYR87hqZgfKUJbU2RQICaTpxllciFazXkF0TwotTCo
token 2 = VhgxNC87qDBp9-HcKkATfCUkFb516stf4Mv0CPldM2w
+ Adding the following to the zone definition of example.com:
_acme-challenge.example.com. 30 IN TXT "VhgxNC87qDBp9-HcKkATfCUkFb516stf4Mv0CPldM2w"
+ Updating the zone…
+ Signing the zone…
+ Checking the RR on the live DNS… OK
"BXbg0mcLMzdubtX3FoC-OhsEqYCIcu-d2J0f9q4pQqI"
"VhgxNC87qDBp9-HcKkATfCUkFb516stf4Mv0CPldM2w"
+ sleeping 30sec, to allow the CA to pick it up…

192.168.1.6 (master): "BXbg0mcLMzdubtX3FoC-OhsEqYCIcu-d2J0f9q4pQqI"
"VhgxNC87qDBp9-HcKkATfCUkFb516stf4Mv0CPldM2w"
1.1.1.1 (Cloudflare): "VhgxNC87qDBp9-HcKkATfCUkFb516stf4Mv0CPldM2w"
"BXbg0mcLMzdubtX3FoC-OhsEqYCIcu-d2J0f9q4pQqI"
8.8.8.8 (google):

+ Responding to challenge for example.com authorization…
+ ERROR: invalid challenge for *.example.com

CA server response:
{
  "type": "dns-01",
  "status": "invalid",
  "error": {
    "type": "urn:ietf:params:acme:error:dns",
    "detail": "DNS problem: NXDOMAIN looking up TXT for _acme-challenge.example.com - check that a DNS record exists for this domain",
    "status": 400
  },
  "url": "https://acme-v02.api.letsencrypt.org/acme/chall-v3/3351771990/XI1JWQ",
  "token": "YdIkxG-2QznRkDUw7t_l-TMHX97ACkZdgXyiX3WCFMc"
}

On Let’s Encrypt DNS

Is it possible to tell LE to read the token directly from the master, instead of from the slaves or third-party DNS servers? We use DNSSEC with DANE; each zone signature resets the SOA serial, and it takes time for the slaves to pick it up.


Let’s Encrypt queries your authoritative nameservers directly; its recursive resolver cache is negligible (60s, or your TTL, whichever is lower).

What seems likely is that one of your slaves was not yet serving the updated zone. That would also be consistent with Cloudflare picking it up and Google not - it’s just luck about which of your nameservers they hit.

Let’s Encrypt also tends to expose nameserver desynchronizations more often than common recursive resolvers, due to (under some circumstances) comparing responses between nameservers.
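
One way to spot that kind of desynchronization yourself is to compare the SOA serial each nameserver reports. The sketch below only does the comparison; the hostnames and serials are hypothetical, and collecting the serials (for example with one SOA query per nameserver) is left out.

```python
from typing import Dict, List


def out_of_sync(serials: Dict[str, int], primary: str) -> List[str]:
    """Return the nameservers whose SOA serial differs from the primary's."""
    expected = serials[primary]
    return sorted(ns for ns, serial in serials.items() if serial != expected)


if __name__ == "__main__":
    # Hypothetical serials, as reported by each authoritative nameserver.
    observed = {
        "ns1.example.com": 2020031503,
        "ns2.example.com": 2020031502,
        "ns3.example.com": 2020031503,
    }
    print(out_of_sync(observed, "ns1.example.com"))
```

An empty list means every nameserver is serving the same version of the zone as the primary.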


Our slaves are slow. Reading from the master is the only way to get past the verification. However, LE fails to read the master, as you can see from the log. The log shows the LAN address. The query from the public IP of the master is in sync. I raised the waiting time to 2x the TTL (30 sec), without joy.


You don’t know what nameserver Let’s Encrypt’s resolver is taking its decision from. For all you know, it is checking all 3 and taking a quorum decision.

Anyway,

This isn’t an option. SOA MNAME is not used as any kind of hint by recursive resolvers - only for dynamic DNS updates.

You need to wait for your slaves to update before responding to the challenge, or pull your slaves.
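
A hook could do that waiting by polling every authoritative nameserver until each one serves the new TXT value, rather than sleeping for a fixed time. The following is a minimal sketch under assumptions: it shells out to `dig`, the function names, timeout and interval are invented for illustration, and TXT values split into several strings are not handled.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def dig_txt(nameserver: str, name: str) -> List[str]:
    """Ask one nameserver for the TXT records of name, using dig +short."""
    out = subprocess.run(
        ["dig", f"@{nameserver}", "+short", "-t", "TXT", name],
        capture_output=True, text=True, check=False,
    ).stdout
    # dig prints each TXT value wrapped in double quotes, one per line.
    return [line.strip().strip('"') for line in out.splitlines() if line.strip()]


def wait_for_all(
    nameservers: Iterable[str],
    name: str,
    expected: str,
    lookup: Callable[[str, str], List[str]] = dig_txt,
    timeout: float = 1800.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once every nameserver serves expected, False on timeout."""
    deadline = time.monotonic() + timeout
    pending = list(nameservers)
    while True:
        # Keep only the nameservers that do not serve the value yet.
        pending = [ns for ns in pending if expected not in lookup(ns, name)]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The `lookup` and `sleep` parameters exist so the loop can be exercised without real DNS traffic; in a hook, the nameserver list would be the zone's public NS set, since those are the servers the CA can end up asking.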


This isn’t an option.

LE should always prefer the master (SOA MNAME), especially when its records are signed (DANE).

On resolving, a simple "dig @$master +dnssec +short -tTXT _acme-challenge.$fqdn" would do, with no need to wait for the global DNS databases to pick up LE’s temporary RRs.

Do you use NSD? If so, these may be worth a try:

12h ; refresh
2h ; retry
2w ; expire
1h ; min TTL

These are my RFC-sane settings:

1200 ; SOA Refresh: slaves must refresh (learn zone changes) after 1200–43200 seconds
7200 ; SOA Retry: slaves must retry contacting the master after 120–7200 seconds
604800 ; SOA Expire: slaves must revalidate after 604800–1209600 seconds
3600 ; SOA Minimum: slaves must flush negative responses after 3600–86400 seconds

I prefer my 20min to your 12h refresh.

I still find it unreasonable for LE to force me to wait SOA Refresh + some, especially because you are doing it twice, for the fqdn and for the wildcard.


If not SOA Expire…

You have Retry > Refresh, is it on purpose?
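
For what it is worth, that relationship is easy to check mechanically. The sketch below applies two common rules of thumb (retry shorter than refresh, expire longer than refresh); these are operational guidance assumed here, not protocol requirements, and the function name is invented.

```python
from typing import List


def soa_timer_warnings(refresh: int, retry: int, expire: int) -> List[str]:
    """Flag SOA timer combinations that look unusual (values in seconds)."""
    warnings = []
    if retry >= refresh:
        # A failed refresh would be retried later than the next regular refresh.
        warnings.append("retry is not shorter than refresh")
    if expire <= refresh:
        # The zone could expire on a secondary before it even tries to refresh.
        warnings.append("expire is not longer than refresh")
    return warnings


if __name__ == "__main__":
    # The settings posted above: refresh 1200, retry 7200, expire 604800.
    print(soa_timer_warnings(1200, 7200, 604800))
    # The suggested alternative: refresh 12h, retry 2h, expire 2w.
    print(soa_timer_warnings(43200, 7200, 1209600))
```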


I am within the RFC timing boundaries.

According to who? (I genuinely don’t know)

According to anybody who knows what DANE is and knows how to query it.

https://tools.ietf.org/html/rfc6698

https://www.huque.com/bin/danecheck
https://www.huque.com/bin/gen_tlsa


I am still not connecting the dots on wtf DANE has to do with how DNS recursors perform their queries. As far as I can tell, recursors don’t care, and have never cared about SOA MNAME.

To cite https://tools.ietf.org/html/rfc8499 ,

The idea of a primary master is only used in [RFC1996] and
[RFC2136]. A modern interpretation of the term “primary master”
is a server that is both authoritative for a zone and that gets
its updates to the zone from configuration (such as a master file)
or from UPDATE transactions.

RFC1996 and RFC2136 being DNS NOTIFY and DNS UPDATE, neither relevant for recursors.

You can play with unboundtest.com if you like, it’s the same recursor + similar configuration to what Let’s Encrypt use for their VA - lots of verbose logging.


Why is LE using a DNS recursor when a simple "dig @$master +dnssec +short -tTXT _acme-challenge.$fqdn" would do, with no need to wait for the global DNS databases to pick up LE’s temporary RRs?

With DANE, both ports 25 and 443 are signed in the DNS using a hash of their respective TLS certificates, which happen to be the ones you are updating from LE. LE could be smarter in this case, with no need for temporary ACME RRs.

You presume that the master is accessible from the Internet - that is NOT a requirement.
It only needs to be accessible to the slaves.

Your plan puts “all (DNS) eggs in one (MASTER) basket”.
And would require the DNS resolver to do a series of “if then else” logic tests/steps.
[You are essentially rewriting DNS]


https://www.cloudflare.com/learning/dns/what-is-dns/ .

How does Let’s Encrypt even discover what your primary server is (in the hypothetical world where that means something)? It does it by descending from the DNS root zone (.), until it eventually finds your SOA record.

That’s why you use a recursor. It does the chain of lookups for you.

You have assigned meaning to SOA MNAME that simply doesn’t exist. Your three nameservers are completely equivalent to each other (in terms of priority and authoritativeness), to every recursor on the internet. You need to deal with the slave lag by sleeping.

This is what people who use e.g. Linode DNS hosting do - they literally put 20 minute sleeps into their renewal scripts, because Linode’s slave lag is so bad.


Or try speeding that up with change notifications.

From a top-down recursive view, all authoritative nameservers are provided by the level above via DNS glue records [which are all created equal].
There is no “Super Glue” record that points to the Master.
