We're running dehydrated to manage our Let's Encrypt certs. Everything works fine except for one cert request, which always returns:
DNS problem: NXDOMAIN looking up TXT for _acme-challenge.ws.domain - check that a DNS record exists for this domain
We're asking for a wildcard cert (*.ws.domain), with an "alias" as ws.domain to make the request possible.
However, we've been trying for a couple of days and the answer is always the same, although I checked the DNS from an external website and it always seems to resolve.
I'm not sure where the problem is. Once we add the TXT record, we wait about 15 minutes to allow propagation, which is enough for the other records.
This is what is returned when checking the challenge:
Responding to challenge for ws.domain authorization...
Cleaning challenge tokens...
[2020-11-18 12:05:59] Cleaning up TXT record (_acme-challenge.ws.domain)
[2020-11-18 12:05:59] INFO => DNS TXT record DELETED: _acme-challenge.ws.domain
ERROR: Challenge is invalid! (returned: invalid) (result: {
"type": "dns-01",
"status": "invalid",
"error": {
"type": "urn:ietf:params:acme:error:dns",
"detail": "DNS problem: NXDOMAIN looking up TXT for _acme-challenge.ws.domain - check that a DNS record exists for this domain",
"status": 400
},
"url": "https://acme-v02.api.letsencrypt.org/acme/chall-v3/8690265661/M4TlHw",
"token": "Hk9uj5ObMdb416jbw1QwP2aniKuuxJNNnotg0cYR5sU"
})
Challenge validation has failed
I munged the real domain so it's not indexed in search engines, but it can be seen in the "url" parameter of the answer if you open the url.
Note: The TXT DNS record doesn't currently exist, as the hook script cleans it up once Let's Encrypt has answered (whether validation succeeded or not).
Is this something on our side? I'd be very grateful if someone could shed some light on this!
Without seeing what TXT record is being created firsthand, I can only offer general advice, some of which might seem obvious, but I give said advice because I've seen related issues hundreds of times.
Make sure you're creating a TXT record and not some other type of record
Be careful of relative versus absolute hosts/names for records (for example: _acme-challenge.ws.domain can end up becoming _acme-challenge.ws.domain.domain., so you might need to use _acme-challenge.ws or _acme-challenge.ws.domain. instead. The trailing period makes the host/name absolute.)
Ensure long enough propagation time (15 minutes is usually excessive, but it depends on the DNS servers)
Make certain the value (right-hand side) of the TXT record is the long base64 string given by dehydrated
Most of the time when people ask for second-level wildcards (*.ws.domain), they actually want first-level wildcards (*.domain).
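Two of the checks above can be sketched in Python. This is only an illustration under stated assumptions: absolute_name is a hypothetical helper that mimics how many zone editors expand record names (your DNS provider may behave differently), and dns01_txt_value follows the dns-01 rule from RFC 8555, where the TXT value is the unpadded base64url SHA-256 digest of "token.thumbprint". The thumbprint in the demo is a made-up placeholder, not a real account key.

```python
import base64
import hashlib


def absolute_name(host: str, zone: str) -> str:
    """Expand a record host the way many zone editors do (assumed behaviour)."""
    zone = zone.rstrip(".")
    if host.endswith("."):
        # Trailing period: the name is already absolute and is used as-is.
        return host
    # No trailing period: treated as relative, so the zone gets appended.
    return f"{host}.{zone}."


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """TXT value for dns-01: base64url(SHA-256(key authorization)), unpadded."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # A 32-byte digest encodes to 44 characters including one '=' of padding.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    zone = "domain"  # munged zone name, as in the question
    for host in (
        "_acme-challenge.ws.domain",   # relative: the zone is appended again
        "_acme-challenge.ws",          # relative: expands to the intended name
        "_acme-challenge.ws.domain.",  # absolute: left untouched
    ):
        print(f"{host:30} -> {absolute_name(host, zone)}")

    # Token copied from the log in the question; thumbprint is a placeholder.
    token = "Hk9uj5ObMdb416jbw1QwP2aniKuuxJNNnotg0cYR5sU"
    print(dns01_txt_value(token, "EXAMPLE-THUMBPRINT"))
```

If the first line of output shows the zone doubled for the name you typed into your DNS panel, that is the relative-versus-absolute trap. The second function is only useful for comparing against the value dehydrated hands to the hook; it needs the real account key thumbprint to produce a matching string.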
I'm absolutely sure we're creating a TXT record. We use the same script for about a hundred TLS certs and it works for all of them (except for the one I'm asking for help with).
I double-checked and the host is an absolute hostname (no extra domain. is appended). The TXT record is for _acme-challenge.ws.domain.
I even tried extending the propagation time to 1 hour, but the same error shows up. I was thinking this could also be a caching problem, but I'm not sure whether DNS servers also cache negative responses (in case I didn't leave enough time the first time I requested the cert)?
I checked that the string given by dehydrated is the right one. However, I guess that's not the error, because then it would say that the challenge is incorrect instead of returning NXDOMAIN.
The second-level wildcard is actually what we want: we're deploying a Kubernetes infrastructure and we plan to add hosts like .ws.domain.
Anyway, thanks for the help! I hope this clears some of the basic debugging of the problem.
I just added the TXT DNS record so you can test (I launched the dehydrated script but removed the part where it cleans the challenges and DNS record, so if it fails, the DNS record will be kept).
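On the negative-caching question: resolvers do cache NXDOMAIN answers. Per RFC 2308, the cache lifetime is bounded by the smaller of the SOA record's own TTL and its MINIMUM field. A minimal sketch of that rule follows; the SOA values are hypothetical, not taken from the real zone, and whether this applies to Let's Encrypt's validation resolvers is not something this thread establishes.

```python
from dataclasses import dataclass


@dataclass
class Soa:
    ttl: int      # TTL of the SOA record itself, in seconds
    minimum: int  # MINIMUM field of the SOA record, in seconds


def negative_cache_ttl(soa: Soa) -> int:
    """Upper bound for caching an NXDOMAIN answer (RFC 2308, section 5)."""
    return min(soa.ttl, soa.minimum)


def safe_to_retry(seconds_since_failed_lookup: int, soa: Soa) -> bool:
    """True once a cached NXDOMAIN from an earlier attempt should have expired."""
    return seconds_since_failed_lookup >= negative_cache_ttl(soa)


if __name__ == "__main__":
    soa = Soa(ttl=3600, minimum=300)  # hypothetical values for illustration
    print("negative cache TTL:", negative_cache_ttl(soa), "seconds")
    print("retry after 15 minutes ok?", safe_to_retry(15 * 60, soa))
    print("retry after 1 minute ok?", safe_to_retry(60, soa))
```

With retries spread over a couple of days and waits of up to an hour, a stale negative answer would only explain the failure if the zone's SOA values were unusually large, which is easy to check with a SOA lookup.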
Furthermore...
Only two of the four have IPv6 addresses.
And you can guess which two those are.
Yup, the two that fail to resolve the zone.
So, since LE tends to prefer IPv6 (and things that prefer IPv6), this becomes 100% fail via IPv6 rather than the 50% pass/50% fail seen on IPv4.
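The arithmetic can be sketched as below. The server names ns1 to ns4 are placeholders, and the sketch assumes all four servers are reachable over IPv4 (which is what the 50/50 split implies). dig_command is a hypothetical helper that builds a per-server, per-address-family query you could run by hand to confirm which servers actually answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Nameserver:
    name: str
    has_ipv4: bool
    has_ipv6: bool
    serves_zone: bool  # False means the server fails to answer for the zone


def failure_rate(servers: List[Nameserver], family: str) -> float:
    """Fraction of servers reachable over `family` that fail to serve the zone."""
    attr = {"ipv4": "has_ipv4", "ipv6": "has_ipv6"}[family]
    reachable = [s for s in servers if getattr(s, attr)]
    if not reachable:
        raise ValueError(f"no nameserver reachable over {family}")
    return sum(not s.serves_zone for s in reachable) / len(reachable)


def dig_command(record: str, server: str, family: str) -> List[str]:
    """Build a dig call that asks one server directly over one address family."""
    flag = {"ipv4": "-4", "ipv6": "-6"}[family]
    return ["dig", flag, "+norecurse", "+short", "TXT", record, f"@{server}"]


if __name__ == "__main__":
    # Placeholder names; the two broken servers are the only ones with IPv6.
    servers = [
        Nameserver("ns1", has_ipv4=True, has_ipv6=True, serves_zone=False),
        Nameserver("ns2", has_ipv4=True, has_ipv6=True, serves_zone=False),
        Nameserver("ns3", has_ipv4=True, has_ipv6=False, serves_zone=True),
        Nameserver("ns4", has_ipv4=True, has_ipv6=False, serves_zone=True),
    ]
    for family in ("ipv4", "ipv6"):
        print(family, "failure rate:", failure_rate(servers, family))
    print(" ".join(dig_command("_acme-challenge.ws.domain", "ns1", "ipv6")))
```

A resolver that picks IPv6 whenever it is available only ever reaches the two broken servers, so every lookup fails even though half of the servers are healthy.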