DNS authorization fails randomly

So I have sporadic failures when renewing my certificate by DNS challenge. Normally I have around 14 SANs in the cert, many of them wildcards. Some of them fail almost every time I try to renew the cert. I picked one domain (sites.karotte.org) for demonstration and created a new cert with only sites.karotte.org and *.sites.karotte.org as SANs.

The setup is special as the _acme-challenge record is redirected via CNAME to a separate subdomain that can be updated dynamically. In this case:

_acme-challenge.sites.karotte.org. IN CNAME sites.karotte.org._acme_challenges.challenges.karotte.org.

I have a manual auth hook that sets the TXT records (needed b/c certbot DNS update would not work with CNAME redirects).
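
For readers who want to picture what such a hook has to do, here is a minimal sketch, not the poster's actual dns-challenge.py. It maps the domain certbot is validating to the CNAME target shown above and builds the text of an `nsupdate` request. The zone constant is taken from the CNAME record in this post; the function names, the use of `nsupdate`, and the TTL of 1 are assumptions for illustration.

```python
# Assumption: the delegated challenge zone, copied from the CNAME target above.
CHALLENGE_ZONE = "_acme_challenges.challenges.karotte.org"


def challenge_record_name(domain: str, zone: str = CHALLENGE_ZONE) -> str:
    """Return the fully qualified TXT name that the _acme-challenge CNAME points at."""
    base = domain.rstrip(".")
    # A wildcard and its base name share one challenge name; strip "*." defensively.
    if base.startswith("*."):
        base = base[2:]
    return f"{base}.{zone}."


def nsupdate_script(domain: str, validation: str, action: str, ttl: int = 1) -> str:
    """Build nsupdate input that adds or removes a single TXT value.

    Only the given value is touched, so the two challenges for the base name
    and the wildcard can coexist at the same record name.
    """
    name = challenge_record_name(domain)
    if action == "auth":
        line = f'update add {name} {ttl} IN TXT "{validation}"'
    elif action == "cleanup":
        line = f'update delete {name} IN TXT "{validation}"'
    else:
        raise ValueError(f"unknown action: {action!r}")
    return f"{line}\nsend\n"
```

In a real hook the domain and token would come from the `CERTBOT_DOMAIN` and `CERTBOT_VALIDATION` environment variables that certbot sets for manual hooks, and the generated text would be piped to `nsupdate` with whatever key the zone requires.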

I ran this command (this comes out of a Makefile):

/opt/letsencrypt/bin/certbot.sh --non-interactive --agree-tos --email me@example.com \
        --dry-run \
        --config-dir /opt/letsencrypt/certbot/conf \
        --logs-dir /opt/letsencrypt/certbot/log \
        --work-dir /opt/letsencrypt/certbot/work \
        certonly \
        --csr testdomain/testdomain-1606834179.csr \
        --cert-path testdomain/cert-1606834179.crt \
        --chain-path testdomain/intermediate-1606834179.pem \
        --fullchain-path testdomain/chained-1606834179.pem \
        --server https://acme-staging-v02.api.letsencrypt.org/directory \
        --manual \
        --manual-public-ip-logging-ok \
        --preferred-challenges dns \
        --manual-auth-hook "/opt/letsencrypt/bin/dns-challenge.py auth" \
        --manual-cleanup-hook "/opt/letsencrypt/bin/dns-challenge.py cleanup"

It produced this output:

Saving debug log to /opt/letsencrypt/certbot/log/letsencrypt.log
Plugins selected: Authenticator manual, Installer None
Performing the following challenges:
dns-01 challenge for sites.karotte.org
dns-01 challenge for sites.karotte.org
Running manual-auth-hook command: /opt/letsencrypt/bin/dns-challenge.py auth
Output from manual-auth-hook command dns-challenge.py:
Verified sites.karotte.org._acme_challenges.challenges.karotte.org TXT D3TB-k4Ew9vBx2VwtltCyoVFS0d5h7xDQiHsKI8FMyk (1 records)

Running manual-auth-hook command: /opt/letsencrypt/bin/dns-challenge.py auth
Output from manual-auth-hook command dns-challenge.py:
Verified sites.karotte.org._acme_challenges.challenges.karotte.org TXT 9b9n0URSsQibpCJwVVdw81A6N0uB28W0ft36kqRU-MQ (2 records)

Waiting for verification...
Challenge failed for domain sites.karotte.org
Challenge failed for domain sites.karotte.org
dns-01 challenge for sites.karotte.org
dns-01 challenge for sites.karotte.org
Cleaning up challenges
Running manual-cleanup-hook command: /opt/letsencrypt/bin/dns-challenge.py cleanup
Running manual-cleanup-hook command: /opt/letsencrypt/bin/dns-challenge.py cleanup
Some challenges have failed.

IMPORTANT NOTES:
 - The following errors were reported by the server:

   Domain: sites.karotte.org
   Type:   dns
   Detail: During secondary validation: DNS problem: NXDOMAIN looking
   up TXT for _acme-challenge.sites.karotte.org - check that a DNS
   record exists for this domain

   Domain: sites.karotte.org
   Type:   dns
   Detail: During secondary validation: DNS problem: NXDOMAIN looking
   up TXT for _acme-challenge.sites.karotte.org - check that a DNS
   record exists for this domain

The version of my client is: 1.9.0

The error changes, in this case it is "During secondary validation", sometimes it is:

IMPORTANT NOTES:
 - The following errors were reported by the server:

   Domain: sites.karotte.org
   Type:   dns
   Detail: DNS problem: NXDOMAIN looking up TXT for
   _acme-challenge.sites.karotte.org - check that a DNS record exists
   for this domain

   Domain: sites.karotte.org
   Type:   dns
   Detail: DNS problem: NXDOMAIN looking up TXT for
   _acme-challenge.sites.karotte.org - check that a DNS record exists
   for this domain

Sometimes it just works. So whatever the problem is, it is transient, but I can reproduce it almost every time the script runs.
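
Since the error alternates between the plain form and the "During secondary validation" form, it may help to tally which one occurs across repeated runs. The following is a rough sketch that parses the "IMPORTANT NOTES" text in the layout shown above. The function name and the label "primary" for errors without the prefix are my own shorthand, not certbot terminology.

```python
import re


def classify_failures(output: str) -> list[tuple[str, str]]:
    """Return (domain, stage) pairs from certbot's error report.

    stage is "secondary" when the detail starts with
    "During secondary validation", otherwise "primary".
    """
    results = []
    # Error entries are separated by blank lines; details wrap across lines.
    for block in re.split(r"\n\s*\n", output):
        flat = " ".join(block.split())  # undo the line wrapping
        match = re.search(r"Domain: (\S+) Type: (\S+) Detail: (.*)", flat)
        if not match:
            continue
        domain, _challenge_type, detail = match.groups()
        if detail.startswith("During secondary validation"):
            stage = "secondary"
        else:
            stage = "primary"
        results.append((domain, stage))
    return results
```

Feeding it the saved output of several dry runs and counting the stages would show whether the failures cluster on one side.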

Sniffing DNS traffic, I see a lot of different DNS servers requesting the challenges (or records that are connected to the challenge). I see no NXDOMAIN replies from my server, so I assume the problem is somewhere farther away. Maybe some of the DNS servers LE uses have problems with the CNAME? It's unfortunate that the IP of the DNS server doing the validation is not logged in the error. That would narrow the problem down.

Any idea what to do?

I would guess that transient errors in secondary validation indicate that the propagation of the DNS records simply hasn't completed yet to however many global DNS servers your provider uses. The simplest solution is just to wait longer between when your records are written and you try to validate the authorizations.

I'm my own DNS provider and specifically have only one DNS server for the challenge domain. :slight_smile: The auth hook updates that zone, so it is live immediately. The zone has a negative TTL of 1 second, so caching of negative responses is not a problem either.
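
A note on the negative TTL: under RFC 2308 a resolver caches an NXDOMAIN for the smaller of the SOA record's own TTL and the SOA MINIMUM field, but a resolver is free to apply its own floor or ceiling on top of that (unbound, for example, has `cache-min-ttl` and `cache-max-negative-ttl` settings). The sketch below only illustrates that arithmetic. Whether any resolver on the validation path applies such a floor is an assumption, not something this thread establishes, and the parameter names are hypothetical.

```python
from typing import Optional


def negative_cache_ttl(
    soa_ttl: int,
    soa_minimum: int,
    resolver_floor: int = 0,
    resolver_ceiling: Optional[int] = None,
) -> int:
    """Seconds a resolver may keep a cached NXDOMAIN answer.

    RFC 2308: start from min(SOA TTL, SOA MINIMUM), then apply any
    resolver-side clamps (hypothetical values supplied by the caller).
    """
    ttl = min(soa_ttl, soa_minimum)
    ttl = max(ttl, resolver_floor)
    if resolver_ceiling is not None:
        ttl = min(ttl, resolver_ceiling)
    return ttl


# Zone as described above: SOA MINIMUM of 1 second.
print(negative_cache_ttl(soa_ttl=3600, soa_minimum=1))
# Same zone seen through a resolver with an assumed 30-second floor.
print(negative_cache_ttl(soa_ttl=3600, soa_minimum=1, resolver_floor=30))
```

If such a floor exists somewhere, an NXDOMAIN cached from an earlier attempt could outlive the zone's 1-second setting.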

Apologies. Looking closer at your output and doing a little dig'ing, it looks like you're using a CNAME on the main sites.karotte.org name to sites.karotte.org._acme_challenges.challenges.karotte.org. And it looks like you've delegated challenges.karotte.org to ns1-v4.karotte.org.

The only weirdness I see is that the SOA record on challenges.karotte.org points back to ns1.karotte.org instead of ns1-v4.karotte.org as the primary nameserver. But I wouldn't think that should affect the TXT record lookups.

You sure there's no delay between when your script publishes the record and it's actually available to query? Like not even a few seconds?

No there is no delay. The script updates the zone and even verifies it can resolve the record correctly before it continues.

Then unfortunately, I'm out of ideas.

I wonder if one second is still too long? We respect TTLs on our side, but it's possible we're caching our TXT record lookup for sites.karotte.org and then - if your client is fast enough, and making the second validation request within one second - going back to that cached value, and not seeing the record you added for the new challenge. Adding a very brief sleep to your manual auth hook might do the trick.
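
The suggestion above (verify, then pause briefly before handing control back to certbot) could look roughly like this in a Python auth hook. It is a sketch under assumptions: `check` stands in for whatever lookup the hook already performs, and the default timings are arbitrary. The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_record(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    settle: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True, then pause for `settle` seconds.

    Returns False if the record is still not visible once `timeout` has
    elapsed. The settle pause is meant to outlast any stale cached answer
    on the validating side.
    """
    deadline = clock() + timeout
    while True:
        if check():
            sleep(settle)  # extra margin after the record is confirmed
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The hook would call this after publishing the TXT value and exit non-zero when it returns False. Whether a settle delay is enough in this particular case is an open question.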

What DNS server are you using?

As I see it, certbot runs all the authentication hooks one after another (verified with -vvv). My auth hook script puts both TXT records (one for the domain, one for the wildcard) in the DNS, then verifies them itself (it queries a DNS resolver and checks that it can resolve the correct record).

Even if I add a 10-second sleep after the entry is made (and verified) by my auth hook script, it fails. I just tried it, with these two results:

 - The following errors were reported by the server:

   Domain: sites.karotte.org
   Type:   dns
   Detail: DNS problem: NXDOMAIN looking up TXT for
   _acme-challenge.sites.karotte.org - check that a DNS record exists
   for this domain

   Domain: sites.karotte.org
   Type:   dns
   Detail: DNS problem: NXDOMAIN looking up TXT for
   _acme-challenge.sites.karotte.org - check that a DNS record exists
   for this domain

and

 - The following errors were reported by the server:

   Domain: sites.karotte.org
   Type:   dns
   Detail: During secondary validation: DNS problem: NXDOMAIN looking
   up TXT for _acme-challenge.sites.karotte.org - check that a DNS
   record exists for this domain

   Domain: sites.karotte.org
   Type:   dns
   Detail: During secondary validation: DNS problem: NXDOMAIN looking
   up TXT for _acme-challenge.sites.karotte.org - check that a DNS
   record exists for this domain

On the third try it worked fine.

@jvanasco I'm using BIND9 (9.16) as the authoritative DNS server for the challenges domain, and unbound as the DNS resolver for the record verification my auth hook does.
