One cert failing to renew - don't know why

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. crt.sh | example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is:
*.libraries.tufts.edu

I ran this command:
certbot renew --cert-name libraries.tufts.edu

It produced this output:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Python 3.9 support will be dropped in the next planned release of Certbot - please upgrade your Python version.


Processing /etc/letsencrypt/renewal/libraries.tufts.edu.conf


Renewing an existing certificate for *.libraries.tufts.edu
Waiting 1 seconds for DNS changes to propagate

Certbot failed to authenticate some domains (authenticator: dns-standalone). The Certificate Authority reported these problems:
Domain: libraries.tufts.edu
Type: unauthorized
Detail: No TXT record found at _acme-challenge.libraries.tufts.edu

Hint: The Certificate Authority failed to verify the DNS TXT records created by --dns-standalone. Ensure the above domains are hosted by this DNS provider, or try increasing --dns-standalone-propagation-seconds (currently 1 second).

Failed to renew certificate libraries.tufts.edu with error: Some challenges have failed.


All renewals failed. The following certificates could not be renewed:
/etc/letsencrypt/live/libraries.tufts.edu/fullchain.pem (failure)


1 renew failure(s), 0 parse failure(s)
Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.

My web server is (include version):
n/a. This is dns-standalone

The operating system my web server runs on is (include version):
n/a. dns-standalone

My hosting provider, if applicable, is:
n/a

I can login to a root shell on my machine (yes or no, or I don't know):
yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):
certbot 4.2.0

Additional Info:

_acme-challenge.libraries.tufts.edu is a CNAME for acme-challenge.it.tufts.edu
_acme-challenge.perseus.tufts.edu is a CNAME for acme-challenge.it.tufts.edu

acme-challenge.it.tufts.edu is a zone, which is delegated to acme.it.tufts.edu
acme.it.tufts.edu is an A record to 130.64.213.67, which is a NAT device that allows UDP 53 to the backend server running certbot. When certbot runs in dns-standalone mode, it starts listening for traffic, and as soon as it handles one request, it closes. So certbot successfully spins up and handles traffic on-demand while obtaining certs, but there is no listener 99.999% of the time.
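
To make the chain above concrete, here is a minimal Python sketch that follows the CNAMEs using a hard-coded table rather than live DNS lookups. The `CNAMES` table and the `follow_cnames` helper are illustrative names only, not part of certbot or the dns-standalone plugin.

```python
# Illustrative only: a static copy of the CNAME records described above,
# not a live DNS lookup.
CNAMES = {
    "_acme-challenge.libraries.tufts.edu": "acme-challenge.it.tufts.edu",
    "_acme-challenge.perseus.tufts.edu": "acme-challenge.it.tufts.edu",
}


def follow_cnames(name, table, max_hops=8):
    """Return the final name after following CNAMEs found in `table`."""
    # DNS names compare case-insensitively and may carry a trailing dot.
    current = name.rstrip(".").lower()
    for _ in range(max_hops + 1):
        if current not in table:
            return current
        current = table[current].rstrip(".").lower()
    raise ValueError("CNAME chain longer than max_hops (possible loop)")


for challenge_name in CNAMES:
    print(challenge_name, "->", follow_cnames(challenge_name, CNAMES))
```

Both challenge names end at the same target, which is why a problem specific to one of them points at something before the shared zone rather than at the listener itself.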

The *.libraries.tufts.edu cert has been renewed many times over the years, never had a problem until now. Nobody is aware of any changes made to this zone or its records. When I inspect the records, they look correct (as described above).

I can't find anything wrong, and I don't know why this cert is refusing to renew. Other certs (such as the perseus cert) are still successful. So the firewall, NAT, and everything are still correctly passing traffic. It's only this one cert that is failing to renew.

Here is the *.perseus.tufts.edu cert renewal, succeeding:

(venv-certbot) [root@acme-prod-01 ~]# certbot renew --cert-name perseus.tufts.edu --force-renewal
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Python 3.9 support will be dropped in the next planned release of Certbot - please upgrade your Python version.


Processing /etc/letsencrypt/renewal/perseus.tufts.edu.conf


Renewing an existing certificate for *.perseus.tufts.edu and perseus.tufts.edu
Waiting 1 seconds for DNS changes to propagate


Congratulations, all renewals succeeded:
/etc/letsencrypt/live/perseus.tufts.edu/fullchain.pem (success)


Welcome back @rahvee_tufts

That is puzzling. Would you try running this:

certbot renew --cert-name libraries.tufts.edu --dry-run --debug-challenges -v

I am not certain the dns-standalone plugin supports that debug option. It works for HTTP Challenge standalone so hopefully the dns one works similarly.

If it does, Certbot will show the expected TXT record value and will pause giving you a chance to check it from other perspectives. I recommend checking using https://unboundtest.com

It may not work though. It is possible the dns-standalone listener is not yet activated when Certbot shows the value and pauses. So, if you get a timeout rather than a found or not-found response that is probably why.

Otherwise this may be difficult to debug. Do you have any way to see if the inbound request reaches that dns listener? The error from the Let's Encrypt server is "no TXT record" rather than a timeout, which indicates something is replying. A blocking firewall or other communication errors would normally be reported as a timeout instead. Is it possible that some other DNS service is replying to this particular request?

As an aside, the long-term solution may be to use the new DNS-PERSIST-01 challenge. That allows you to place a TXT record once and it persists for multiple (even perpetual) cert requests.

Let's Encrypt has it in its staging system now and hopes for production this quarter. But, I don't know when the EFF plans to include it in Certbot.

This would likely be much simpler than the dns-standalone plugin and related infrastructure.

Just wanted to inform you of this upcoming feature: DNS-PERSIST-01: A New Model for DNS-based Challenge Validation - Let's Encrypt

One possible way forward is to do manual DNS Challenge for this one cert until that becomes viable.

libraries.tufts.edu is a lame delegation. Remove the NS records at this domain (not at tufts.edu) and that might fix your issue.

Thanks for the suggestions. I was able to run the command, and see an expected value (long random string), and query _acme-challenge.libraries.tufts.edu TXT from unboundtest.com, and see the value. So it worked.

Also, I was able to do a pcap (see attached pcap files) of a successful challenge (*.canvas.tufts.edu) and an unsuccessful challenge (*.libraries.tufts.edu). In both cases, the pcap shows the inbound queries from multiple letsencrypt IP addresses, querying for TXT acme-challenge.it.tufts.edu (they randomize the upper/lower case for some reason), and I see the response packets sending out the challenge strings.
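
The mixed case in those queries is most likely the DNS "0x20" technique, where a resolver randomizes the letter case of the query name and expects the answer to echo it back. DNS names themselves compare case-insensitively, so this should not affect the lookup. A minimal Python sketch of matching such names when reading a capture; `same_dns_name` is a hypothetical helper and the mixed-case spelling is made up for illustration, not copied from the pcap:

```python
def same_dns_name(a, b):
    """Compare two DNS names case-insensitively, ignoring a trailing root dot."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()


# Made-up mixed-case spelling, as a case-randomizing resolver might send it.
query_name = "aCmE-ChaLLenge.iT.TuFtS.eDu."
print(same_dns_name(query_name, "acme-challenge.it.tufts.edu"))
```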

So it appears everything is working from my end. It seems whatever the problem is, it's on letsencrypt's side.

And yet, I get unauthorized for *.libraries.tufts.edu. This failure has repeated each night for several nights, and a few times per day in the past couple of days while I'm trying to debug it.

canvas.pcap (2.6 KB)
libraries.pcap (3.3 KB)

But so is perseus and that's working.

Although, the DNSViz report has many more errors for libraries than for perseus. I don't have the DNS skills to understand all the implications. Not sure all those reported errors matter for this challenge.

libraries: _acme-challenge.libraries.tufts.edu | DNSViz
perseus: _acme-challenge.perseus.tufts.edu | DNSViz

libraries.tufts.edu and perseus.tufts.edu are only served by dns-auth-prod-01.it.tufts.edu and dns-auth-prod-02.it.tufts.edu; however, the Akamai name servers are listed as authoritative (present in the NS records). Because of this, resolvers will sometimes try to contact the Akamai name servers to find the TXT records.

The correct way to fix this would be to either:

  1. Remove the Akamai name servers from the NS record sets for libraries.tufts.edu and perseus.tufts.edu.
  2. Add libraries.tufts.edu and perseus.tufts.edu to the Akamai name servers (maybe merging the DNS zones).
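
As a rough illustration of what "lame delegation" means here, the Python sketch below compares the name servers listed in a zone's NS records with the servers that actually serve the zone. The two dns-auth-prod names come from this thread; the Akamai host names are placeholders (the real ones are not listed here), and `find_lame_servers` is a hypothetical helper, not an existing tool.

```python
def _normalize(host):
    """Lower-case a host name and drop any trailing root dot."""
    return host.rstrip(".").lower()


def find_lame_servers(listed_ns, serving_ns):
    """Return the listed name servers that do not actually serve the zone."""
    serving = {_normalize(host) for host in serving_ns}
    return sorted(
        _normalize(host) for host in listed_ns if _normalize(host) not in serving
    )


# Servers that actually host the zone, per the description above.
serving = ["dns-auth-prod-01.it.tufts.edu", "dns-auth-prod-02.it.tufts.edu"]

# NS record set as published. The Akamai entries are PLACEHOLDER names.
listed = serving + ["akamai-ns-1.example", "akamai-ns-2.example"]

print("lame name servers:", find_lame_servers(listed, serving))
```

Either fix should make that list empty: option 1 shrinks `listed`, while option 2 grows `serving`.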

The pcap traces show libraries only replies with 3 TXT records but canvas has 5. Let's Encrypt validates from (currently) 5 worldwide locations. The two missing are one west coast US location and Singapore. The libraries capture supports @MaxHearnden's comment that some other DNS system is replying to some of the LE queries.

That also explains why the error is "no TXT record" rather than timeouts, SERVFAIL, and similar possible errors we more likely see with broken DNS configs. This "other" DNS system does not have the expected TXT record (nor should it).