For some reason I cannot renew the LE certificates anymore. It used to work well before. I've checked my domain against letsdebug and other dns checkers - no issues. Please, help me to figure out what's going on.
# whois -h whois.nic.ru abisoft.spb.ru
[Querying whois.nic.ru]
[whois.nic.ru]
domain: ABISOFT.SPB.RU
nserver: ns1.he.net
nserver: ns2.he.net
nserver: ns3.he.net
nserver: ns4.he.net
nserver: ns5.he.net
state: REGISTERED, DELEGATED
# certbot renew --max-log-backups 30
...
Cert is due for renewal, auto-renewing...
Plugins selected: Authenticator manual, Installer None
Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
Renewing an existing certificate for *.abisoft.spb.ru and *.abisoft.biz
Performing the following challenges:
dns-01 challenge for abisoft.spb.ru
Running manual-auth-hook command: /usr/local/bin/update_zone_acme_wrapper.sh
Output from manual-auth-hook command update_zone_acme_wrapper.sh:
updating zone acme for abisoft.spb.ru (fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk)
updating zone acme for abisoft.spb.ru - success, waiting for AXFR ended...
AXFR ended
Waiting for verification...
Challenge failed for domain abisoft.spb.ru
dns-01 challenge for abisoft.spb.ru
Cleaning up challenges
...
# tail -f /var/log/letsencrypt/letsencrypt.log
...
2023-09-13 10:25:24,065:WARNING:certbot._internal.auth_handler:Challenge failed for domain abisoft.spb.ru
2023-09-13 10:25:24,066:INFO:certbot._internal.auth_handler:dns-01 challenge for abisoft.spb.ru
2023-09-13 10:25:24,067:DEBUG:certbot._internal.reporter:Reporting to user: The following errors were reported by the server:
Domain: abisoft.spb.ru
Type: dns
Detail: During secondary validation: DNS problem: query timed out looking up TXT for _acme-challenge.abisoft.spb.ru
2023-09-13 10:25:24,076:DEBUG:certbot._internal.error_handler:Encountered exception:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/certbot/_internal/auth_handler.py", line 91, in handle_authorizations
self._poll_authorizations(authzrs, max_retries, best_effort)
File "/usr/lib/python2.7/site-packages/certbot/_internal/auth_handler.py", line 180, in _poll_authorizations
raise errors.AuthorizationError('Some challenges have failed.')
AuthorizationError: Some challenges have failed.
...
While certbot was running, I double-checked from one of our independent servers that the challenge record was populated correctly and resolved without issue:
$ date; for i in $(seq 1 5); do dig +short _acme-challenge.abisoft.spb.ru txt @ns$i.he.net; done
Wed Sep 13 07:24:59 UTC 2023
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"During secondary validation" means that Let's Encrypt could reach your DNS servers from some of its validation locations but not all of them.
From my test system I can connect, and am getting NOERROR with no records for a TXT query for _acme-challenge.abisoft.spb.ru (presumably because you're not running certbot right this second).
Can you put a random test TXT value there (or maybe a record with a different name than _acme-challenge if you don't want to interfere with actual certificate acquiring attempts) so that others can try connecting from various places?
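For anyone trying this from another location, the per-nameserver check can be scripted so that a timeout or a missing value stands out. This is a minimal sketch in Python, assuming `dig` is installed; `parse_dig_short`, `query_txt`, `find_mismatches` and `check_all` are names made up for this illustration, and the nameserver list is taken from the whois output above. It only tests from the machine it runs on, so a clean result does not prove that Let's Encrypt's validation locations can reach the same servers.

```python
import subprocess

# Nameservers listed in the whois output for abisoft.spb.ru.
NAMESERVERS = ["ns%d.he.net" % i for i in range(1, 6)]


def parse_dig_short(output):
    """Turn `dig +short` output into a list of TXT values.

    Lines starting with ';' are dig diagnostics (for example a timeout
    notice), not answers, so they are skipped. Assumes single-string TXT
    records, which is what ACME challenge tokens are.
    """
    values = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        values.append(line.strip('"'))
    return values


def query_txt(nameserver, name, timeout=5):
    """Ask one nameserver directly for the TXT records of `name`."""
    cmd = ["dig", "+short", "+time=%d" % timeout, "+tries=1",
           name, "txt", "@" + nameserver]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_dig_short(result.stdout)


def find_mismatches(expected, answers):
    """Return the nameservers whose answers lack the expected value."""
    return sorted(ns for ns, values in answers.items() if expected not in values)


def check_all(name, expected):
    """Query every nameserver and print which ones lack `expected`."""
    answers = {ns: query_txt(ns, name) for ns in NAMESERVERS}
    missing = find_mismatches(expected, answers)
    if missing:
        print("missing or unreachable: " + ", ".join(missing))
    else:
        print("all nameservers returned the expected value")
    return missing
```

Calling `check_all("_acme-challenge.abisoft.spb.ru", "<test value>")` from several networks would show whether any one nameserver is unreachable from a particular location.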
Does your DNS provider offer any kind of logs or status dashboard that could help dig into whether it's seeing all the requests coming in from Let's Encrypt's servers? There should be at least 3 for each name.
But, as a paranoid security person, I can also see how that might be too much information to hand out.
Could you please explain what you mean? As I understand it, it would not be a big deal to add the failed NS to the log entry, for example: Detail: During secondary validation: DNS problem: query timed out looking up TXT from ns1.he.net for _acme-challenge.abisoft.spb.ru
I don't see any security problems here..
It seems clear to me that any such information can be used to aid nefarious actions [not yours].
The next thing one would ask is: Where did the DNS request come from [which IP exactly]?
These requests may be innocent, but providing such details openly to all can be very bad for business.
I'm not sure if their system really allows for them to say which nameserver failed; it calls out to unbound which I think is checking several of them in parallel. (Though the exact logic it follows is a bit opaque to me.)
@wNRol: Can you please open a new thread in the Help section, and fill out the questionnaire it gives you there as best you can? Might be the same problem, but might not be.
If that is past tense, then maybe you have already solved the problem or have some additional information that could help us here.
Otherwise, yes, open another topic if you are still having an issue.
@rg305 I don't quite understand whose security might be compromised if you add the failed NS in the LE response. And I'm not asking for the request source IP.
Anyway, since even LE does not know exactly which NS has failed, no reason to discuss it anymore.
Here is the excerpt from the log:
2023-09-13 10:23:36,018:INFO:certbot.compat.misc:Running manual-auth-hook command: /usr/local/bin/update_zone_acme_wrapper.sh
2023-09-13 10:24:53,563:INFO:certbot.compat.misc:Output from manual-auth-hook command update_zone_acme_wrapper.sh:
updating zone acme for abisoft.spb.ru (fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk)
updating zone acme for abisoft.spb.ru - success, waiting for AXFR ended...
AXFR ended
2023-09-13 10:24:53,569:INFO:certbot._internal.auth_handler:Waiting for verification...
...
2023-09-13 10:25:24,065:WARNING:certbot._internal.auth_handler:Challenge failed for domain abisoft.spb.ru
2023-09-13 10:25:24,066:INFO:certbot._internal.auth_handler:dns-01 challenge for abisoft.spb.ru
2023-09-13 10:25:24,067:DEBUG:certbot._internal.reporter:Reporting to user: The following errors were reported by the server:
Domain: abisoft.spb.ru
Type: dns
Detail: During secondary validation: DNS problem: query timed out looking up TXT for _acme-challenge.abisoft.spb.ru
@petercooperjr I can try one more time but this renewal has been failing several times since Sep 12th. The last failure happened ~3 hours ago. So I don't think it's a temporary issue.
My point is information is power.
Be careful who you empower.
And providing unnecessary/optional information is a slippery slope...
You already gave "this", why not give "that" too.
And so on...
And so on...
If you "knew it all", you might be able to resolve your specific problem quicker.
If the "wrong person/state sponsored hacker" "knew it all", we might be left without a free and secure CA system.
If you provide information to one, you should provide it to all.
If you can't/shouldn't provide it to all, then you can't provide it to [any] one.
Note: I don't speak for LE; I'm simply stating my personal paranoid security view.
LE may have its own reason(s) for not doing that or for not being able to do that.
So this shows it being about 1 minute 17 seconds between starting to run the script to update your DNS zone, and certbot triggering the validation. If I look at the documentation for what I think is your DNS service from Hurricane Electric, two things stand out to me:
It's called an "open beta". The word "beta" has a wide range of meanings in the software world, but it may mean that it's not quite ready for production use in some sense (or at least that they felt the need to add that term as some sort of disclaimer).
Their dynamic TXT update feature says "A propagation delay of up to 5 minutes may be experienced as the TTL of the record will need to expire and refresh. You should wait before requesting DNS01 validation once you have updated the record."
So, echoing what was said earlier,
Is there a configurable delay in that script after it updates the records and before it returns? Or can you add a sleep command or something to it?
Might not fix the problem, but it's something else to try.
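As an alternative to a fixed sleep, the hook could wait until every nameserver actually serves the new value before it returns. This is a minimal sketch in Python, assuming the caller supplies a `lookup(nameserver, name)` function that returns the TXT values seen at that server (for example a thin wrapper around `dig`); `wait_for_txt` and its parameters are hypothetical names, not part of certbot. Certbot does pass the expected value to a manual auth hook in the `CERTBOT_VALIDATION` environment variable.

```python
import time


def wait_for_txt(name, expected, nameservers, lookup,
                 timeout=600, interval=10,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until every nameserver returns `expected` for `name`.

    `lookup(nameserver, name)` is supplied by the caller and must return
    a list of TXT values. Returns True once all servers agree, or False
    if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    while True:
        # Servers that do not (yet) serve the expected value.
        pending = [ns for ns in nameservers if expected not in lookup(ns, name)]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Like any check run from the hook's own host, this only confirms what that host can see. It would not catch a nameserver that answers locally but times out for one of Let's Encrypt's validation locations.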
@petercooperjr you made several assumptions that are not quite correct.
The HE DNS system might have "beta" in its name, but it's kinda stable, and I specifically added to the original message the part where I check for the ACME challenge at the same time as LE does.
I don't use their dynamic TXT update feature. The HE zones are actually slave zones, so the update_zone_acme_wrapper.sh script updates the primary NS server and then waits for the AXFR to HE to complete.
I can update the script any way you like, but (in my understanding) the delay is not a problem at all (see above).
Just tried one more time with extra delay:
2023-09-13 18:36:08,888:INFO:certbot.compat.misc:Running manual-auth-hook command: /usr/local/bin/update_zone_acme_wrapper.sh
2023-09-13 18:42:07,819:INFO:certbot.compat.misc:Output from manual-auth-hook command update_zone_acme_wrapper.sh:
updating zone acme for abisoft.spb.ru (vq675ASgfq6cPr4eRCo5SUaRCX0crJgzVIOeYT-CpkM)
updating zone acme for abisoft.spb.ru - success, waiting for AXFR ended...
AXFR ended
...
2023-09-13 18:42:39,601:WARNING:certbot._internal.auth_handler:Challenge failed for domain abisoft.spb.ru
Six minutes should be enough to cover any possible delay issues, right?
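The delays quoted in this thread can be read straight off the certbot log timestamps. This is a minimal sketch in Python, assuming the `YYYY-MM-DD HH:MM:SS,mmm` prefix seen in the log excerpts; `parse_log_time` and `elapsed_seconds` are made-up helper names.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(line):
    """Parse the timestamp at the start of a certbot log line."""
    # The timestamp is the first 23 characters, e.g. "2023-09-13 10:23:36,018".
    return datetime.strptime(line[:23], LOG_TIME_FORMAT)


def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps of two log lines."""
    return (parse_log_time(end_line) - parse_log_time(start_line)).total_seconds()


# First attempt: hook started vs. hook output logged.
first = elapsed_seconds(
    "2023-09-13 10:23:36,018:INFO:certbot.compat.misc:Running manual-auth-hook command",
    "2023-09-13 10:24:53,563:INFO:certbot.compat.misc:Output from manual-auth-hook command",
)
# Second attempt, with the extra delay.
second = elapsed_seconds(
    "2023-09-13 18:36:08,888:INFO:certbot.compat.misc:Running manual-auth-hook command",
    "2023-09-13 18:42:07,819:INFO:certbot.compat.misc:Output from manual-auth-hook command",
)
print(first, second)  # 77.545 and 358.931 seconds
```

So the hook ran for about 1 minute 17 seconds the first time and just under 6 minutes the second time, and validation failed in both cases.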