DNS problem: query timed out looking up TXT

Hi,

For some reason I can no longer renew the LE certificates, although it used to work fine. I've checked my domain with letsdebug and other DNS checkers, and they found no issues. Please help me figure out what's going on.

# whois -h whois.nic.ru abisoft.spb.ru
[Querying whois.nic.ru]
[whois.nic.ru]
domain:       ABISOFT.SPB.RU
nserver:      ns1.he.net
nserver:      ns2.he.net
nserver:      ns3.he.net
nserver:      ns4.he.net
nserver:      ns5.he.net
state:        REGISTERED, DELEGATED

# certbot renew --max-log-backups 30
...
Cert is due for renewal, auto-renewing...
Plugins selected: Authenticator manual, Installer None
Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
Renewing an existing certificate for *.abisoft.spb.ru and *.abisoft.biz
Performing the following challenges:
dns-01 challenge for abisoft.spb.ru
Running manual-auth-hook command: /usr/local/bin/update_zone_acme_wrapper.sh
Output from manual-auth-hook command update_zone_acme_wrapper.sh:
updating zone acme for abisoft.spb.ru (fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk)
updating zone acme for abisoft.spb.ru - success, waiting for AXFR ended...
AXFR ended

Waiting for verification...
Challenge failed for domain abisoft.spb.ru
dns-01 challenge for abisoft.spb.ru
Cleaning up challenges
...

# tail -f /var/log/letsencrypt/letsencrypt.log
...
2023-09-13 10:25:24,065:WARNING:certbot._internal.auth_handler:Challenge failed for domain abisoft.spb.ru
2023-09-13 10:25:24,066:INFO:certbot._internal.auth_handler:dns-01 challenge for abisoft.spb.ru
2023-09-13 10:25:24,067:DEBUG:certbot._internal.reporter:Reporting to user: The following errors were reported by the server:

Domain: abisoft.spb.ru
Type:   dns
Detail: During secondary validation: DNS problem: query timed out looking up TXT for _acme-challenge.abisoft.spb.ru
2023-09-13 10:25:24,076:DEBUG:certbot._internal.error_handler:Encountered exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/certbot/_internal/auth_handler.py", line 91, in handle_authorizations
    self._poll_authorizations(authzrs, max_retries, best_effort)
  File "/usr/lib/python2.7/site-packages/certbot/_internal/auth_handler.py", line 180, in _poll_authorizations
    raise errors.AuthorizationError('Some challenges have failed.')
AuthorizationError: Some challenges have failed.
...

While certbot was running, I double-checked from one of our independent servers that the challenge record was correctly populated, and it resolved without any issue:

$ date; for i in $(seq 1 5); do dig +short _acme-challenge.abisoft.spb.ru txt @ns$i.he.net; done
Wed Sep 13 07:24:59 UTC 2023
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"
"fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk"

Any help is highly appreciated!


I faced the same issue with the same TLD.


@jsha, could you please take a look at this issue? It seems to be pretty important.


The "During secondary validation" means that Let's Encrypt could connect to your DNS server from some locations but not all.

From my test system I can connect, and am getting NOERROR with no records for a TXT query for _acme-challenge.abisoft.spb.ru (presumably because you're not running certbot right this second).

Can you put a random test TXT value there (or maybe a record with a different name than _acme-challenge if you don't want to interfere with actual certificate acquiring attempts) so that others can try connecting from various places?

Does your DNS provider offer any kind of logs or status dashboard that could help dig into whether it's seeing all the requests coming in from Let's Encrypt's servers? There should be at least 3 for each name.
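The "During secondary validation" wording can be illustrated with a toy quorum model: the primary lookup succeeds, but too many remote lookups time out, so the challenge still fails. This is a simplified sketch, not Let's Encrypt's actual implementation; the perspective labels, the count of three remote perspectives and the tolerance of one remote failure (`max_remote_failures=1`) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Perspective:
    """Outcome of one TXT lookup from one vantage point (toy model)."""
    name: str
    ok: bool
    error: str = ""


def multi_perspective_result(
    primary: Perspective,
    remotes: List[Perspective],
    max_remote_failures: int = 1,  # assumed threshold, not confirmed here
) -> str:
    """Primary must succeed; at most max_remote_failures remotes may fail."""
    if not primary.ok:
        return f"DNS problem: {primary.error}"
    failed = [p for p in remotes if not p.ok]
    if len(failed) > max_remote_failures:
        # Mirrors the wording of the error in this thread, nothing more.
        return f"During secondary validation: DNS problem: {failed[0].error}"
    return "valid"
```

In this model, a nameserver that is unreachable only from some networks is enough to produce the error even though every check from the primary (or from your own server) succeeds.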


Is there any way to add a delay prior to the validation check?


@petercooperjr, thanks for taking a look! I just added the record you requested:

$ dig +short _acme-challenge.abisoft.spb.ru txt
"KHwPkBKCm4mdisJm8ta2EE9ZaVFvYnUQ64uLkV1ZIoQ"

BTW, it would be nice if LE reported the exact nameserver that failed.


I can see how that would be nice to know.
But, as a paranoid security person, I can also see how that might be too much information to hand out.


> But, as a paranoid security person, I can also see how that might be too much information to hand out.

Could you please explain what you mean? As I understand it, it wouldn't be a big deal to add the failed NS to the log entry, like:
Detail: During secondary validation: DNS problem: query timed out looking up TXT from ns1.he.net for _acme-challenge.abisoft.spb.ru
I don't see any security problems here.

Then you are either:

  • not paranoid [enough]
  • not in network security

It seems clear to me that any such information can be used to aid nefarious actions [not yours].

The next thing one would ask is: Where did the DNS request come from [which IP exactly]?
These requests may be innocent, but providing such details openly to all can be very bad for business.


I'm not sure if their system really allows for them to say which nameserver failed; it calls out to unbound which I think is checking several of them in parallel. (Though the exact logic it follows is a bit opaque to me.)

All the places I'm trying seem to be working:

Are you still having trouble? Could have just been a temporary network/routing issue.


Please show the log files.
I'd like to see how much time passes between the TXT upload/AXFR and the failed DNS result.


@wNRol: Can you please open a new thread in the Help section, and fill out the questionnaire it gives you there as best you can? Might be the same problem, but might not be.


If that is in the past tense, then maybe you have already solved the problem, or you have additional information that could help us here.
Otherwise, yes, please open another topic if you are still having an issue.


@rg305, I don't quite understand whose security might be compromised if the failed NS were included in the LE response. And I'm not asking for the request's source IP.
Anyway, since even LE doesn't know exactly which NS failed, there's no point discussing it any further.

Here is the excerpt from the log:

2023-09-13 10:23:36,018:INFO:certbot.compat.misc:Running manual-auth-hook command: /usr/local/bin/update_zone_acme_wrapper.sh
2023-09-13 10:24:53,563:INFO:certbot.compat.misc:Output from manual-auth-hook command update_zone_acme_wrapper.sh:
updating zone acme for abisoft.spb.ru (fGD1md2s1rHBzmM_Ctt8CqQiLUc-irZkVTra6PNCnMk)
updating zone acme for abisoft.spb.ru - success, waiting for AXFR ended...
AXFR ended

2023-09-13 10:24:53,569:INFO:certbot._internal.auth_handler:Waiting for verification...
...
2023-09-13 10:25:24,065:WARNING:certbot._internal.auth_handler:Challenge failed for domain abisoft.spb.ru
2023-09-13 10:25:24,066:INFO:certbot._internal.auth_handler:dns-01 challenge for abisoft.spb.ru
2023-09-13 10:25:24,067:DEBUG:certbot._internal.reporter:Reporting to user: The following errors were reported by the server:

Domain: abisoft.spb.ru
Type:   dns
Detail: During secondary validation: DNS problem: query timed out looking up TXT for _acme-challenge.abisoft.spb.ru

@petercooperjr, I can try one more time, but this renewal has failed several times since Sep 12th. The last failure happened ~3 hours ago, so I don't think it's a temporary issue.


My point is information is power.
Be careful who you empower.
And providing unnecessary/optional information is a slippery slope...
You already gave "this", so why not give "that" too?
And so on...
And so on...

If you "knew it all", you might be able to resolve your specific problem quicker.
If the "wrong person/state sponsored hacker" "knew it all", we might be left without a free and secure CA system.

If you provide information to one, you should provide it to all.
If you can't/shouldn't provide it to all, then you can't provide it to [any] one.

Note: I don't speak for LE; I'm simply stating my personal paranoid security view.
LE may have its own reason(s) for not doing that, or for not being able to do it.


So this shows about 1 minute 17 seconds between the start of the script that updates your DNS zone and certbot triggering the validation. Looking at the documentation for what I think is your DNS service from Hurricane Electric, two things stand out to me:

  1. It's called an "open beta". The word "beta" has a wide range of meanings in the software world, but it may mean that it's not quite ready for production use in some sense (or at least that they felt the need to add that term as some sort of disclaimer).
  2. Their dynamic TXT update feature says "A propagation delay of up to 5 minutes may be experienced as the TTL of the record will need to expire and refresh. You should wait before requesting DNS01 validation once you have updated the record."

So, echoing what was said earlier: is there a configurable delay in that script after it updates the records and before it returns? Or can you add a sleep command or something to it?

Might not fix the problem, but it's something else to try.
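Instead of a fixed sleep, the hook could poll every authoritative server until each returns the expected token, bounded by a deadline. This is a minimal sketch under stated assumptions: the name `wait_for_txt`, the 300-second default (picked only to match HE's "up to 5 minutes" note) and the 10-second interval are hypothetical, and `query` stands for any callable that returns the TXT strings a server serves and raises `TimeoutError` when it gets no answer (for example, a small wrapper around `dig`).

```python
import time
from typing import Callable, Iterable, List


def wait_for_txt(
    name: str,
    expected: str,
    servers: Iterable[str],
    query: Callable[[str, str], List[str]],
    timeout: float = 300.0,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once every server serves `expected`, False at the deadline."""
    pending = set(servers)
    deadline = clock() + timeout
    while True:
        # Iterate over a sorted copy so servers can be removed from `pending`.
        for server in sorted(pending):
            try:
                if expected in query(name, server):
                    pending.discard(server)
            except TimeoutError:
                pass  # no answer this round; try again on the next pass
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting. A successful local poll still does not prove that Let's Encrypt's remote perspectives can reach every server.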


@petercooperjr, you made several assumptions that are not quite correct.

  1. The HE DNS system might have "beta" in its name, but it's fairly stable, and I specifically included in my original message the check I ran for the ACME challenge at the same time LE was validating.
  2. I don't use their dynamic TXT update feature. The HE zones are actually slave (secondary) zones, so the update_zone_acme_wrapper.sh script updates the primary NS server and then waits for the AXFR to HE to complete.

I can update the script any way you like, but (in my understanding) the delay is not a problem at all (see above).

I just tried one more time with an extra delay:

2023-09-13 18:36:08,888:INFO:certbot.compat.misc:Running manual-auth-hook command: /usr/local/bin/update_zone_acme_wrapper.sh
2023-09-13 18:42:07,819:INFO:certbot.compat.misc:Output from manual-auth-hook command update_zone_acme_wrapper.sh:
updating zone acme for abisoft.spb.ru (vq675ASgfq6cPr4eRCo5SUaRCX0crJgzVIOeYT-CpkM)
updating zone acme for abisoft.spb.ru - success, waiting for AXFR ended...
AXFR ended
...
2023-09-13 18:42:39,601:WARNING:certbot._internal.auth_handler:Challenge failed for domain abisoft.spb.ru

6 minutes should be fair enough to cover possible delay issues, right?
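The intervals discussed in this thread (about 1 minute 17 seconds in the first attempt, about 6 minutes in the retry) can be checked directly from the certbot log timestamps quoted above. This sketch assumes certbot's `YYYY-MM-DD HH:MM:SS,mmm` timestamp format; for the retry, the middle timestamp is when the hook output was logged, because the "Waiting for verification..." line was elided from that excerpt.

```python
from datetime import datetime

# certbot log timestamp format, e.g. "2023-09-13 10:23:36,018"
FMT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two certbot log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# First attempt: hook started, "Waiting for verification...", "Challenge failed".
first = ("2023-09-13 10:23:36,018", "2023-09-13 10:24:53,569", "2023-09-13 10:25:24,065")
# Retry: hook started, hook output logged, "Challenge failed".
retry = ("2023-09-13 18:36:08,888", "2023-09-13 18:42:07,819", "2023-09-13 18:42:39,601")

for label, (hook, ready, failed) in (("first", first), ("retry", retry)):
    print(
        f"{label}: hook ran {seconds_between(hook, ready):.3f} s, "
        f"failure reported {seconds_between(ready, failed):.3f} s later"
    )
```

For the two attempts this gives about 77.6 s then 30.5 s, and about 358.9 s then 31.8 s.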


After the AXFR completes, yes, definitely enough wait time.

During that "wait time" you could check the SOA records at HE to see how long it actually takes them to update.
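One way to time this is to read the SOA serial from the primary after the update, then poll each HE secondary until its serial has caught up. SOA serials wrap around, so the comparison should use RFC 1982 serial number arithmetic rather than a plain `>=`. This is a hypothetical sketch: the thread does not show how update_zone_acme_wrapper.sh decides the AXFR has ended, and the function names and serial values here are illustrative only.

```python
from typing import Dict, List


def serial_gte(a: int, b: int) -> bool:
    """RFC 1982 comparison for 32-bit SOA serials: True if a equals or follows b.

    A difference of exactly 2**31 is undefined in RFC 1982 and treated as False.
    """
    return (a - b) % 2**32 < 2**31


def soa_serial(dig_short_output: str) -> int:
    """Extract the serial from `dig +short SOA zone @server` output.

    The fields are: mname rname serial refresh retry expire minimum.
    """
    return int(dig_short_output.split()[2])


def lagging_secondaries(primary_serial: int, secondary_serials: Dict[str, int]) -> List[str]:
    """Names of secondaries whose serial has not caught up with the primary."""
    return sorted(
        ns for ns, serial in secondary_serials.items()
        if not serial_gte(serial, primary_serial)
    )
```

Logging the time at which `lagging_secondaries` first returns an empty list would show how long HE actually takes to pick up each change.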


Well, it was worth a shot.

Has the error consistently been reporting that the failure is "During secondary validation"?

When was the last successful challenge?

@jcjones: Bugging you on this one just in case it's related to the validation server changes done in July, though I tend to doubt it.


There's definitely something odd going on, but I don't think it's anything to do with the VA updates from July.

Looking into it.
