Is having two (or more) _acme-challenge TXT records for a domain (where one of them is OK from Let's Encrypt's point of view) considered an invalid situation?
I had a few domains where Let's Encrypt failed with:
“DNS problem: server failure at resolver looking up TXT for _acme-challenge…”
exactly because there were multiple _acme-challenge TXT records. Removing the old _acme-challenge entries and leaving only the most recent one made Let's Encrypt pass DNS validation.
IMO Let's Encrypt should check all _acme-challenge records and pass if any one of them is valid.
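To make the "pass if any one is valid" idea concrete, here is a minimal sketch. It is not Boulder's actual code; the function names are made up for illustration. It only assumes the DNS-01 rule from the ACME spec that the TXT value is the unpadded base64url SHA-256 digest of the key authorization.

```python
import base64
import hashlib


def expected_txt_value(key_authorization: str) -> str:
    """Return the TXT value DNS-01 expects for a key authorization."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without the trailing "=" padding
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def any_record_valid(txt_values, key_authorization: str) -> bool:
    """Hypothetical check: accept if at least one TXT record matches,
    ignoring stale records left over from earlier validations."""
    expected = expected_txt_value(key_authorization)
    return any(value == expected for value in txt_values)
```

With this kind of check, stale records next to the current one would not cause a failure, as long as the resolver actually returns the full record set.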
Hmm. This sounds like a problem with your authoritative DNS server. Can you share the domain name(s) that failed this way so I can check our logs? I suspect your server returned a SERVFAIL response instead of the TXT records.
If @arek had too many TXT records, the answer sent by the DNS server could be above 512 bytes, so Let's Encrypt saw that as a SERVFAIL error.
I've tested it with 6 TXT records for _acme-challenge.domain.tld and the response from my DNS server was above 512 bytes so Let's Encrypt returned this error:
The server could not connect to the client to verify the domain :: DNS problem: server failure at resolver looking up TXT for _acme-challenge.domain.tld
With only 3 TXT records, Let's Encrypt validates it without any issue, so I think @arek's issue was the large answer sent by the DNS server.
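For anyone who wants to gauge how close a record set gets to the 512-byte limit, here is a rough estimate in Python. It is only a sketch: it assumes name compression for the answer records and counts just the header, question and answer section. Authority and additional records, which many servers also send, come on top, so a real response will be larger. The function names are made up for illustration.

```python
DNS_HEADER_BYTES = 12
UDP_LIMIT_BYTES = 512  # classic DNS-over-UDP message limit without EDNS0


def encoded_name_length(name: str) -> int:
    """Wire length of a domain name: one length byte per label plus the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def txt_rdata_length(value: str) -> int:
    """TXT RDATA is split into character-strings of up to 255 bytes,
    each prefixed with a one-byte length."""
    data = value.encode("ascii")
    chunks = max(1, -(-len(data) // 255))  # ceiling division, at least one chunk
    return len(data) + chunks


def estimate_response_size(name: str, txt_values) -> int:
    """Estimate header + question + answer section only (assumption:
    each answer's owner name is a 2-byte compression pointer)."""
    question = encoded_name_length(name) + 4  # QTYPE + QCLASS
    per_record_overhead = 2 + 2 + 2 + 4 + 2  # name pointer, type, class, TTL, RDLENGTH
    answers = sum(per_record_overhead + txt_rdata_length(v) for v in txt_values)
    return DNS_HEADER_BYTES + question + answers


# Example: six 43-character challenge values for the test name used above
values = ["A" * 43] * 6
size = estimate_response_size("_acme-challenge.domain.tld", values)
print(size, "bytes before authority/additional sections; limit is", UDP_LIMIT_BYTES)
```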
That's certainly a possible explanation, thanks @sahsanu! If that's the case then the problem should be resolved on ~Thursday the 1st, after this week's Boulder deploy. @jsha addressed the 512 byte answer limitation in Boulder master.
Hm, there would have been more: those 2-3 TXT records were old/stale entries (which shouldn't have been there, but were), and our custom Let's Encrypt software would add another one for the current verification, so it's possible that these queries hit the 512-byte limit mentioned earlier.