DNS problem - SERVFAIL for (seemingly) correctly replied names

I think you've got everything right in your analysis there: Let's Encrypt (or rather, the DNS resolver they're using) can't always distinguish between a permanent and a transient failure, and to some extent they really can't, given how UDP and DNS work.

While it's always annoying when not everything matches the specs perfectly, I don't think there are any clients that really try to distinguish between types of errors (even when they should). Like I said, the popular client certbot always tries to renew a couple of times a day until it succeeds, regardless of why a failure happens. This is actually pretty terrible in some ways: if a site is moved elsewhere (DNS pointed to a different server) but the original server is still running and nobody updates certbot on it, that certbot will dutifully keep attempting to renew (and failing each time) forever. I think things like this are what leads to about 80% of HTTP-01 validations failing, which means more clients should actually be smarter about not retrying if it fails for long enough.
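To make "smarter about not retrying" a bit more concrete, here is a rough sketch of what such a policy might look like. This is not how certbot or any other client actually behaves; the function name `next_retry_delay` and all of the thresholds (12 hours, 7 days, 90 days) are assumptions I picked purely for illustration. The idea is just to back off as failures pile up and to give up entirely once it has been failing for long enough.

```python
from datetime import datetime, timedelta

# All of these values are hypothetical, chosen only for illustration.
BASE_INTERVAL = timedelta(hours=12)   # normal gap between renewal attempts
MAX_INTERVAL = timedelta(days=7)      # never wait longer than this between attempts
GIVE_UP_AFTER = timedelta(days=90)    # stop retrying after failing for this long


def next_retry_delay(first_failure, now, consecutive_failures):
    """Return how long to wait before the next attempt, or None to stop retrying.

    first_failure is the time of the first failure in the current streak
    (None if the last attempt succeeded).
    """
    if consecutive_failures == 0 or first_failure is None:
        return BASE_INTERVAL
    if now - first_failure >= GIVE_UP_AFTER:
        # Failing for this long probably means the site has moved or is gone.
        return None
    # Double the wait for each further failure; cap the exponent so the
    # timedelta multiplication can never overflow.
    exponent = min(consecutive_failures - 1, 10)
    return min(BASE_INTERVAL * (2 ** exponent), MAX_INTERVAL)
```

Note that this deliberately does not try to tell temporary errors from permanent ones. It only looks at how long the failures have been going on, which is about all a client can reliably know.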

But if what you're looking for is the "official advice", the closest thing to that which I know of is in the last section of the Integration Guide, which doesn't really make any attempt at distinguishing between temporary and permanent errors, and seems to me to just say that any "Renewal failure should not be treated as a fatal error."
