Are the issues with the secondary DNS validation servers still occurring? I'm seeing a lot of secondary validation errors across a number of domains. The error message is one of the following:
- During secondary validation: DNS problem: SERVFAIL looking up A
- During secondary validation: DNS problem: query timed out looking up A
Previous reports I've seen on this topic suggest the problem was load-related and more likely to occur when people schedule their jobs at `*:00`. But as far as I can tell, those issues are believed to have been resolved, yet this still happens.
And it is time-related: not so much by the minute within the hour, but by the hour of the day. Each site runs a renewal attempt (when needed) once per day, at its own randomly assigned hour. Today, for example, the sites that tried to renew during the 01:00 UTC hour failed (at times ranging from 01:01 to 01:13 UTC). But all of the attempts between 00:00 and 01:00 UTC, and (so far) all of those from 02:00 UTC onward, have succeeded.
This has been going on for quite a while, but this is the first time I've noticed that it's always at the same time of day.
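One way to check the hour-of-day pattern across all sites is to tally renewal outcomes by UTC hour rather than by minute. Below is a minimal Python sketch, assuming your renewal client's logs can be reduced to (timestamp, succeeded) pairs. The record format, the `failures_by_utc_hour` helper, and the sample entries (a placeholder date, with times echoing the ones above) are illustrative only, not real log data:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Illustrative renewal records: (ISO 8601 timestamp, succeeded?) pairs.
# The format and the placeholder date are assumptions; adapt to your own logs.
attempts = [
    ("2024-05-01T00:37:00+00:00", True),
    ("2024-05-01T01:01:00+00:00", False),
    ("2024-05-01T01:13:00+00:00", False),
    ("2024-05-01T02:22:00+00:00", True),
]


def failures_by_utc_hour(records):
    """Return {hour: (failures, total)} keyed by the UTC hour of each attempt."""
    stats = defaultdict(lambda: [0, 0])
    for stamp, ok in records:
        # Normalize to UTC so logs written in local time group correctly.
        hour = datetime.fromisoformat(stamp).astimezone(timezone.utc).hour
        stats[hour][1] += 1
        if not ok:
            stats[hour][0] += 1
    return {h: tuple(v) for h, v in sorted(stats.items())}


for hour, (failed, total) in failures_by_utc_hour(attempts).items():
    print(f"{hour:02d}:00 UTC  {failed}/{total} failed")
```

Run over several weeks of real records, a persistent spike in a single hour points at something scheduled on the validation side (or on a path between it and your DNS servers) rather than at any one domain's configuration.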
The DNS servers and configurations have been rigorously checked and everything is OK. There are no issues with RPKI (valid) or DNSSEC (not used). Also, FWIW, the "Let's Debug" test passes every time for affected domains.
It's pretty tempting to just avoid scheduling renewals during that hour, but I would really like to get to the bottom of this.
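If a stopgap is needed while investigating, the per-site random slot can skip a suspect hour and also stay off the top of the hour. A minimal sketch, assuming renewals are driven by a cron-style schedule; `EXCLUDED_HOURS`, `pick_renewal_time`, and the `example.com` seed are hypothetical names for illustration:

```python
import random

# Hypothetical: UTC hours to skip while the 01:00 failures are investigated.
EXCLUDED_HOURS = {1}


def pick_renewal_time(rng: random.Random, excluded=EXCLUDED_HOURS):
    """Pick an (hour, minute) slot, skipping excluded hours and minute 0."""
    hours = [h for h in range(24) if h not in excluded]
    hour = rng.choice(hours)
    minute = rng.randint(1, 59)  # stay off the busy top of the hour (*:00)
    return hour, minute


# Seeding by site name gives each site a stable slot across runs.
rng = random.Random("example.com")
hour, minute = pick_renewal_time(rng)
# Cron-style line; this assumes the cron daemon runs in UTC.
print(f"{minute} {hour} * * *")
```

Seeding by site name keeps each site on the same slot from day to day, which preserves the "one random hour per site" behavior; removing the seed would draw a new slot on every call.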
Thanks for any insight!