We have issues with one specific (sub)domain only, although other domains with the same name servers (NS: robotns3.second-ns.com, robotns2.second-ns.de, ns1.first-ns.de) renew without problems:
2020/07/20 09:00:50 Domain verification results for 'www.lsp.net': error.
During secondary validation:
DNS problem: query timed out looking up A for www.lsp.net
2020/07/27 09:00:48
Domain verification results for 'www.lsp.net': error.
During secondary validation: No valid IP addresses found for www.lsp.net
2020/08/03 09:00:48 Domain verification results for 'www.lsp.net': error.
During secondary validation:
DNS problem: SERVFAIL looking up A for www.lsp.net - the domain's nameservers may be malfunctioning
2020/08/10 09:00:48 Domain verification results for 'www.lsp.net': error.
During secondary validation:
DNS problem: networking error looking up A for www.lsp.net
(Source: Logs of a weekly cronjob running Crypt-LE 0.35 on Windows Server 2016)
Edit: I was able to renew the certificate by running the cronjob manually again.
I'm not sure. The fact that one of the errors is "networking error" is suspicious :(, added to the circumstances of the original report. I really don't want to speculate but it could be that something is up with the remote VAs. Might be time to ask @lestaff (link to the original report).
Digging into the logs I found the following information:
There were two issuance attempts for www.lsp.net. During the first issuance attempt we received the following errors from a random selection of RVA nodes
2020-08-10T07:00:36.679916+00:00 boulder-va[5016]: useHwQo Remote VA "xxx".PerformValidation returned problem: dns :: No valid IP addresses found for www.lsp.net
2020-08-10T07:00:46.720078+00:00 boulder-va[5016]: qqjOgA0 Remote VA "yyy".PerformValidation returned problem: dns :: DNS problem: networking error looking up CAA for www.lsp.net
During the second issuance attempt several hours later DNS had propagated, we were able to successfully validate the domain, and issue a certificate. https://crt.sh/?id=3216017655
2020-08-10T11:21:50.680184+00:00 boulder-va[13370]: uofTDgA [AUDIT] Checked CAA records for www.lsp.net, [Present: true, Account ID: xxxxx, Challenge: http-01, Valid for issuance: true]
2020-08-10T11:21:53.548219+00:00 boulder-ca[25045]: ieCyiQ0 [AUDIT] Signing success: serial=[04f3665528260d3637a7c2e6c60fc28753f8] names=[lsp.net, www.lsp.net] certificate=...
I noticed that Hetzner has been having some infrastructure maintenance today, perhaps the first issuance attempt occurred at the worst possible time? https://twitter.com/hetznerstats?lang=en
Sorry for flagging you, but mainly I wanted to find out whether there is any indication of why the remote VAs themselves have begun reporting a "networking error" during the DNS exchange, as opposed to the actual rcode from the recursor or context timeout?
It would be good to know whether to treat that as a normal timeout, or temporary condition, or something that needs attention.