During secondary validation - DNS problem: networking error looking up A

We have issues with one specific (sub)domain only, although other domains with the same name servers (NS: robotns3.second-ns.com, robotns2.second-ns.de, ns1.first-ns.de) renew without problems:

2020/07/20 09:00:50 Domain verification results for 'www.lsp.net': error.
  During secondary validation: 
  DNS problem: query timed out looking up A for www.lsp.net

2020/07/27 09:00:48 Domain verification results for 'www.lsp.net': error.
  During secondary validation: No valid IP addresses found for www.lsp.net

2020/08/03 09:00:48 Domain verification results for 'www.lsp.net': error.
  During secondary validation: 
  DNS problem: SERVFAIL looking up A for www.lsp.net - the domain's nameservers may be malfunctioning

2020/08/10 09:00:48 Domain verification results for 'www.lsp.net': error.
  During secondary validation: 
  DNS problem: networking error looking up A for www.lsp.net

(Source: Logs of a weekly cronjob running Crypt-LE 0.35 on Windows Server 2016)

Edit: I was able to renew the certificate by running the cronjob manually again.


Hi @claas

Your problem looks different, so I've moved it to a new topic.

The "During secondary validation" part of that message says: the primary Let's Encrypt servers are able to check your domain.

The secondary ones are not.

Looks like a regional firewall / blocking.
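To make the primary/secondary distinction concrete, here is a deliberately simplified model of multi-perspective validation: the primary VA checks first, then a random sample of remote VAs repeats the check, and too many remote failures fail the whole validation even though the primary succeeded. The function name, the `Check` type, the sample size and the failure threshold are all hypothetical and illustrative; they are not Boulder's API or Let's Encrypt's production configuration.

```python
import random
from typing import Callable, List, Optional

# Hypothetical type: a check returns None on success or a problem string on failure.
Check = Callable[[str], Optional[str]]


def validate(domain: str,
             primary: Check,
             remotes: List[Check],
             sample_size: int,
             max_remote_failures: int,
             rng: Optional[random.Random] = None) -> Optional[str]:
    """Simplified multi-perspective validation (illustrative sketch only)."""
    # 1. The primary VA validates on its own; in this model its failure is final.
    problem = primary(domain)
    if problem is not None:
        return problem

    # 2. A random subset of remote VAs repeats the same check.
    rng = rng or random.Random()
    chosen = rng.sample(remotes, k=min(sample_size, len(remotes)))
    failures = [p for p in (check(domain) for check in chosen) if p is not None]

    # 3. Too many remote failures fail the whole validation, even though the
    #    primary succeeded; this is the "During secondary validation" case.
    if len(failures) > max_remote_failures:
        return "During secondary validation: " + failures[0]
    return None


if __name__ == "__main__":
    def ok(domain: str) -> Optional[str]:
        return None

    def blocked(domain: str) -> Optional[str]:
        return "DNS problem: networking error looking up A for " + domain

    # The primary reaches the nameservers; two of three remote perspectives cannot.
    print(validate("www.lsp.net", ok, [ok, blocked, blocked],
                   sample_size=3, max_remote_failures=1))
    # prints: During secondary validation: DNS problem: networking error looking up A for www.lsp.net
```

Under this model, a firewall that blocks only some regions produces exactly the pattern in the logs: the primary check passes while the remote checks fail.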

I'm not sure. The fact that one of the errors is a "networking error" is suspicious :(, especially given the circumstances of the original report. I really don't want to speculate, but something may be up with the remote VAs. It might be time to ask @lestaff (link to the original report).


Yes, sorry, my error.

I read only the first three results, then moved it.

It's time for Let's Encrypt to check it.


Thanks for the heads up. I’m going to dig in and take a look.

Digging into the logs I found the following information:

There were two issuance attempts for www.lsp.net. During the first attempt, we received the following errors from a random selection of remote VA (RVA) nodes:

2020-08-10T07:00:36.679916+00:00 boulder-va[5016]: useHwQo Remote VA "xxx".PerformValidation returned problem: dns :: No valid IP addresses found for www.lsp.net
2020-08-10T07:00:46.720078+00:00 boulder-va[5016]: qqjOgA0 Remote VA "yyy".PerformValidation returned problem: dns :: DNS problem: networking error looking up CAA for www.lsp.net

During the second issuance attempt several hours later, DNS had propagated, and we were able to validate the domain successfully and issue a certificate: https://crt.sh/?id=3216017655

2020-08-10T11:21:50.680184+00:00 boulder-va[13370]: uofTDgA [AUDIT] Checked CAA records for www.lsp.net, [Present: true, Account ID: xxxxx, Challenge: http-01, Valid for issuance: true]
2020-08-10T11:21:53.548219+00:00 boulder-ca[25045]: ieCyiQ0 [AUDIT] Signing success: serial=[04f3665528260d3637a7c2e6c60fc28753f8] names=[lsp.net, www.lsp.net] certificate=...

I noticed that Hetzner has been carrying out infrastructure maintenance today; perhaps the first issuance attempt occurred at the worst possible time? https://twitter.com/hetznerstats?lang=en


Edit: I’m taking another look at the initial report from "Increase in renewal DNS failures".


Thanks for looking into this.

Indeed, the second (manual) run of our cronjob, which I started after posting my reply in the forum, succeeded.

I checked the maintenance announcements, and DNS does not seem to have been affected.


Hi Phil,

Thanks!

Sorry for flagging you. Mainly, I wanted to find out whether there is any indication of why the remote VAs themselves have begun reporting a "networking error" during the DNS exchange, rather than the actual rcode from the recursor or a context timeout.

It would be good to know whether to treat that as a normal timeout, a temporary condition, or something that needs attention.
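The three kinds of failure in question can be told apart roughly as follows. This is not Boulder's code: it is a hypothetical sketch, reconstructed only from the message formats quoted in this thread, that assumes a timeout and a socket-level error surface as exceptions (no DNS response at all), while an rcode means the recursor did answer, just with an error. The function name, the rcode table and the fallback wording are illustrative assumptions; the numeric rcodes follow RFC 1035.

```python
import socket
from typing import Optional

# Hypothetical subset of DNS rcodes (RFC 1035 numbering), plus the one
# explanation that appears in the SERVFAIL log line quoted in this thread.
RCODE_NAMES = {2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}
RCODE_EXPLANATIONS = {2: "the domain's nameservers may be malfunctioning"}


def describe_dns_failure(qtype: str, host: str,
                         exc: Optional[BaseException] = None,
                         rcode: int = 0) -> str:
    """Map a failed lookup to a message shaped like the ones in the logs."""
    suffix = ""
    if exc is not None:
        # No DNS response arrived, so there is no rcode to report.
        # Timeouts are checked first because TimeoutError subclasses OSError.
        if isinstance(exc, (socket.timeout, TimeoutError)):
            detail = "query timed out"
        elif isinstance(exc, OSError):
            detail = "networking error"  # e.g. refused or unreachable socket
        else:
            detail = "unexpected error"  # placeholder wording (assumption)
    elif rcode != 0:
        # The recursor answered, but with an error rcode.
        detail = RCODE_NAMES.get(rcode, f"rcode {rcode}")
        if rcode in RCODE_EXPLANATIONS:
            suffix = " - " + RCODE_EXPLANATIONS[rcode]
    else:
        detail = "unexpected error"  # placeholder wording (assumption)
    return f"DNS problem: {detail} looking up {qtype} for {host}{suffix}"
```

If the real classification resembles this, a "networking error" would point at the transport between the remote VA and its resolver (or the path to the authoritative servers) rather than at an answer from the domain's nameservers.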

