DNS failures (SERVFAIL, timeout) for domains using Network Solutions/Web.com/worldnic.com nameservers

I don’t think so. It’s supposed to be zero. Don’t see why Unbound would flip it.

Z               Reserved for future use.  Must be zero in all queries
                and responses.
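
For anyone who wants to check this bit in a packet capture, here is a minimal sketch of pulling the flags apart. It follows the RFC 1035 header layout quoted above, where Z is a 3-bit field; later DNSSEC RFCs reassigned two of those bits as AD and CD, so only the top one is still reserved. The function name `parse_dns_flags` is made up for illustration.

```python
import struct

def parse_dns_flags(header: bytes) -> dict:
    """Split the 16-bit flags word of a DNS header into named fields.

    Uses the original RFC 1035 layout, where Z is a 3-bit reserved field.
    (Hypothetical helper, not part of any library.)
    """
    if len(header) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # First two 16-bit big-endian words: transaction ID, then flags.
    _txid, flags = struct.unpack("!HH", header[:4])
    return {
        "qr": (flags >> 15) & 0x1,
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,
        "tc": (flags >> 9) & 0x1,
        "rd": (flags >> 8) & 0x1,
        "ra": (flags >> 7) & 0x1,
        "z": (flags >> 4) & 0x7,   # reserved field; expected to be 0
        "rcode": flags & 0xF,
    }
```

A non-zero `z` value in a query header would be the situation being discussed here.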

Our confidence keeps increasing that a rate limit on Web.com’s side is the problem. We’re still working with them to get this … resolved.

:sunglasses:

I can confirm we don't explicitly set that, and I would be shocked if Unbound set it for some reason.

In past incidents where DNS providers have deployed aggressive filters / rate limits, one common theme has been dropping "unrecognized" DNS stuff. As of four years ago, sometimes that included CAA queries, which were uncommon at the time. I would not be surprised to hear that z=1 is also treated as "unrecognized" and dropped.
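
If someone wants to test that theory against their own nameservers, a rough sketch of building the two kinds of "unusual" queries by hand is below. It only constructs the packet bytes; the helper name `build_query`, the default transaction ID, and the choice to leave RD unset (as for a query sent straight to an authoritative server) are assumptions for illustration, not what Let's Encrypt or Unbound actually send.

```python
import struct

CAA = 257         # CAA record type number
Z_BIT = 0x0040    # the one header bit that is still reserved

def build_query(name: str, qtype: int = CAA, set_z: bool = False,
                txid: int = 0x1234) -> bytes:
    """Build a raw single-question DNS query (hypothetical helper)."""
    flags = Z_BIT if set_z else 0x0000
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", qtype, 1)
```

Sending the result over UDP to port 53 of one of your nameservers, once with `set_z=False` and once with `set_z=True`, and comparing whether both get an answer would be one way to see if the bit matters.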

I am writing to confirm that we also have a problem with Let’s Encrypt authorization for our domain, which is also registered at Network Solutions and uses worldnic name servers.

The problem we are seeing is:
Error accepting authorization: acme: authorization error for <DOMAIN_NAME>: 400 urn:ietf:params:acme:error:dns: DNS problem: SERVFAIL looking up A for <DOMAIN_NAME> - the domain’s nameservers may be malfunctioning.

As a side note, we have the certificate mechanism set up with Kubernetes and cert-manager version 0.13.1.

I have also opened a support request with Network Solutions but have not yet received a reply. I will update as soon as I do.

Also, is there a specific place where we could raise this matter? I opened a support case with Network Solutions, but they don’t really know what is going on and their response was really out of place…

Hi Mitsos1os,

Welcome to the forums!

At this point, LE staff is well aware of the issue and doing what they can to resolve it. Unfortunately, their options are rather limited as the issue appears to be rate limiting from NetSol/Web.com and not anything on LE’s end.

As you’ve seen, going through regular support channels can be slow, so LE is asking for a community member to put them in touch with a NetSol engineer. I’ve reached out to my social network and haven’t found one. If you’d do the same, maybe you’ll have better luck and help get this resolved faster for all of us.

-Ben

Hi @kf6nux,

Thanks for the welcome and also thanks for the update about the situation.
Unfortunately I do not have any related engineers in my network…

I can’t believe that they won’t respond even to an email sent through their main contact form about an issue that affects so many production sites… :pensive:

Anyway, I will keep following the topic and hope a solution is found as soon as possible! :crossed_fingers:

We got this to work, but we had to move our DNS hosting off of Network Solutions. It seems like they are trying to extract more money by promoting their SSL certificates instead.

I plan to put in a ticket with Network Solutions to see if they can do something on their end. I suggest that everyone else who hosts there do the same.

I noticed the percentage of errors has shifted for our ACMEv1 client. We’re seeing a lot more

403 urn:acme:error:caa: Error creating new cert :: Rechecking CAA for "__example.dom__" and __x__ more identifiers failed. Refer to sub-problems for more information

and fewer of the type we were seeing before. Has anyone else’s error rate or error type changed within the past few days?

As an FYI, we did tweak the RPC timeouts to our secondary validation sites yesterday, which won’t fix the problem with worldnic throttling us, but could impact the types of errors you see.

This appears to have gotten much worse for us today and we’ve opened up an incident: https://status.pantheon.io/incidents/zqm017s8p6kx

@JamesLE You said you’re working with web.com toward a resolution. Do you have an ETA or an update? Thank you.

Unfortunately, we do not have an ETA. A new team at Web.com may be looking into the issue as of this afternoon, but we’ve had difficulty communicating with them.

If you’re contacting Web.com, feel free to reference ticket number 19858355. This may or may not speed up resolution, but that ticket’s notes should help their CSRs understand the problem.

Writing a post that makes everyone unhappy:

Let's Encrypt fixed the bug below on 2020-02-29. If you were often using authorizations that had been validated more than 8 hours ago, the bug was decreasing the number of CAA queries and resultant errors, and now they're back up to the level they're supposed to be. :grimacing:
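
To make the 8-hour rule concrete, here is a tiny sketch of the kind of check involved. This is not Boulder's actual code; the function name `needs_caa_recheck` and the constant name are invented, and only the 8-hour threshold comes from the description above.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the post above; the name is an assumption.
CAA_RECHECK_AFTER = timedelta(hours=8)

def needs_caa_recheck(validated_at: datetime, now: datetime) -> bool:
    """Return True if the authorization is old enough that CAA
    has to be looked up again before issuing (hypothetical helper)."""
    return now - validated_at > CAA_RECHECK_AFTER

validated = datetime(2020, 2, 29, 0, 0, tzinfo=timezone.utc)
print(needs_caa_recheck(validated, validated + timedelta(hours=9)))
```

Clients that reuse older authorizations trigger this extra CAA lookup at issuance time, which is one more query that can hit the worldnic throttling.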

Interesting info @mnordhoff.

It looks like the bug fix happened on the 29th, and we first started seeing issues on the 14th.

I imagine that patch is going to increase load on NetSol and possibly increase whatever rate limiting they’re doing to LE, but it is definitely not the cause of this issue. Still, good to know.

FYI,
After opening the first ticket at Network Solutions and getting no relevant response at all, I insisted and opened a second one re-explaining the situation. The response I received is the following:

This message is to let you know that the issue you reported to Network Solutions has been reviewed. Upon further investigation we have sent this issue to a higher level. Please allow 24-48 hours for us to resolve. Please accept our apologies for any inconvenience this may have caused.
Unfortunately this is an issue that needs to be handled by our engineers. The process is in place between them and the Lets Encrypt support team. These processes can take time as they need to be approved by higher levels. We are aware and working closely with them to resolve this.

Please keep us updated on any movement with that ticket. I think we’re going to have to reach out to our clients and have them do the same. Happy Birthday.

Let’s Encrypt’s planned cert revocation is going to make this issue with NetSol more painful. My company’s audit shows we have about 1,000 certs that LE is planning on revoking. We’re not sure yet how many of them contain NetSol/worldnic NS. Many of those certs have as many as 100 hostnames each, so the probability is high NetSol/worldnic will show up on many of them.
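
A back-of-the-envelope way to see why large certs are so exposed: if each hostname independently has some chance `p` of sitting on NetSol/worldnic nameservers, a cert with `n` hostnames is affected with probability `1 - (1 - p)^n`. The 2% figure below is purely a made-up placeholder, not a measurement, and independence is a simplification since hostnames on one cert often share DNS hosting.

```python
def prob_cert_affected(p_hostname: float, hostnames_per_cert: int) -> float:
    """Chance that at least one hostname on a cert uses the affected
    nameservers, assuming hostnames are independent (a simplification)."""
    return 1.0 - (1.0 - p_hostname) ** hostnames_per_cert

# Hypothetical input: 2% of hostnames on NetSol, 100 hostnames per cert.
print(round(prob_cert_affected(0.02, 100), 3))
```

With those placeholder numbers, roughly 87% of 100-hostname certs would contain at least one affected name.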

FYI, me too on this issue.

I’m on Network Solutions with worldnic nameservers. I noticed errors starting Feb 26, 2020 on renewal attempts, and they continue today on renewals for systems that have been published correctly for over a year.

From every remote IP I have access to, I can query my A records and reach the site and ACME client just fine. Only the renewal itself is failing to get DNS from worldnic.

It appears Networksolutions / worldnic are blocking LetsEncrypt datacenters entirely at this point.

Below are some of the errors I’m seeing right now on renewal attempts. “SERVFAIL looking up A” is the most prevalent in today’s attempts to renew.

  • DNS problem: SERVFAIL looking up A for
  • DNS problem: SERVFAIL looking up CAA
  • DNS problem: query timed out looking up CAA

We don’t have DNSSEC deployed, nor CAA records, so those errors really don’t make sense, except that LetsEncrypt isn’t getting valid responses for either CAA or A record lookups. It should be getting an empty answer (NOERROR with no records, or NXDOMAIN) for the CAA lookups so it knows there is no CAA restriction to enforce. But since all DNS queries are being blocked, a “SERVFAIL” causes the CAA check to fail.
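
The reasoning above can be sketched as a small decision function. This is a simplified illustration of how a CA might treat each CAA lookup result, not Let's Encrypt's real logic (which also walks up to parent domains); the names `Outcome` and `caa_lookup_outcome` are invented here.

```python
from enum import Enum
from typing import Optional

class Outcome(Enum):
    PROCEED = "no CAA records at this name"
    CHECK_RECORDS = "CAA records present, must be evaluated"
    FAIL = "lookup failed, issuance blocked"

def caa_lookup_outcome(rcode: Optional[str], answer_count: int = 0) -> Outcome:
    """Classify one CAA lookup. rcode=None stands for a timeout.
    (Hypothetical helper for illustration only.)"""
    if rcode is None:
        return Outcome.FAIL                      # query timed out
    if rcode == "NOERROR":
        # Empty answer means the name exists but has no CAA records.
        return Outcome.CHECK_RECORDS if answer_count > 0 else Outcome.PROCEED
    if rcode == "NXDOMAIN":
        return Outcome.PROCEED                   # name does not exist
    return Outcome.FAIL                          # SERVFAIL, REFUSED, ...

for rc in ("NOERROR", "NXDOMAIN", "SERVFAIL", None):
    print(rc, "->", caa_lookup_outcome(rc).name)
```

The point is that a domain with no CAA records at all can still fail the CAA check when the nameserver does not answer, because "no answer" and "no records" are different results.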

It’s happening for us as well.

I would agree with the previous message, we are not able to renew any worldnic domains at this point.
