I plan to put in a ticket with Network Solutions to see if they can do something on their end. I suggest that everyone else who hosts there do the same.
I noticed the mix of error types has shifted for our ACMEv1 client. We're seeing a lot more of this error:
403 urn:acme:error:caa: Error creating new cert :: Rechecking CAA for "example.dom" and x more identifiers failed. Refer to sub-problems for more information
and fewer of the type we were seeing before. Has anyone else's error rate or error type changed within the past few days?
As an FYI, we did tweak the RPC timeouts to our secondary validation sites yesterday, which won’t fix the problem with worldnic throttling us, but could impact the types of errors you see.
This appears to have gotten much worse for us today and we’ve opened up an incident: https://status.pantheon.io/incidents/zqm017s8p6kx
Unfortunately, we do not have an ETA. A new team at Web.com may be looking into the issue as of this afternoon, but we’ve had difficulty communicating with them.
If you’re contacting Web.com, feel free to reference ticket number 19858355. This may or may not speed up resolution, but that ticket’s notes should help their CSRs understand the problem.
Writing a post that makes everyone unhappy:
Let's Encrypt fixed a bug on 2020-02-29. Authorizations that were validated more than 8 hours ago need a fresh CAA check at issuance, so if you were often using such authorizations, the bug was reducing the number of CAA queries (and the resulting errors) for you. Now they're back up to the level they're supposed to be.
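To make the 8-hour rule concrete, here is a minimal Python sketch of deciding which identifiers need a fresh CAA lookup at issuance time. The `Authorization` record, `identifiers_needing_caa_recheck` and `CAA_RECHECK_WINDOW` are hypothetical names, and the 8-hour window is taken from the post above rather than from Let's Encrypt's code. Each stale identifier gets its own lookup, which is why restoring the intended behavior raises the number of queries sent to a domain's nameservers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed window, taken from the forum post; not a documented constant.
CAA_RECHECK_WINDOW = timedelta(hours=8)


@dataclass
class Authorization:
    # Hypothetical record of a previously validated identifier.
    identifier: str
    validated_at: datetime


def identifiers_needing_caa_recheck(authzs, now):
    """Return every identifier whose validation is older than the window.

    Each returned identifier would trigger its own CAA lookup at issuance.
    """
    return [a.identifier for a in authzs if now - a.validated_at > CAA_RECHECK_WINDOW]


if __name__ == "__main__":
    now = datetime(2020, 3, 2, 12, 0, tzinfo=timezone.utc)
    authzs = [
        Authorization("fresh.example", now - timedelta(hours=2)),
        Authorization("old-a.example", now - timedelta(days=3)),
        Authorization("old-b.example", now - timedelta(hours=9)),
    ]
    print(identifiers_needing_caa_recheck(authzs, now))
    # → ['old-a.example', 'old-b.example']
```

An order with many hostnames on reused authorizations can therefore fan out into many CAA queries at once, all landing on the same nameservers.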
Interesting info @mnordhoff.
It looks like the bug fix happened on the 29th, and we first started seeing issues on the 14th.
I imagine that patch is going to increase load on NetSol and possibly increase whatever rate limiting they’re doing to LE, but it is definitely not the cause of this issue. Still, good to know.
After opening a first ticket with Network Solutions and getting no relevant response at all, I insisted and opened a second one re-explaining the situation. This is the response I received:
This message is to let you know that the issue you reported to Network Solutions has been reviewed. Upon further investigation we have sent this issue to a higher level. Please allow 24-48 hours for us to resolve. Please accept our apologies for any inconvenience this may have caused.
Unfortunately this is an issue that needs to be handled by our engineers. The process is in place between them and the Lets Encrypt support team. These processes can take time as they need to be approved by higher levels. We are aware and working closely with them to resolve this.
Please keep us updated on any movement on that ticket. I think we're going to have to reach out to our clients and have them do the same. Happy Birthday.
Let's Encrypt's planned cert revocation is going to make this issue with NetSol more painful. My company's audit shows we have about 1,000 certs that LE is planning to revoke. We're not sure yet how many of them include hostnames served by NetSol/worldnic nameservers. Many of those certs have as many as 100 hostnames each, so NetSol/worldnic is likely to show up on many of them.
FYI, I'm seeing this issue too.
I'm on Network Solutions (worldnic) nameservers. I noticed errors on renewal attempts starting Feb 26, 2020, and they are still happening today for systems whose DNS has been published correctly for over a year.
From every remote IP I can test from, I can query my A records and reach the site and the ACME client just fine. Only the renewal itself fails to get DNS answers from worldnic.
It appears Network Solutions/worldnic are blocking Let's Encrypt's datacenters entirely at this point.
Here are some of the errors I'm seeing on renewal attempts right now; "SERVFAIL looking up A" is the most prevalent in today's attempts:
- DNS problem: SERVFAIL looking up A for
- DNS problem: SERVFAIL looking up CAA
- DNS problem: query timed out looking up CAA
We don't have DNSSEC deployed or any CAA records, so these errors only make sense if Let's Encrypt isn't getting valid responses to its CAA or A queries. For the CAA lookups it should be getting a response showing that no CAA records exist (an empty answer or NXDOMAIN), which tells it no CAA restriction applies. Since the DNS queries are being blocked, it gets a SERVFAIL instead, and a SERVFAIL causes the CAA check to fail.
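The reasoning above can be illustrated with a simplified CAA check, loosely modeled on the relevant-RRset search in RFC 8659: climb from the name toward the TLD, treat an empty answer or NXDOMAIN as "keep climbing", and treat SERVFAIL as a hard failure, which matches the errors reported here. The resolver is a hypothetical dictionary stub, and `caa_permits` only compares `issue` values, ignoring flags, parameters and `issuewild`. It is a sketch for discussion, not Let's Encrypt's implementation.

```python
from enum import Enum


class Rcode(Enum):
    NOERROR = "NOERROR"
    NXDOMAIN = "NXDOMAIN"
    SERVFAIL = "SERVFAIL"


class CAALookupError(Exception):
    """A CAA query got no usable answer, so issuance must stop."""


def lookup_caa(name, responses):
    # Hypothetical resolver stub: `responses` maps a name to
    # (rcode, [(tag, value), ...]). Unlisted names behave like an existing
    # name with no CAA records: NOERROR with an empty answer.
    return responses.get(name, (Rcode.NOERROR, []))


def relevant_caa_set(fqdn, responses):
    """Climb from fqdn toward the TLD and return the first non-empty CAA set."""
    labels = fqdn.rstrip(".").split(".")
    for i in range(len(labels)):
        name = ".".join(labels[i:])
        rcode, records = lookup_caa(name, responses)
        if rcode is Rcode.SERVFAIL:
            # A throttled or blocked nameserver lands here: no answer at all.
            raise CAALookupError(f"SERVFAIL looking up CAA for {name}")
        if records:
            return records
        # Empty answer or NXDOMAIN: keep climbing to the parent name.
    return []  # No CAA anywhere up the tree, so no restriction applies.


def caa_permits(fqdn, ca_domain, responses):
    records = relevant_caa_set(fqdn, responses)
    if not records:
        return True
    # Simplified: ignores flags, parameters and "issuewild".
    return any(tag == "issue" and value == ca_domain for tag, value in records)


if __name__ == "__main__":
    healthy = {}  # every lookup returns NOERROR with no CAA records
    print(caa_permits("www.example.com", "letsencrypt.org", healthy))  # True

    throttled = {"www.example.com": (Rcode.SERVFAIL, [])}
    try:
        caa_permits("www.example.com", "letsencrypt.org", throttled)
    except CAALookupError as err:
        print("issuance blocked:", err)
```

With no CAA records anywhere, the healthy case permits issuance; the throttled case never reaches that conclusion because the very first query fails.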
It’s happening for us as well.
I agree with the previous message: we are not able to renew any worldnic domains at this point.
We’ve just confirmed that we are no longer rate limited reaching Web.com’s nameservers. Thanks to them, and many thanks to all of you, for working to resolve this. Please do let us know if you’re still seeing problems.
I spoke too soon; I think we're good. Thank you very much!
Thank you! This has helped clear most of our queue of pending renewals!
Sorry for the delayed response @knas
I haven't had any update on my ticket, but from the comments it looks like the issue has been resolved. We'll try again ourselves too.
FYI, in case you keep running into problems and need a history to refer to, the Network Solutions ticket I opened is 19868185.
I also wanted to update that I heard back on my Network Solutions ticket. According to support “Our team has whitelisted Let’s Encrypt IPs that were experiencing query rate limiting”.
So far, I have not seen any more failures.
If that’s the case, this might happen again next time Let’s Encrypt adds new IPs.
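If the fix really is an IP allowlist, a toy model shows why new Let's Encrypt addresses could be throttled again. This is purely hypothetical, since we don't know how Network Solutions implements its limits: the sketch assumes a per-source-IP token bucket that allowlisted IPs bypass entirely. `TokenBucket`, `QueryLimiter`, and the rate and burst values are invented for illustration, and the addresses come from the documentation ranges.

```python
class TokenBucket:
    """Classic token bucket: `rate` tokens/second, at most `capacity` stored."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = None

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        if self.last is not None:
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class QueryLimiter:
    """Hypothetical per-source-IP limiter with an exact-match allowlist."""

    def __init__(self, allowlist, rate=5.0, capacity=10.0):
        self.allowlist = set(allowlist)
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, src_ip, now):
        if src_ip in self.allowlist:
            return True  # allowlisted sources are never throttled
        bucket = self.buckets.setdefault(src_ip, TokenBucket(self.rate, self.capacity))
        return bucket.allow(now)


if __name__ == "__main__":
    limiter = QueryLimiter(allowlist={"192.0.2.10"})  # an "old", allowlisted address
    old = sum(limiter.allow("192.0.2.10", 0.0) for _ in range(100))
    new = sum(limiter.allow("198.51.100.20", 0.0) for _ in range(100))
    print(old, new)  # 100 10
```

Because the allowlist matches exact addresses, a new validation IP falls back to the default bucket and sees the same throttling until someone adds it to the list.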
We now have a line of communication with them, and hopefully can avoid future problems.
Is anyone else starting to see this again?