IMHO, the most likely cause is a routing issue (possibly some misconfigured router along the way).
Neither you nor LE maintains any of those (Internet) routers.
Your only hope (if it is a routing issue) is to contact your ISP [you pay them for their support].
Bingo!
Why?
How about putting Wireshark on each segment of the Internet between your server and the LE validation servers (all of them) and diagnosing the problem that way?
Yes.
The only thing I need to know is which router blocks/loses the packets on the connection LE -> my server.
To find that out, someone would need to run at least a traceroute from the LE server.
There is more than one way to skin a cat.
But, yes, doing a traceroute would be an easy way.
Good luck getting an LE person to stop what they are doing [very short on staff] to address this problem this way.
I believe that if that information were available, it would be shared with you. But with multi-perspective validation (run, I believe, from CDN providers), the information you seek would be extremely expensive to collect, in terms of both money and human time.
I'm sorry to see that this validation problem has been difficult for @smon and the community to diagnose. We share your frustration about how tough it can be to find the exact spot where there's a routing, firewall, or similar problem.
This seems like a good moment to clarify some context:
As some community members have noted here, we're a small team responsible for a great many certificates. It would be impossible to troubleshoot most individual subscribers' problems ourselves. We hugely appreciate our community members' help in doing this. They flag us on problem reports that are especially unusual, or are part of a pattern that could mean there's a problem on our end; and we also watch for these patterns ourselves.
Because of this, we need everyone seeking help to provide enough information to help our community troubleshoot their problem. If you have a concern about sharing some information, we sympathize (since security and privacy are our whole thing) and will try our best to help you. But some types of problem can't be solved without all the information and all the context.
Because of this, we also need everyone seeking to help to be mindful of a subscriber's own context and concerns. If someone's reticent to post some information or try a troubleshooting step, by all means gently encourage them and try to talk them through it, but please don't let that descend into an argument.
Now, on to this issue:
I checked with my colleagues and located the domain name you had sent us in a PM. It looks like you were requesting a certificate with a large number of Subject Alternative Names (SANs), which we validate all at once. Customers' reports across the Web suggest that Hetzner may have built-in, network-level DDoS protection. If that's correct, then they are likely misidentifying this large traffic "spike" from a small number of IP addresses (ours), in quick succession, as a type of DDoS.
We haven't identified any pattern of problems with Hetzner; I see a normal validation success rate for other requests at the times you attempted to validate, and over time for your "neighbors" on nearby IP addresses. So, I think this is the problem.
If this is it, and Hetzner isn't able to help you, then I recommend requesting more certificates with fewer hostnames in each. That will spread out the validation traffic and prevent this problem.
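If you do split the request, a minimal sketch of the batching step might look like the Python below. It assumes you use certbot's webroot plugin; the webroot path (`/var/www/html`), the certificate naming scheme (`site-1`, `site-2`, ...) and the helper names are hypothetical, so adjust them to your setup. It only builds the commands and does not run them.

```python
import shlex


def chunk_hostnames(hostnames, max_per_cert):
    """Split hostnames into consecutive batches of at most max_per_cert names."""
    if max_per_cert < 1:
        raise ValueError("max_per_cert must be at least 1")
    # Drop duplicates while keeping the original order; a SAN list shouldn't repeat names.
    unique = list(dict.fromkeys(hostnames))
    return [unique[i:i + max_per_cert] for i in range(0, len(unique), max_per_cert)]


def certbot_commands(hostnames, max_per_cert, webroot="/var/www/html", name_prefix="site"):
    """Build one certbot argument list per batch (webroot and name_prefix are hypothetical defaults)."""
    commands = []
    for index, batch in enumerate(chunk_hostnames(hostnames, max_per_cert), start=1):
        args = ["certbot", "certonly", "--webroot", "-w", webroot,
                "--cert-name", f"{name_prefix}-{index}"]
        for host in batch:
            args += ["-d", host]  # one -d flag per hostname in this batch
        commands.append(args)
    return commands


if __name__ == "__main__":
    names = [f"host{n}.example.com" for n in range(1, 8)]
    # A batch size of 3 is arbitrary here; pick whatever spreads the load enough for you.
    for cmd in certbot_commands(names, max_per_cert=3):
        print(shlex.join(cmd))
```

Each batch becomes its own certificate and its own order, so when the commands are run one after another the validation requests for each batch arrive separately instead of all at once.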
Thank you for the detailed comments!
Yes, Hetzner, like many other providers, has built-in anti-DDoS protection. But it only works against low-level DDoS attacks such as SYN/UDP floods, DNS reflection, NTP reflection, etc. Surely no LE validation server could generate traffic to one host like those attacks. LE validation servers just send regular HTTP requests on port 80, at a low rate compared to the anti-DDoS triggers.
Let's clarify: I have N domains in the LE cert. If renewal fails, my server gets exactly N * 2 validation HTTP requests per attempt from the various LE servers. All LE requests reach my server without any trouble. But no third request arrives for any of the N domains from any LE validation server. If some anti-DDoS filter were blocking requests, I definitely would not see exactly N * 2 requests on every attempt.
Moreover, when I faced real high-level DDoS attacks (layer 7: many identical valid HTTP requests at a high rate), not a single request was blocked at the provider level.
Summary: the provider's anti-DDoS protection does not block regular HTTP requests (it would be strange if any provider did this).
If renewal succeeds, my server gets exactly N * 4 validation HTTP requests from the various LE servers.
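To check the N * 2 versus N * 4 pattern on your own server, a minimal sketch like the Python below could tally HTTP-01 challenge requests per token. It assumes the default common/combined access-log format used by nginx and Apache, and that each domain's challenge in a given attempt has its own token under `/.well-known/acme-challenge/`. The expected count of 4 comes from this thread, not from LE documentation, and may change as LE adjusts its validation perspectives. The function names and sample IPs are hypothetical.

```python
import re
from collections import defaultdict

# Assumed common/combined log format, e.g.:
# 203.0.113.5 - - [10/May/2022:12:00:00 +0000] "GET /.well-known/acme-challenge/TOKEN HTTP/1.1" 200 87 "-" "-"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:GET|HEAD) /\.well-known/acme-challenge/(?P<token>[A-Za-z0-9_-]+)[^"]*"'
)


def challenge_hits(lines):
    """Map each challenge token to the list of client IPs that requested it."""
    hits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines for other paths are ignored
            hits[match.group("token")].append(match.group("ip"))
    return dict(hits)


def summarize(lines, expected=4):
    """Return (token, requests, distinct IPs, reached expected?) per token, in log order."""
    return [
        (token, len(ips), len(set(ips)), len(ips) >= expected)
        for token, ips in challenge_hits(lines).items()
    ]


def example_line(ip, token):
    """Build a hypothetical log line for the demo below."""
    return (f'{ip} - - [10/May/2022:12:00:00 +0000] '
            f'"GET /.well-known/acme-challenge/{token} HTTP/1.1" 200 87 "-" "-"')


if __name__ == "__main__":
    sample = [example_line("203.0.113.5", "tokA"), example_line("198.51.100.7", "tokA")]
    sample += [example_line(f"192.0.2.{i}", "tokB") for i in range(1, 5)]
    for row in summarize(sample):
        print(row)
```

A token stuck at 2 requests from 2 distinct IPs while others reach 4 would match the failure pattern described above; it would not by itself show where the missing requests were lost.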
I presume that under some conditions the LE validation process fails after the second validation request for some internal reason (which may or may not actually be related to connectivity). It may or may not depend on the value of N. I do not know your internal workings. But I know firsthand how hard it is to debug such errors on highly loaded systems.
So I really appreciate that the LE team is trying to help.
I hope my case may be useful for future development.
Thank you!
There should be 4 requests per validation attempt.
You may wish to check any active protection status concerning your server on the DDOS dashboard provided by Hetzner.
Exactly! 4 requests from different IPs = success.
In my case, the validation process usually stops after 2 requests for each domain in the cert.
My cert has been renewed by certbot over the years without any problems. Since May 2022, that streak of successes has been broken. I suspect some LE internal issue in the renewal process for certs with many domains. I hope this is helpful for someone.