All attempts to renew certificates for services hosted on the 8.43.85.0/24 subnet fail. We can clearly see the incoming connection in the logs, but the validation never completes.
The log excerpt is taken from wiki.gnome.org (8.43.85.12). I believe there’s an issue with this specific subnet; it’d be lovely if a Let’s Encrypt engineer could look into it.
My web server is (include version):
httpd-2.4.6-89.el7_6.1.x86_64. There have been no recent changes to the httpd configuration or to the Let’s Encrypt tooling. The httpd configuration matches that of hosts sitting on another subnet, which receive validated certificates just fine.
The operating system my web server runs on is (include version):
RHEL 7
I can login to a root shell on my machine (yes or no, or I don’t know):
yes
Hi, I am also hosting servers in the same DC, and I can reproduce this on an unrelated server (8.43.85.171, just 1h ago). Both averi and I have checked the network (with limited access), and from what I saw, the HTTP request came in and the answer was sent, with no errors at the TCP level.
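For anyone who wants to repeat the check, a capture along these lines shows the whole exchange (the interface name is a placeholder):

    # Show the inbound validation request and our reply on port 80;
    # replace eth0 with the server's public-facing interface.
    tcpdump -ni eth0 'tcp port 80'

In the captures both the request and the answer were visible, which is why we think the server side itself is fine.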
Visible Content:

Not Found
The requested URL /.well-known/acme-challenge/check-your-website-dot-server-daten-dot-de was not found on this server.
Apache/2.2.15 (Red Hat) Server at letsencrypt.gnome.org Port 80
/.well-known/acme-challenge/random-filename is redirected to https://letsencrypt.gnome.org/.well-known/acme-challenge/random-filename.
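You can see the same chain with a plain curl; the filename is arbitrary, since any non-existent token should behave the same way (assuming wiki.gnome.org is the domain under test):

    # Follow the redirect chain for a made-up challenge file and show each hop:
    curl -IL http://wiki.gnome.org/.well-known/acme-challenge/random-filename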
It looks more like wiki.gnome.org has changed something, so the redirect from your domain isn't being answered. Why do you redirect to wiki.gnome.org?
The reason behind the redirects is not relevant here, as Let’s Encrypt officially supports up to 10 redirects. Nothing changed on the wiki.gnome.org side, nor on any host on that subnet. @misc manages a set of systems that sit outside of GNOME, and he’s affected by the problem as well.
I’m confident the problem lies on Let’s Encrypt’s side.
We’re still waiting on our upstream ISP to resolve this. They are aware of the problem, but the only answer we get as to when it will be fixed is “soon”.
Is there any chance this can be escalated? A set of certificates is coming up for renewal, and there isn’t enough time to rewrite the automation tools to swap the verification method for something other than ACME.
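(Switching challenge types within ACME would be less invasive than leaving ACME altogether, but it still means touching every automation job. For reference, assuming certbot were the client, the swap to DNS-01 would look roughly like the sketch below; --manual is shown only for illustration, since unattended renewal would need a DNS plugin.)

    # Hypothetical DNS-01 fallback; requires publishing a TXT record per domain.
    certbot certonly --manual --preferred-challenges dns -d wiki.gnome.org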
We are migrating a lot of sites, and part of the migration process is requesting a certificate. We are only sporadically successful. Is there an option to point certificate requests at another place?
Our upstream ISP is still investigating this. At this point, with the troubleshooting information we’ve gathered, it looks like the problem’s limited to the small slice of IP space you’re in (8.43.84.0/22). It’s reachable from most of our validation endpoints, but a traceroute from one of them appears to stop within Peak 10’s Raleigh POP, which is the last hop before your /22.
It’s hard to tell from the outside exactly how your network connectivity is set up, but I’ve spot-checked a few sites with what I think is very similar connectivity, and they are working for us.
We’re going to keep following up. It might speed things up for you to check with your network engineers and with your immediate upstream ISP as well. Is it possible there’s some kind of firewall or DDoS protection appliance that’s blocking our validation endpoints? That’s an issue we’ve seen before, with similar symptoms.
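From your end, a trace toward any external vantage point will only show the outbound leg; the inbound leg, which is where our traceroute stalls, is something your ISP has to inspect. For what it’s worth, a sketch (the target is a placeholder):

    # Outbound path from a host inside the affected /22; replace the
    # documentation address below with any reachable external host.
    traceroute -n 192.0.2.1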
Your issue may be unrelated: we’re not aware of any problems affecting validations for sites on AWS.
I assume you’re also seeing a “Timeout during connect (likely firewall problem)” error. A frequent cause of this is publishing AAAA records in DNS without allowing IPv6 connections through your firewall (or AWS security group).
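A quick way to check for that mismatch (the domain and token are placeholders):

    # Is an AAAA record published for the name?
    dig +short AAAA example.com

    # If yes, validation will generally arrive over IPv6, so test reachability:
    curl -6 -v http://example.com/.well-known/acme-challenge/test-token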
If that’s not it, could you please start a new thread with full troubleshooting info, including some sample domains?
The problem appears to be fixed. We had our network engineering team double-check, and it appears asymmetric routing got in the way. An RCA is still being worked out internally. Thanks a lot for your support!
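(For anyone hitting the same symptoms later: on a Linux host with more than one uplink, the mismatch tends to show up as sketched below; interface names and the client address are placeholders.)

    # Which route will the kernel pick for replies to a given validation source?
    ip route get 203.0.113.10

    # During a validation attempt, requests arriving on one interface while the
    # replies leave via another is the asymmetric-routing signature:
    tcpdump -ni eth0 'tcp port 80'
    tcpdump -ni eth1 'tcp port 80'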