We're a web host running our own ASNs, 209199 & 61024. As of the last few weeks, each and every server & IP on our network is unable to issue or renew Let's Encrypt certificates. We actively block traffic from a handful of dirty remote ASNs and had presumed that to be the problem; however, we've completely lifted all restrictions and are still facing the same errors.
Example:
Storing nonce: 0101jaxp8W5ajHmFCXnyLp9Tpkvk6bJ9EaKNh_g3JyME7gc
Challenge failed for domain mydomain.net
Challenge failed for domain www.mydomain.net
http-01 challenge for mydomain.net
http-01 challenge for www.mydomain.net
Notifying user:
Certbot failed to authenticate some domains (authenticator: nginx). The Certificate Authority reported these problems:
Domain: mydomain.net
Type: connection
Detail: During secondary validation: Fetching http://mydomain.net/.well-known/acme-challenge/K2hvxRcURyQ52uQLH7vS4Z4nzTqsmfoGvgQc--9r-Yo: Timeout during connect (likely firewall problem)
The problem is present with raw certbot and with cPanel's implementation too.
I understand that Let's Encrypt's policy is to verify ACME challenge files from a multitude of hosts to ensure the source is genuine and reachable; however, with Let's Encrypt keeping these remote hosts and IPs secret, it's impossible to trace and troubleshoot connection errors on our side.
How might one go about getting more details on the remote hosts that are failing to connect?
Validation is known to occur via AWS in the US and Germany, so that presumably narrows it down. However, if you have lifted those restrictions and it still fails, it could be a routing issue.
You'll know more about this than me, but if your domains have both an IPv4 and an IPv6 address, try removing the IPv6 address from one of them to see if it's just failing to reach you over IPv6.
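If it helps, here's a rough way to compare the two from a machine outside your network (a minimal Python sketch; mydomain.net is a placeholder for one of the affected domains):

```
# Minimal sketch: try a plain TCP connect to port 80 over IPv4 and IPv6
# separately, to see whether only one address family is timing out.
# "mydomain.net" is a placeholder; run this from a host outside your network.
import socket

HOST = "mydomain.net"
PORT = 80
TIMEOUT = 10  # seconds

def probe(family, label):
    try:
        infos = socket.getaddrinfo(HOST, PORT, family, socket.SOCK_STREAM)
    except socket.gaierror:
        print(f"{label}: no address published for {HOST}")
        return
    for fam, socktype, proto, _, sockaddr in infos:
        with socket.socket(fam, socktype, proto) as s:
            s.settimeout(TIMEOUT)
            try:
                s.connect(sockaddr)
                print(f"{label}: connected to {sockaddr[0]}")
            except OSError as exc:
                print(f"{label}: {sockaddr[0]} failed ({exc})")

probe(socket.AF_INET, "IPv4")
probe(socket.AF_INET6, "IPv6")
```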
Thankfully there are no issues reaching any AWS region from any of our locations, so that's not the cause. In the past I've seen validation connections coming from a wider selection of hosts and can't recall ever seeing AWS, but obviously things have changed in recent months.
We know it's a network issue, but unfortunately we have no way of finding out where the problem lies without help from Let's Encrypt. Is this the best way to seek assistance from staff?
We can flag this with @lestaff so they know to take a look.
Obviously it's not your ability to see AWS that's the problem; it's the other way round (HTTP requests going out from LE servers on AWS, through to your websites).
I'm afraid we don't really have a way to live debug connectivity issues between our secondary validation hosts and your servers, but it sounds like you are on the right track looking at your blocked traffic. We occasionally hear from other folks who block AWS IPs based on similar heuristics, and the symptoms are just as you're describing. Do you have logs from the system that blocks ASNs? Is it possible that the steps you took to lift restrictions didn't fully take effect?
Also, if you have the ability to spin up an AWS instance (even a free-tier instance should do), you can double-check that HTTP fetches from various AWS regions are able to reach your servers.
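If you want a starting point, something like this rough Python sketch, run from instances in a couple of regions, would do. The URL is a placeholder, and a 404 still proves the connection got through:

```
# Rough sketch, to run from EC2 instances in different regions: fetch a path on
# the affected server over plain HTTP, the same way an http-01 check would.
# The URL below is a placeholder; a 404 response still proves reachability.
import urllib.error
import urllib.request

URL = "http://mydomain.net/.well-known/acme-challenge/test-token"  # placeholder

try:
    with urllib.request.urlopen(URL, timeout=10) as resp:
        print("reachable, HTTP status", resp.status)
except urllib.error.HTTPError as err:
    print("reachable, HTTP status", err.code)  # e.g. 404: connection still worked
except Exception as exc:
    print("not reachable:", exc)  # a timeout here mirrors the CA's error
```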
Thanks for the info. We use BGP to grab prefix lists advertised by particular ASNs that we deem dirty, and the prefixes are tagged into an internal BGP community that is distributed across all our edge routers. (We're anycasted & have 62 PoPs, as you may have seen already.) The community is now empty and reported as such across all our routers, and we're now seeing dirty traffic from the undesirables again.
We never block AWS ASNs; while there are occasional bouts of problematic traffic from AWS, their abuse team always responds promptly. We use extensive remote monitoring from StatusCake, Uptime Robot, FreshPing, Uptrends and UpDown.io, as well as our own probes on AWS, DO, GoDaddy, etc., so we know that overall connectivity in and out of our ASNs is flawless.
We do still have the AbuseIPDB list in place on all our edges to block community-reported bad traffic. I'm wondering if Let's Encrypt's secondary validation servers are being held up there. Do you set PTRs under *.letsencrypt.org, run a web server on these validation servers, or do anything else that makes them easily identifiable as yours? AbuseIPDB already exempts your 'static' IPs, as well as Akamai, Cloudflare, etc., but if the validation servers have bog-standard EC2 PTRs and are getting caught up in reports from folks' broken servers, then perhaps they're being blocked via AbuseIPDB?
Some of those listings have a relatively high "Confidence of Abuse". It's possible that from time to time they cross the threshold at which they get blocked (however that works in your network policy).
It would seem that since the reports are crowdsourced, by choosing to use this service, you are at the mercy/whims of paranoid server admins ^^.
While it's possible to submit AbuseIPDB reports manually, 99% of the data is reported in real time from brute-force and WAF error logs. Of course there are false positives, but AbuseIPDB's scoring system ranks IPs based on the number of hosts reporting issues and the timeframe within which the reports come in. Overall, it works well and has reduced our abuse log volume by 38%.
You'll see that 66.133.109.36 is exempted in AbuseIPDB because it can be matched (by PTR) to Let's Encrypt. The others, however, are not exempted because they carry generic EC2 PTRs with no indication of Let's Encrypt.
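To illustrate, a quick look at what the PTRs actually return (a Python sketch; the second address is only a placeholder for the kind of source IP we see in our edge logs):

```
# Quick check of what the PTRs say. 66.133.109.36 is the address mentioned
# above (it maps to a letsencrypt.org name); the second entry is a placeholder
# for an observed validation source IP with a generic EC2-style PTR.
import socket

ips = [
    "66.133.109.36",  # exempted in AbuseIPDB via its letsencrypt.org PTR
    "203.0.113.10",   # placeholder: substitute an IP seen hitting the challenge path
]

for ip in ips:
    try:
        name, _, _ = socket.gethostbyaddr(ip)
        print(f"{ip} -> {name}")
    except OSError:
        print(f"{ip} -> no PTR record (or lookup failed)")
```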
Would it be possible to:
Publish a list of IP addresses in active use by Let's Encrypt, even if only to select trusted partners (incl. AbuseIPDB) for exemption, or
Create a policy to set and ensure *.letsencrypt.org PTRs on all IPs in use, or
Run a web server with a simple "We're Let's Encrypt" page so people know that it's you, or
@_az @jsha Any thoughts on this? A search turns up thousands of forum threads crying out about similar issues that could be prevented with a little help from your side.
The main issue here is that they do not want the validator bots to see a different internet than ordinary users do.
And if you think about it, IP-based bans are pretty outdated: a lot of clients are behind NAT (mobile users are the majority), and you risk being too broad in what you block (it's one thing to use fail2ban to block an IP for 10 minutes if you think it's DDoSing you, but to do so for weeks, or permanently? That IP could have changed hands 100 times).
You should either rethink your firewalling logic, or use dns-01 (I hope your authoritative nameservers aren't behind the same firewall).
Whatever your views and opinions on security practices, they should not be enforced or dictated by Let's Encrypt, whose founding basis is to ensure a more secure internet. The use of network-wide firewall lists to block "known bad offenders" should be an option for anyone who wishes to use it.
Let's Encrypt have the means to avoid false listings by way of quick and simple procedures at their end. If other security-oriented companies can understand this and work with list providers to ensure ample exemptions, it isn't unreasonable to at least request that Let's Encrypt do the same. If there is hesitation or refusal on Let's Encrypt's side, it would be helpful to understand their stance and the reasoning behind it.
Oh, yeah: there isn't hesitation. The reason is pretty much "those IP addresses are to be treated as throwaway ones." (And they come from commercial cloud providers, not from address blocks assigned to Let's Encrypt itself -- it's not as if they can tell you "it's this subnet".)
They have repeatedly stated that they don't publish any such list.
As per my previous post, I'm not simply requesting a published list of all IPs used. I understand the managerial headaches and even the potential security concerns around divulging this to the entire internet. If a list is to be compiled, however, it should be shared only with trusted third parties.
It is also possible to ensure Let's Encrypt's validation bots do not get caught up in lists/IPS/firewalls without publishing a public IP list: setting valid PTR records or running a web server on :80, as I mentioned, would make the bots easy to identify. There will be other ways I have not thought of, hence the request for comments and discussion.
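To illustrate the idea, the exemption on a list provider's or network operator's side could be as simple as a forward-confirmed reverse DNS check, if the PTRs existed. A hypothetical Python sketch, not anything AbuseIPDB actually runs:

```
# Hypothetical sketch of a forward-confirmed reverse DNS check a list provider
# or firewall could apply if the validation hosts carried letsencrypt.org PTRs.
# Illustrative only; not how AbuseIPDB actually works.
import socket

def is_letsencrypt(ip: str) -> bool:
    """True only if the PTR ends in letsencrypt.org AND resolves back to the IP."""
    try:
        ptr, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    host = ptr.rstrip(".")
    if host != "letsencrypt.org" and not host.endswith(".letsencrypt.org"):
        return False
    try:
        forward = {ai[4][0] for ai in socket.getaddrinfo(host, None)}
    except OSError:
        return False
    return ip in forward

print(is_letsencrypt("66.133.109.36"))  # the exempted IP from earlier in the thread
```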
Stating that Let's Encrypt's secondary validation bots are no different from genuine users, and asking that they see no access bias, is a contradiction in terms. The secondary validation requests are bot requests through and through; they request a very specific file from a directory that is explicitly created for bot and metadata use. There is absolutely no crossover between bot and genuine user requests here.
By the same argument, since .well-known is designated for a specific use, it should be exempted from WAF, IPS and log-file checks; indeed, on our managed servers we do just this. However, we represent just a tiny fraction of the internet, and we alone cannot enforce, or even educate users to implement, such a policy. Further, I do not see Let's Encrypt making any headway with education on, or even reference to, this matter either, which would greatly benefit the internet as a whole.
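The logic behind that exemption is trivial; a toy Python illustration (in practice it's just an equivalent path match in the web server or WAF configuration):

```
# Toy illustration only: requests for the ACME challenge path skip WAF/IPS
# scoring and abuse logging entirely; everything else is inspected as normal.
# Real deployments express this as a path match in the web server/WAF config.
ACME_PREFIX = "/.well-known/acme-challenge/"

def should_inspect(request_path: str) -> bool:
    """Return False for ACME challenge fetches so they never feed abuse reports."""
    return not request_path.startswith(ACME_PREFIX)

assert should_inspect("/wp-login.php") is True
assert should_inspect("/.well-known/acme-challenge/some-token") is False
```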
I'd be grateful for further feedback from Let's Encrypt staff to help find a more satisfactory solution for all impacted users.
Don't get your hopes up too much. It's pretty unlikely this is going to change.
I have another option for you: only open the firewall during validation. That's 30 seconds every two months. Most ACME clients already have this feature (it's not firewall-specific; they can run arbitrary commands before and after validation, e.g. certbot's --pre-hook and --post-hook).