Timeout during cert renew - help

Which IPs showed success in the Apache access log?

Did anything show up in the outermost firewall's log?

Do you have an extra service from your ISP that might do DDoS or "safety" screening?

Each time I run certbot, I see a similar set of IPs returning success (200)...
3.144.17.208
3.127.79.226
18.237.187.55

Earlier today it was...
3.145.56.16
18.194.207.123
35.87.24.253

Always exactly three successful requests returning response code 200.
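
For anyone wanting to repeat this check, here is a minimal sketch of pulling the distinct validation IPs out of an Apache access log. It assumes the default "combined" log format and the standard HTTP-01 path `/.well-known/acme-challenge/`. The function name `challenge_ips`, the token, the timestamps, and the `192.0.2.10` line are made up for illustration; only the three AWS IPs come from the post above.

```python
import re

# Start of an Apache "common"/"combined" log line:
# client IP, identity, user, [timestamp], "METHOD path protocol", status
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def challenge_ips(lines, prefix="/.well-known/acme-challenge/"):
    """Return distinct client IPs (in order of first appearance) that
    received a 200 for a request under the HTTP-01 challenge path."""
    seen = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        if (
            m["path"].startswith(prefix)
            and m["status"] == "200"
            and m["ip"] not in seen
        ):
            seen.append(m["ip"])
    return seen


# Hypothetical sample lines (placeholder token and timestamps).
sample = [
    '3.144.17.208 - - [01/Jan/2000:00:00:01 +0000] "GET /.well-known/acme-challenge/EXAMPLE_TOKEN HTTP/1.1" 200 87 "-" "-"',
    '3.127.79.226 - - [01/Jan/2000:00:00:01 +0000] "GET /.well-known/acme-challenge/EXAMPLE_TOKEN HTTP/1.1" 200 87 "-" "-"',
    '192.0.2.10 - - [01/Jan/2000:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "-"',
    '18.237.187.55 - - [01/Jan/2000:00:00:02 +0000] "GET /.well-known/acme-challenge/EXAMPLE_TOKEN HTTP/1.1" 200 87 "-" "-"',
    '3.144.17.208 - - [01/Jan/2000:00:00:03 +0000] "GET /.well-known/acme-challenge/EXAMPLE_TOKEN HTTP/1.1" 200 87 "-" "-"',
]

print(challenge_ips(sample))
```

In practice you would pass the lines of the real access log file instead of `sample`. If the list comes back with fewer sources than the CA sends, the missing ones never reached Apache.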

I'm actively tailing the firewall log, and because I turned off all IP blocking, it's not showing any new entries.

I don't have any ISP service for DDoS protection, nor Cloudflare or anything like that.

No, certbot wouldn't be able to reach the ACME API.
The problem is well after that.

As for the 3 out of 4 requests, I can only presume there is some sort of break [in the Internet] that is preventing the 4th site from reaching your IP.

Do you have control over any other system [at any other network] that can be used as a test site?

When you say "test", I guess you mean to stand up a new Ubuntu/Apache server, copy the site, update the DNS, and test the renewal on it?

Maybe, but that would not cause the timeouts. I'm not sure why certbot is complaining about a missing nginx plug-in, since you are using the Apache plug-in. I gave a sample webroot command earlier, which would not have used either of those. What command did you use to produce that log?

As for the timeout, I looked at all the IP addresses from your Apache logs in this whole thread, and they show a pattern. None of the IPs from the Flexential hosting site are getting through. You only ever see the two US AWS sites and the German one.

That said, I still consistently get errors from other test sites. @rg305, I get a timeout from Max's MSS test site based in Germany every time too. I'm going to post about that but I don't see symptoms of MSS change here.

Correct. Those are 3 challenge TYPES: DNS, HTTP, and TLS-ALPN. You requested HTTP challenge (apache plug-in) so the rest is just part of the ACME protocol. Nothing to see here :slight_smile:

Besides, if there were only 3 challenges and they all replied with 200 then you would have got a cert.

Mostly.
Except "test" means a separate "test" site. Like FQDN = "test.example.com"

The only MSS setting on the firewall is MSS clamping and max size for VPN connections, which shouldn't affect this. For kicks, I disabled it and tried renewing again, and it still failed.

Are you able to give me any IP or domain on the Flexential hosting side? I wouldn't mind running a tracert to see what happens.

I don't know what they are. I only know none ever show in your logs. I've posted on an internal board for new ideas.

You said earlier that nothing changed outside your premises. Well, we don't really know that, at least as far as LE connectivity goes. All we know is that you got a cert on Jun 20 but renewals have failed from August onward. Many comms things likely changed between Jun 20 and late August. They are usually transparent :slight_smile:

I really appreciate your effort here. I've learned a fair amount along the way. I'm going to read through the logs a little more and maybe try out a manual DNS validation.

I promised my kid I would take her to the amusement park tomorrow, so it'll have to be on Friday :slight_smile:

There seems to be some aggressive IP-blocking going on here. Summing up a few things I saw here:

These are all AWS IPs. A successful challenge would also include incoming connections from a Cloudflare IP - LE's primary datacenters use Cloudflare Magic Transit nowadays.

It appears as if these Cloudflare IPs (among others) are blocked on your side, leading to the firewall issue. These IPs have changed recently.

Actually no, but the responses received by the primary datacenters have higher priority than those of the secondaries. It appears that you are blocking the primary datacenters, with only secondaries going through.

You say you have turned off firewalling/geoblocking, but I don't think that was effective (as you saw no change).

I tried out connection tests from various sites just now and many of them failed. Most of my US-based tests went through however. Cloudflare IPs are pretty much impossible to geo-restrict, as those IPs are used globally - Cloudflare has a worldwide anycast network for most of its IPs. Any tool trying to correlate these to a specific location will be in trouble at some point.

The tool of mine that @MikeMcQ referred to (https://segmentist.germancoding.com/) is also unable to connect to your site. It uses the outbound IP address 176.9.103.107, which appears to be blocked on your side.

It is possible that there are other shenanigans going on, but an overly restrictive firewall appears to be the most probable cause at this time.

I set up logging on the NAT port forward, and I do indeed see a 4th request coming in from IP 23.178.112.*. I also see the MSS test website requests from 176.9.103.107. The firewall logs say it's being PASS'd to the web server, but the Apache access log says nothing.

So I guess this confirms what everyone else said - there is some sort of blocking happening on my side after all! :slight_smile:

The iptables/ufw on the web server is disabled, so it's either being blocked by some crazy rule on the firewall (silently), or there's some other mechanism blocking select requests... ugh. At least I've narrowed the scope of my issue.
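
The comparison described above (sources the firewall passed versus sources Apache actually logged) can be sketched as a simple set difference. This is only an illustration: the function name `passed_but_not_logged` is hypothetical, and the input lists are assumed to have been extracted from the two logs by whatever means fits their formats.

```python
import ipaddress


def passed_but_not_logged(firewall_passed, apache_logged):
    """Return source IPs the edge firewall passed to the web server
    that never appear in the Apache access log."""
    missing = set(firewall_passed) - set(apache_logged)

    # Sort numerically, grouping IPv4 before IPv6.
    def sort_key(text):
        ip = ipaddress.ip_address(text)
        return (ip.version, int(ip))

    return sorted(missing, key=sort_key)


# IPs taken from this thread; how they were extracted is left out.
firewall_passed = ["3.144.17.208", "3.127.79.226", "18.237.187.55", "176.9.103.107"]
apache_logged = ["3.144.17.208", "3.127.79.226", "18.237.187.55"]

print(passed_but_not_logged(firewall_passed, apache_logged))
```

Anything this returns was dropped somewhere between the port forward and Apache, which points at the web server host itself or a device in between rather than the edge firewall's pass rules.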

Don't overlook the routing table.

I just wanted to reply back that this issue is resolved. After reviewing lots of tcpdumps, I finally looked more closely at this old web server and found very, very old iptables entries that were doing even more IP filtering based on some very old lists. That's gone now, and everything started working right away.
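
For readers chasing a similar problem, here is a rough sketch of checking saved iptables rules for a DROP or REJECT that covers a given source IP. It assumes the text comes from `iptables-save` and only looks at plain `-s` source matches. The function name `blocking_rules` and the sample rule set are hypothetical, since the actual rules were not posted in this thread.

```python
import ipaddress
import shlex


def blocking_rules(rules_text, client_ip):
    """Return iptables-save rule lines that DROP or REJECT traffic
    from a source range containing client_ip."""
    ip = ipaddress.ip_address(client_ip)
    hits = []
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        # Only appended rules ("-A CHAIN ...") with a source and a target.
        if not tokens or tokens[0] != "-A":
            continue
        if "-s" not in tokens or "-j" not in tokens:
            continue
        s_idx = tokens.index("-s")
        j_idx = tokens.index("-j")
        if s_idx + 1 >= len(tokens) or j_idx + 1 >= len(tokens):
            continue
        if s_idx > 0 and tokens[s_idx - 1] == "!":
            continue  # negated source match; not handled in this sketch
        try:
            network = ipaddress.ip_network(tokens[s_idx + 1], strict=False)
        except ValueError:
            continue  # source is not a literal address or CIDR range
        if tokens[j_idx + 1] in ("DROP", "REJECT") and ip in network:
            hits.append(line)
    return hits


# Hypothetical rule set for illustration only.
sample_rules = """*filter
:INPUT ACCEPT [0:0]
-A INPUT -s 198.51.100.0/24 -j DROP
-A INPUT -s 176.9.0.0/16 -j DROP
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
COMMIT
"""

print(blocking_rules(sample_rules, "176.9.103.107"))
```

This does not follow jumps into custom chains and does not understand ipset-based matches, so an empty result is not proof that nothing is filtering. A packet capture on the server, as used above, remains the more reliable check.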

I'm going to delete this post at some point today - as I'm not sure if I posted anything sensitive.
