Timeout during cert renew - help

Hello,

Appreciate any help in advance. I have several sites that aren't renewing any longer. It was working recently. The only change I can think of is that I moved the host to a new subnet recently, but I don't see why that would make any difference. All sites work fine and DNS records look ok.

I've tried disabling the firewall, but that didn't help. I also tried turning off the http->https 301 redirect, but despite several hours of effort, I couldn't stop the redirects - not sure that's even related.

I'm at a loss. Info below - thanks for any help!

My domain is: mdainsurance.com

I ran this command: sudo ./letsencrypt-auto -d mdainsurance.com

It produced this output: below

My web server is (include version): Server version: Apache/2.4.46 (Ubuntu)

The operating system my web server runs on is (include version): Ubuntu 16.04 ESM

My hosting provider, if applicable, is: selfhosted

I can login to a root shell on my machine (yes or no, or I don't know): Yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):

/home/dc1/letsencrypt/letsencrypt-auto has insecure permissions!
To learn how to fix them, visit Certbot-auto deployment best practices
Your system is not supported by certbot-auto anymore.
certbot-auto and its Certbot installation will no longer receive updates.
You will not receive any bug fixes including those fixing server compatibility
or security problems.
Please visit https://certbot.eff.org/ to check for other alternatives.
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator apache, Installer apache
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for mdainsurance.com
Waiting for verification...
Challenge failed for domain mdainsurance.com
http-01 challenge for mdainsurance.com
Cleaning up challenges
Some challenges have failed.

IMPORTANT NOTES:

  • The following errors were reported by the server:

    Domain: mdainsurance.com
    Type: connection
    Detail: 71.183.64.65: Fetching
    http://mdainsurance.com/.well-known/acme-challenge/IRjsWnszJPO4JUGD8lGJAoY1Ep9lYcZpIV095Jflh2U:
    Timeout during connect (likely firewall problem)

    To fix these errors, please make sure that your domain name was
    entered correctly and the DNS A/AAAA record(s) for that domain
    contain(s) the right IP address. Additionally, please check that
    your computer has a publicly routable IP address and that no
    firewalls are preventing the server from communicating with the
    client. If you're using the webroot plugin, you should also verify
    that you are serving files from the webroot path you provided.
    dc1@dc1:~$ sudo /home/dc1/letsencrypt/letsencrypt-auto -d mdainsurance.com
    /home/dc1/letsencrypt/letsencrypt-auto has insecure permissions!
    To learn how to fix them, visit Certbot-auto deployment best practices
    Your system is not supported by certbot-auto anymore.
    certbot-auto and its Certbot installation will no longer receive updates.
    You will not receive any bug fixes including those fixing server compatibility
    or security problems.
    Please visit https://certbot.eff.org/ to check for other alternatives.
    Saving debug log to /var/log/letsencrypt/letsencrypt.log
    Plugins selected: Authenticator apache, Installer apache
    Obtaining a new certificate
    Performing the following challenges:
    http-01 challenge for mdainsurance.com
    Waiting for verification...
    Challenge failed for domain mdainsurance.com
    http-01 challenge for mdainsurance.com
    Cleaning up challenges
    Some challenges have failed.

IMPORTANT NOTES:

  • The following errors were reported by the server:

    Domain: mdainsurance.com
    Type: connection
    Detail: 71.183.64.65: Fetching
    http://mdainsurance.com/.well-known/acme-challenge/GfgPab-VUBSdCUBzWjvyMFSXyN-AL3FU64R_Gyfhb-s:
    Timeout during connect (likely firewall problem)

    To fix these errors, please make sure that your domain name was
    entered correctly and the DNS A/AAAA record(s) for that domain
    contain(s) the right IP address. Additionally, please check that
    your computer has a publicly routable IP address and that no
    firewalls are preventing the server from communicating with the
    client. If you're using the webroot plugin, you should also verify
    that you are serving files from the webroot path you provided.

Welcome back @hs8sj3kl2nds8sn

Did you also change or add a firewall recently that might be part of the new network? Tests from various locations work fine, but tests from the actual Let's Encrypt servers (both staging and production) time out. My best guess is that you have a firewall blocking certain IP addresses that the LE servers use.

We have had trouble with Palo Alto Networks brand firewalls, but I ran all the known tests we have for those and I don't see any of those known problems.

For example, the Let's Debug test site does 2 connections to your site. The first connects but fails with an expected HTTP 404 error. The second test uses the Let's Encrypt staging system and that connection times out. See:

Do you use anything like fail2ban ?

I have geoblocking turned on. I've tried turning it off, but that doesn't help.

I'll turn it off now in case you want to try again.

No fail2ban. For now, I've disabled the firewall rules.

You could try using DNS-01 authentication:

mdainsurance.com        nameserver = ns-cloud-a1.googledomains.com
mdainsurance.com        nameserver = ns-cloud-a2.googledomains.com
mdainsurance.com        nameserver = ns-cloud-a3.googledomains.com
mdainsurance.com        nameserver = ns-cloud-a4.googledomains.com

Put those firewall rules back on.
I don't see them as being part of the problem.

Also, it might be time to retire that version [and get a new one OR another ACME client].
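
On the DNS-01 suggestion above: validation then happens through a TXT record rather than an inbound connection, so the port 80 timeouts would not matter. A rough sketch, assuming a current certbot is installed; the helper names are made up for illustration, and the second helper only prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: name of the TXT record that DNS-01 validation looks up.
# A leading "*." (wildcard request) is stripped, because a wildcard is
# validated against the base name.
acme_txt_name() {
    printf '_acme-challenge.%s\n' "${1#\*.}"
}

# Hypothetical helper: print (not run) a manual DNS-01 request for one domain.
dns01_command() {
    printf 'certbot certonly --manual --preferred-challenges dns -d %s\n' "$1"
}

acme_txt_name "mdainsurance.com"
dns01_command "mdainsurance.com"
```

A certificate obtained with plain `--manual` does not renew unattended. For that you would need `--manual-auth-hook` and `--manual-cleanup-hook` scripts that update the TXT record through your DNS provider's API, or a DNS plugin for that provider. Each name on the certificate, subdomains included, needs its own `_acme-challenge` record.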

My understanding is that Ubuntu isn't supported anymore so I'm not sure if a new version would work.

If I manually verified via DNS would the auto-renew script still work? And would it work with subdomains as well?

Are you using some old or unusual network gear? There was an obscure change on the Let's Encrypt servers recently regarding TCP MSS (maximum segment size). It started July 19, so after your last good cert but before your renewal attempt around August 20.

The symptom is just a timeout to your site from the LE servers. We have an internal tool to check but that times out reaching you too so I'm not sure what that means.
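
One way to look at this from your side is to watch the TCP handshakes arriving on port 80 while a renewal attempt runs. A minimal sketch, assuming tcpdump is installed on the web host and that `any` works as the capture interface; the function names are only illustrative. With `-v`, tcpdump prints the TCP options of each SYN, including the `mss` value the sender advertises.

```shell
#!/bin/sh
# Largest TCP payload that fits in one IPv4 packet for a given MTU:
# MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header (no options).
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# Print (not run) a capture command that shows only inbound SYN packets
# to port 80. tcpdump needs root, so run the printed line with sudo.
syn_capture_command() {
    printf "tcpdump -ni any -v 'tcp dst port 80 and tcp[tcpflags] & tcp-syn != 0'\n"
}

mss_for_mtu 1500
syn_capture_command
```

If no SYNs show up during the attempt, the packets are being dropped before they reach the host, for example at the firewall or router in front of it. If SYNs arrive but the handshake never completes, the return path deserves a closer look.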

As an aside, did you put your geo block back on? Because you need to at least have US and Europe open. I can see your sites from some world regions but not others. Specifically, not one in Europe or in Australia.

Geo block is still turned off, so it should be accessible from anywhere, although I only forward IPv4 TCP.

Is your tool using something else?

I'm using a pfSense network appliance. I'm not sure I've changed any settings related to MSS... let me see. Would the NAT reflection settings matter?

My initial assumption was that the LE script was starting an outbound TCP connection, so any inbound blocking rules I created wouldn't block any replies anyway.

I don't know what tricks the internal tool uses. I'll ask internally about that.

But unusual timeouts still occur, even unrelated to Let's Encrypt. I often use this tool to quickly check connectivity, and only 3 of 5 regions connect.

Not promoting it, but I also have a fair share of outdated Ubuntu systems:

[I just keep all mine very far from the Internet :slight_smile:]

It's a bit odd. Why would the HTTP check be ok, but the LetsEncryptStaging fail? Is it possible it's having an issue with the http->https redirect?

It doesn't reach the redirection.
It fails to get any HTTP content:

Do you have any kind of "smart firewall" or "adaptive firewall" enabled?

Some firewalls have a setting to protect against DDoS attacks. The Let's Encrypt challenge requests are identical but come from different IPs, so they can look like a DDoS attack to settings that are "too sensitive".

This might also explain why I can't see your site from other testing systems but individual requests work fine.

Another example: geopeeker.com could see your site from just 4 of 6 locations.

===========

As for the redirect causing the timeout, it is possible, but I don't get a timeout for a simple redirect, not even when I use a URL like an HTTP challenge would.

There is a program called pfBlockerNG that does the geo-blocking as well as some abuse IP lists. I've disabled it now. No other adaptive blocking... geopeeker.com seems to return all 6 locations for me.

...but the LE renew still failed.

So if I'm able to get rid of the http->https redirection, it might work?

IDK. It just looks like inconsistent routing to your site or something erratic inside I suppose.

My last geopeeker only saw 5 of 6 and the site24x7 only saw 2 of 5.

As Rudy noted, the challenge times out on the initial HTTP request, so removing the redirect won't help. In fact, when I test challenges I see a redirect from your apex domain to your www subdomain, but both are http requests. There are no http->https redirects for the challenge-format URL, and no redirect at all if I start with the http://www.mdainsurance.com/.well-known/acme-challenge/SampleToken URL.
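
These checks can be reproduced roughly as follows. This is a sketch assuming curl is available; `SampleToken` is only a placeholder, so a 404 is the expected answer, and the helper names are made up. What matters is whether a connection is made at all and where any `Location` header points.

```shell
#!/bin/sh
# Hypothetical helper: build the URL an HTTP-01 challenge would fetch.
challenge_url() {
    printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Hypothetical helper: send a GET, print only the response headers, and
# give up after 10 seconds if no TCP connection can be established.
probe_challenge() {
    curl -sS -D - -o /dev/null --connect-timeout 10 "$(challenge_url "$1" "$2")"
}

challenge_url "mdainsurance.com" "SampleToken"
challenge_url "www.mdainsurance.com" "SampleToken"
```

Running `probe_challenge mdainsurance.com SampleToken` from a machine outside your network shows the apex-to-www redirect, if the connection gets through. A success from one location says nothing about the Let's Encrypt servers, which connect from their own addresses.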

Signing off for night. Best of luck

Highly unlikely.

After you fix the access issue, you need to fix the http->https redirection.
As @MikeMcQ pointed out, the only redirection I see for the challenge URLs is mdainsurance.com to www.mdainsurance.com.

I bet you may find other issues too - Apache is quite mischievous.
We should have a look at the output of:
apachectl -t -D DUMP_VHOSTS
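
Alongside the vhost dump, it can help to list every directive that could be producing a redirect. A rough sketch, assuming the stock Ubuntu layout under /etc/apache2; the function name and default path are my assumptions, and .htaccess files in the document roots are not covered:

```shell
#!/bin/sh
# Hypothetical helper: list redirect-related directives with file name and
# line number. Commented-out lines are skipped. grep -R follows the symlinks
# that Ubuntu uses in sites-enabled.
find_redirects() {
    dir="${1:-/etc/apache2/sites-enabled}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    grep -RnEi '^[[:space:]]*(Redirect[A-Za-z]*|Rewrite(Rule|Cond))[[:space:]]' "$dir"
}

# Only run against the default location if it exists on this machine.
if [ -d /etc/apache2/sites-enabled ]; then
    find_redirects || echo "no redirect directives found"
fi
```

Each hit shows which vhost file the directive lives in, which can then be matched against the DUMP_VHOSTS output to see which site it applies to.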
