Certbot renew fails with connection refused

Dear all,
I guess my problem is a fairly well-known one, as I found many examples of it through Google. However, I found no solution (or I am too dumb to recognize a solution when I read one).
My problem is this: I have a certificate covering multiple subdomains. It has already been renewed once on this server, which sits behind a cable router. Ports 80 and 443 are forwarded.
Pages served on ports 80 and 443 are actually reachable via the external IP.
However, I get a "connection refused" when trying to renew with "certbot renew".
Also, Let's Debug and https://check-your-website.server-daten.de/?q=passys.nl tell me "unable to connect to server".
This is something I do not understand, as the nginx server is there and it does serve pages.

My domain is:

I ran this command: certbot renew

It produced this output:
"detail": "Fetching http://webmail.passys.nl/.well-known/acme-challenge/9Fad4AjFCuzpI0c2mDhitByIbbeuFdebUqaziRYkVLM: Timeout during connect (likely firewall problem)",

My web server is (include version): NGINX 1.18.0

The operating system my web server runs on is (include version): Arch Linux

My hosting provider, if applicable, is: -

I can login to a root shell on my machine (yes or no, or I don't know): yes

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): 1.12.0

Full log: Letsencrypt.log - Pastebin.com

I guess I need a step-by-step as I don't get what is wrong here.
Thank you for your time.
regards,

Hendrik-Jan

Hi Henk-Jan :wave:, welcome!

I'm not seeing anything wrong with the current setup, in the sense that nothing appears to be misconfigured. I believe this might be one of those cases where nginx is being reloaded to serve the challenges and Let's Encrypt tries to query them while nginx isn't ready yet.

You might want to increase the delay between the nginx reload for the challenges and certbot's request to the Let's Encrypt servers to validate those challenges. The --nginx-sleep-seconds option raises that delay from its default of 1 second to something longer. Perhaps a few extra seconds is enough.

First, try it with --dry-run and a relatively large number (e.g., 10 or 30 seconds) to see if this option actually helps. If it does, you might want to decrease the delay to the smallest number of seconds at which the --dry-run still succeeds. Then you can remove --dry-run for the actual renewal.

If --nginx-sleep-seconds doesn't help at all, we'll need to look further.
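
The procedure described above (start with a generous delay, then shrink it while the dry run still passes) could be scripted roughly like this. It is only a sketch: the helper names are made up for illustration, and it assumes certbot is on the PATH and is run with sufficient privileges.

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def dry_run_command(sleep_seconds: int) -> List[str]:
    """Build the dry-run renewal command with the given nginx sleep delay."""
    return ["certbot", "renew", "--dry-run",
            "--nginx-sleep-seconds", str(sleep_seconds)]


def run_certbot(cmd: List[str]) -> bool:
    """Run the command; treat exit code 0 as a successful dry run."""
    return subprocess.run(cmd).returncode == 0


def smallest_working_delay(
    delays: Iterable[int],
    run: Callable[[List[str]], bool] = run_certbot,
) -> Optional[int]:
    """Try delays from largest to smallest and stop at the first failure.

    Returns the smallest delay that still succeeded, or None if even the
    largest delay failed (in which case the delay is probably not the issue).
    """
    best = None
    for seconds in sorted(delays, reverse=True):
        if not run(dry_run_command(seconds)):
            break
        best = seconds
    return best
```

Calling `smallest_working_delay([30, 10, 5, 2])` would run up to four dry runs. A result of `None` means the largest delay already failed.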

Hi @hjheins

Your result says Let's Encrypt checks webmail.passys.nl.

But your Let's Debug check tests www.passys.nl, and your check-your-website check tests passys.nl + www.passys.nl (the standard non-www + www pair).

Your webmail subdomain has a running nginx; your main domain isn't online.

So check your subdomain with both tools, not your main domain.

Thank you Jürgen,
You are correct, I tested with a different subdomain.
This is the result with the correct one:

thank you,

Hendrik-Jan

I can open the url

http://webmail.passys.nl/.well-known/acme-challenge/check-your-website-dot-server-daten-dot-de

in my browser.

So you must have a blocking firewall / .htaccess / something else.

Hallo Osiris,

thank you for the warm welcome. :slight_smile:
Thank you for your suggestion, that sounds like a lead.
I tested with: certbot renew --dry-run --nginx-sleep-seconds 10
However, there was no change.
Error log: Letsencrypt.log nginx 10 secs - Pastebin.com

thanks,

Hendrik-Jan

Hi Jürgen,

right, that is exactly how I tested as well (from an external IP).
I have port forwarding on the router with no firewall settings there, I run Fail2ban, and I use Shorewall as the firewall. But ports 80 and 443 are explicitly open.
Also, stopping Shorewall and Fail2ban seems to make no difference.

thanks,

Hendrik-Jan

You have something that blocks.

And you must find and remove / open it.

That's all.

Hi Jürgen,

I get your analysis, and I truly think you are right. However, I am at a loss as to how I should approach this.
As you also tested: from a browser, you get a result.
I went through all the services on the system, and there is nothing there that could be blocking in some way. Worse: as I don't "see" that I have a blockage, I'm not even sure how to test for this. I mean: I see that a browser request works and certbot does not. But what is the difference between these two requests that makes one fail and the other work?

thanks,
Hendrik-Jan

nginx seems to be there and listening.

# netstat -ntlup|grep LISTEN
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN      403/postgres        
tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN      26472/smtpd         
tcp        0      0 0.0.0.0:2525            0.0.0.0:*               LISTEN      28326/master        
tcp        0      0 0.0.0.0:587             0.0.0.0:*               LISTEN      28326/master        
tcp        0      0 0.0.0.0:2222            0.0.0.0:*               LISTEN      363/sshd: /usr/bin/ 
tcp        0      0 127.0.0.1:783           0.0.0.0:*               LISTEN      744/perl            
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      26483/nginx: worker 
tcp        0      0 0.0.0.0:465             0.0.0.0:*               LISTEN      28326/master        
tcp6       0      0 ::1:5432                :::*                    LISTEN      403/postgres        
tcp6       0      0 :::25                   :::*                    LISTEN      26472/smtpd         
tcp6       0      0 :::2525                 :::*                    LISTEN      28326/master        
tcp6       0      0 :::993                  :::*                    LISTEN      28359/couriertcpd   
tcp6       0      0 :::587                  :::*                    LISTEN      28326/master        
tcp6       0      0 :::2222                 :::*                    LISTEN      363/sshd: /usr/bin/ 
tcp6       0      0 :::143                  :::*                    LISTEN      28345/couriertcpd   
tcp6       0      0 ::1:783                 :::*                    LISTEN      744/perl            
tcp6       0      0 :::465                  :::*                    LISTEN      28326/master 
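
A listing like this can be summarized per program to see at a glance which ports each service is bound to. The sketch below assumes the column layout shown above (`netstat -ntlup` on Linux); the function name is made up for illustration.

```python
from typing import Dict, Set


def listening_ports(netstat_output: str) -> Dict[str, Set[int]]:
    """Map each program name to the set of TCP ports it listens on."""
    ports: Dict[str, Set[int]] = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # The port is whatever follows the last colon ("0.0.0.0:80", ":::25", "::1:5432").
        port = int(fields[3].rsplit(":", 1)[1])
        pid_prog = fields[6]
        # "26483/nginx:" -> "nginx"; keep the raw field if there is no "pid/" prefix.
        program = pid_prog.split("/", 1)[1] if "/" in pid_prog else pid_prog
        ports.setdefault(program.rstrip(":"), set()).add(port)
    return ports
```

Applied to the listing above, the entry for `nginx` contains port 80.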

I think the only difference is the source IP address of the request: somehow something is acting as a firewall and blocking connections from some IP addresses, but not from others. That could be on your server itself, or upstream in your ISP or host.

In some cases this could be based on country or region, although it's not that common to have such a block without realizing it or asking for it.
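
One way to compare what different source addresses see is to run the same small TCP probe from several machines outside the network. The sketch below is an assumption about how one might do that, not something used in this thread. It distinguishes a timeout (packets silently dropped, as in the "Timeout during connect" error) from an active refusal. The function name is made up for illustration.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target (or something in front of it) answered with a reset.
        return "refused"
    except socket.timeout:
        # Nothing answered within the timeout; typical for a dropping firewall.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar.
        return f"error: {exc}"
```

Running `probe_tcp("webmail.passys.nl", 80)` from a home connection and from a datacenter host, and comparing the two results, would show whether the outcome depends on where the request comes from.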

Hi Schoen,

Is there a way to trigger the response from certbot on another server in some way?
That might be a way to find out what's going on here...

Thanks,

Hendrik-Jan

It's not actually Certbot that's making the failed connections, it's the Let's Encrypt CA validator, which is distributed across several commercial data centers.

Certbot asks for a certificate from the Let's Encrypt API, and that triggers the CA to make its own connection attempts to your site in order to confirm that you really control the domain name that you're asking for a certificate for. Those are then incoming connections from unpredictable data centers anywhere out on the Internet.

A complicating factor is that Let's Encrypt does not publish the IP addresses that will be used for these connections, in order to discourage people from whitelisting them in firewalls (because they can change, and are expected to change, over time).

The examples with the other testing tools are pretty representative, though, because they're located in datacenters elsewhere on the Internet (not necessarily the exact same ones that Let's Encrypt uses!) and they can't connect to your service either. So those tests' failures are probably symptomatic of the same underlying cause.

Not sure if it helps, but I see this in the nginx logs. All times are UTC:

192.168.1.1 - - [26/Feb/2021:19:14:28 +0000] "GET /.well-known/acme-challenge/test HTTP/1.1" 404 153 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0 SeaMonkey/2.53.6"
192.168.1.1 - - [26/Feb/2021:19:14:31 +0000] "GET /.well-known/acme-challenge/tests HTTP/1.1" 404 153 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0 SeaMonkey/2.53.6"
52.58.118.98 - - [26/Feb/2021:19:16:55 +0000] "GET /.well-known/acme-challenge/9_XQ5CfxLBGNOzae2YARQ5p_Rk42lf51JGd0XokQ-pI HTTP/1.1" 200 87 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
17.58.88.16 - - [26/Feb/2021:19:18:49 +0000] "GET /.well-known/acme-challenge/9Fad4AjFCuzpI0c2mDhitByIbbeuFdebUqaziRYkVLM: HTTP/1.1" 404 153 "-" "AppleNewsBot"
17.58.86.217 - - [26/Feb/2021:19:19:52 +0000] "GET /.well-known/acme-challenge/9Fad4AjFCuzpI0c2mDhitByIbbeuFdebUqaziRYkVLM: HTTP/1.1" 404 153 "-" "AppleNewsBot"

thanks,

Hendrik-Jan
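
To pick out which requests in an access log like the one above came from the validator, the lines can be filtered by path and user agent. This is a sketch that assumes nginx's default "combined" log format and the user-agent text seen in the log above; the function names are made up for illustration.

```python
import re
from typing import Dict, List

# Matches nginx's "combined" access log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def acme_requests(log_text: str) -> List[Dict[str, str]]:
    """Return all requests for paths under /.well-known/acme-challenge/."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("path").startswith("/.well-known/acme-challenge/"):
            hits.append(match.groupdict())
    return hits


def from_validator(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the requests whose user agent names the Let's Encrypt validator."""
    return [h for h in hits if "Let's Encrypt validation server" in h["agent"]]
```

For each remaining entry, the `ip`, `time`, and `status` fields show which validation attempts actually reached nginx and how they were answered.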

Thank you schoen.
That sounds logical. Too bad it basically means that I'm stuck.
I would have to get back to self signed certificates.

Hendrik-Jan

You can also experiment with the --debug-challenges option. With that option, certbot will pause before triggering the validation until the user signals it to continue.

During that pause, you could try to remotely fetch the challenge file and see if something goes wrong.
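
Fetching the challenge file during that pause could look like the sketch below, ideally run from a machine outside the network. The function name is made up for illustration, and the token has to be copied from what certbot prints or from the challenge directory.

```python
import urllib.error
import urllib.request
from typing import Tuple


def fetch_challenge(host: str, token: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch an HTTP-01 challenge file and return (status code, body).

    HTTP error responses such as 404 are returned as a status code.
    Connection problems (timeouts, refusals) raise urllib.error.URLError,
    which is left to the caller because that is the case of interest here.
    """
    url = f"http://{host}/.well-known/acme-challenge/{token}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
```

A 200 status with the expected key authorization as the body means the file is reachable from wherever the script ran. An exception instead points to a problem at the connection level rather than in nginx.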

Some of the testing tools (that I think also failed to connect to your site) do have predictable IP addresses, so you could try to debug using that... and maybe ask your ISP to help investigate if you can't solve it on your server itself.

I'm sure @hjheins has something that blocks. And it's not region-specific.

My browser's IP is a dynamic Telekom DSL address from Berlin.

"check-your-website" runs in a datacenter, also in Berlin, but with a known datacenter IP address.

PS: And that

No connection could be made because the target machine actively refused it 130.180.67.26:80

says: There is a blocking instance (that's command-line output from the "check-your-website" database server trying to connect to the domain).

PPS: Now it's funny.

Using the webserver from "check-your-website" I am able to connect to your domain.

That's 85.215.2.226 (or IPv6); the database server has .227.

Maybe too many checks from one IP -> a spam filter is triggered.

(OT: The checks of "check-your-website" come from the database server, so that check from the webserver was the first connection).

Hi Jürgen,

Where does the message "No connection could be made because the target machine actively refused it 130.180.67.26:80" come from? I didn't get that. That would indeed suggest it is my server.

I am trying to see if I can at least open the challenge in the browser, as per @Osiris's suggestion.
So I am trying to run:

certbot renew --dry-run --debug-challenges

However, before I can even paste the string into the browser, certbot already continues. Am I missing a parameter here?

thanks,

Hendrik-Jan

PS: I have a Unity Media/Vodafone Business connection in NRW.