Cannot renew with Certbot "Connection refused by peer"

This smells like a lie, assuming you don't host this website as well:

% curl -IL http://www.wmich.edu/.well-known/acme-challenge/4gsNLk776etwY1JQjkT_Rvj2mLSSzaSxDYKFL83ILN0
curl: (56) Recv failure: Connection reset by peer

% curl -IL http://www.wmich.edu/.well-known/acme-challenge
HTTP/1.1 301 Moved Permanently
Server: nginx/1.20.1
Date: Fri, 08 Apr 2022 18:17:57 GMT
Content-Type: text/html
Content-Length: 169
Connection: keep-alive
Location: https://wmich.edu/.well-known/acme-challenge

HTTP/2 404
server: nginx/1.20.1
date: Fri, 08 Apr 2022 18:17:58 GMT
content-type: text/html; charset=utf-8
x-content-type-options: nosniff
x-powered-by: PHP/7.2.32
x-drupal-cache: MISS
expires: Sun, 19 Nov 1978 05:00:00 GMT
x-content-type-options: nosniff
content-language: en
x-frame-options: SAMEORIGIN
permissions-policy: interest-cohort=()
x-generator: Drupal 7 (https://www.drupal.org)
link: <https://wmich.edu/>; rel="canonical",<https://wmich.edu/>; rel="shortlink"
vary: Accept-Encoding
cache-control: no-cache, max-age=0
x-varnish: 550734339
age: 0
via: 1.1 varnish
strict-transport-security: max-age=7200; includeSubDomains

They probably upgraded their WAF without realizing that it's killing all ACME HTTP-01 validations.
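To repeat this check yourself from a machine outside the university network, a small sketch like the one below may help. It assumes a POSIX shell with curl installed; `probe_url` and `describe_curl_exit` are hypothetical helper names, not part of certbot or curl. curl's exit code 56 ("failure receiving network data") is the "Recv failure: Connection reset by peer" shown above.

```shell
#!/bin/sh
# Hypothetical helpers for probing an ACME HTTP-01 challenge path.

# Translate a few common curl exit codes into a short description.
describe_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect" ;;
    28) echo "timed out" ;;
    52) echo "empty reply from server" ;;
    56) echo "failure receiving data (e.g. connection reset by peer)" ;;
    *)  echo "curl exit code $1" ;;
  esac
}

# Print the HTTP status for a URL (redirects are NOT followed), or the
# transport-level reason when curl itself fails.
probe_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$1")
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "HTTP $code"
  else
    echo "curl failed: $(describe_curl_exit "$rc")"
  fi
}
```

For example, `probe_url http://wiki.ceas.wmich.edu/.well-known/acme-challenge/test` run from a non-university connection should print `HTTP 404` if the request reaches the web server, and `curl failed: failure receiving data (...)` if the reset is reproduced.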


That is sad (especially in light of the amount of fruitless debugging work that it then imposes on other people downstream, like the original poster).

I would suggest that in cases of weird network behavior it can be helpful to run a packet sniffer like Wireshark (especially, as here, when a technical organization or department is trying to determine whether the error is in its own configuration or in some other organization's). This can make it clearer whether what the outside world sees matches what the server is actually doing. If it doesn't, then some other device in between is interfering.

A long time ago we were investigating network interference by ISPs and we found it instructive to run a packet sniffer at both ends of the connection, and then compare the traffic as seen by both ends. This can reveal if one end apparently receives something from the other end that the other end doesn't recall having sent—which would then mean that some other device in between actually sent that.

In this case, you could run a packet sniffer and a web client (whether a browser or curl) on a non-university network connection, and also run a packet sniffer on your server. The client will receive a disconnection of some sort from the server, and you can then check whether that corresponds to something that the server believes it sent, or not.

This is a somewhat advanced technique, but, again, a useful one when a technically oriented organization wants to pin down where in the network an error is coming from.
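As a rough sketch of the both-ends capture, assuming tcpdump is installed on both machines (Wireshark can open the resulting files), the helpers below record port 80 traffic and then list only packets with the TCP RST flag set. The function names and file names such as client.pcap / server.pcap are illustrative assumptions, and `-i any` is Linux-specific.

```shell
#!/bin/sh
# Hypothetical helpers for the "capture at both ends" technique.

# BPF filter: traffic on the given TCP port that has the RST flag set.
rst_filter() {
  printf 'tcp port %s and tcp[tcpflags] & (tcp-rst) != 0\n' "$1"
}

# Record all traffic on a TCP port to a pcap file (usually needs root).
# Usage: capture_port PORT FILE. Run on the client and the server at once.
capture_port() {
  tcpdump -i any -n -w "$2" "tcp port $1"
}

# List only the RST packets found in a saved capture.
# Usage: show_resets PORT FILE
show_resets() {
  tcpdump -n -r "$2" "$(rst_filter "$1")"
}
```

After one failed request, stop both captures and compare `show_resets 80 client.pcap` with `show_resets 80 server.pcap`. Addresses and ports may differ between the two sides because of NAT, so line them up by timing: an RST the client received that the server never sent came from a device in between.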


Oh, where have you gone? [this doesn't just happen]

Is anyone there skilled with Apache?
[I presume "too many cooks in the kitchen"!]


Totally agree, @9peppe. That was a clever test (post #61) and is compelling evidence that the ACME challenges are blocked somewhere outside the poster's area.

It is too bad the site www.wmich.edu uses a cert from GlobalSign; otherwise it would soon be affected as well.


You mean the tests you made from devices within the university network? As noted in earlier posts, this actually further indicates that the firewall affecting you is at the outer edge(s) of the university's network connection.

Multiple Let's Encrypt servers send challenge requests from various points around the globe, and they must all succeed for the HTTP challenge to pass. It does not matter that the machine running certbot (the ACME client) can connect to that path; it is the Let's Encrypt servers that must be able to connect.


They're also sending an HSTS header with includeSubDomains, and I don't think they realize what that implies for websites under the same domain that they do not control directly.
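To see that header and check what it covers, a sketch along these lines may help (the helper names are hypothetical; curl, grep, sed and tr are assumed to be available). With the `max-age=7200; includeSubDomains` value shown earlier, a browser that has visited https://wmich.edu/ will insist on HTTPS for every `*.wmich.edu` host, including wiki.ceas.wmich.edu, for the following 7200 seconds (two hours).

```shell
#!/bin/sh
# Hypothetical helpers for inspecting an HSTS (Strict-Transport-Security) policy.

# Print the Strict-Transport-Security value sent by a URL, if any.
# Header names are case-insensitive (HTTP/2 sends them in lowercase).
hsts_header() {
  curl -sI "$1" | tr -d '\r' | grep -i '^strict-transport-security:' \
    | cut -d: -f2- | sed 's/^ *//'
}

# Exit 0 if the policy value contains the includeSubDomains directive.
hsts_covers_subdomains() {
  printf '%s\n' "$1" | grep -qi 'includesubdomains'
}

# Print max-age in seconds (handles the common unquoted form only).
hsts_max_age() {
  printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p'
}
```

For example, `hsts_max_age "$(hsts_header https://wmich.edu/)"` prints the current max-age, and `hsts_covers_subdomains "$(hsts_header https://wmich.edu/)" && echo "applies to subdomains"` reports whether the policy extends to other hosts under the domain.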


Alright, after reverting to a point where we still had fullchain.pem, this was the output of the command:

apachectl -t -D DUMP_VHOSTS
[Mon Apr 11 08:13:55.741918 2022] [so:warn] [pid 51870] AH01574: module php7_module is already loaded, skipping
VirtualHost configuration:
*:80 wiki.ceas.wmich.edu (/etc/apache2/sites-enabled/000-default.conf:1)
*:443 wiki.ceas.wmich.edu (/etc/apache2/sites-enabled/default-ssl.conf:2)

UPDATE: We finally found a buried Apache setting that was redirecting HTTP to HTTPS. If we run curl on http://wiki.ceas.wmich.edu/.well-known/acme-challenge, it no longer gives us a 301; it gives us a 404, as we wanted. We are STILL getting a "connection reset by peer" error.

We checked with the university Office of Information Technology, and we are not blocking anything with our firewall. We tracked the packets with Wireshark. It seems our server receives the request from Let's Encrypt and sends nothing back. Not sure what to make of this.


Maybe try the idea by webprofusion (post #55) of stopping Apache and running certbot in standalone mode. That would at least rule Apache in or out.

For example:

sudo certbot certonly --standalone -d wiki.ceas.wmich.edu --debug-challenges -v --dry-run

So sorry, I forgot to mention that I tried this as well. It did not work. Pretty much all our servers have stopped renewing for an unknown reason, and some of them use nginx for Let's Encrypt, so I feel like it has to be some sort of firewall or some other higher-level setting we are not recognizing.

What didn't work, exactly? Do you have a Wireshark log of that attempt?


But I do not see any packet responsible for the "connection reset by peer" in that Wireshark log. Could you perhaps share the capture file?


Wouldn't the "connection reset by peer" be when we stop the connection, where it says "Connection: close\r\n"?

No, "Connection: close" is an HTTP header that signals the connection will be closed after the current response has completed. See RFC 2616 - Hypertext Transfer Protocol -- HTTP/1.1 for more info.

The "connection reset by peer" is a TCP error, which sits at a different layer than HTTP.


So should I actually see a "Connection Reset by Peer" error in Wireshark?

Not literally that error, but the "connection reset by peer" error is usually caused by an RST packet sent from the server to the client. If I run Wireshark on my own machine and do a curl, I get the following result:

No. Time Source Destination Protocol Length Info
35 19:26:03.122978656 192.168.178.21 141.218.145.131 TCP 74 49902 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=1673551658 TSecr=0 WS=128
36 19:26:03.327851502 141.218.145.131 192.168.178.21 TCP 66 80 → 49902 [SYN, ACK] Seq=0 Ack=1 Win=64240 Len=0 MSS=1460 SACK_PERM=1 WS=128
37 19:26:03.327908759 192.168.178.21 141.218.145.131 TCP 54 49902 → 80 [ACK] Seq=1 Ack=1 Win=64256 Len=0
38 19:26:03.328085738 192.168.178.21 141.218.145.131 HTTP 207 GET /.well-known/acme-challenge/4gsNLk776etwY1JQjkT_Rvj2mLSSzaSxDYKFL83ILN0 HTTP/1.1
39 19:26:03.532666975 141.218.145.131 192.168.178.21 TCP 56 80 → 49902 [RST, ACK] Seq=1 Ack=154 Win=64256 Len=0


Notice the final packet.

If that RST packet did NOT originate from your server, it came from somewhere else in between.
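If the packet list is exported as plain text in the column layout shown above (No., Time, Source, Destination, Protocol, Length, Info), a short awk filter can pull out just the RST packets and who sent them. This is a sketch that assumes those default columns and whitespace-separated fields; `list_resets` is a hypothetical name. Running it over a client-side export and a server-side export of the same attempt, and checking whether the server-side list contains the RST the client received, applies the test described above.

```shell
#!/bin/sh
# Hypothetical helper: read a Wireshark/tshark packet summary on stdin
# (columns: No. Time Source Destination Protocol Length Info) and print
# "packet-number source -> destination" for every TCP packet with RST set.
list_resets() {
  awk '$5 == "TCP" && (index($0, "[RST") > 0 || index($0, ", RST") > 0) {
    print $1, $3, "->", $4
  }'
}
```

For the capture above, the only line printed would be the final packet, sent by 141.218.145.131 to the client.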


That's because only adding the slash at the end results in the connection reset by peer; without the slash it does NOT give any error. That is the whole reason everybody here suspects a firewall somewhere in between.
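A sketch of that comparison, assuming a POSIX shell with curl: it requests the challenge directory with and without the trailing slash and prints the HTTP status together with curl's exit code. The helper names are hypothetical. An `http=000` with `curl_exit=56` means no HTTP response arrived because the connection was torn down mid-transfer.

```shell
#!/bin/sh
# Hypothetical helpers: compare the challenge path with and without the
# trailing slash, which is where the reset behaviour differs.

# Print the two URLs to test for a given host name.
challenge_urls() {
  printf 'http://%s/.well-known/acme-challenge\n' "$1"
  printf 'http://%s/.well-known/acme-challenge/\n' "$1"
}

# For each URL, print "<url> http=<status> curl_exit=<code>".
compare_slash() {
  challenge_urls "$1" | while read -r url; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$url")
    rc=$?
    echo "$url http=$code curl_exit=$rc"
  done
}
```

Running `compare_slash wiki.ceas.wmich.edu` once from inside and once from outside the university network gives directly comparable output.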


And what was the curl output? Did you see the 404 file not found? From where are you running that command?


The curl output is:
HTTP/1.1 404 Not Found
Date: Mon, 11 Apr 2022 19:21:39 GMT
Server: Apache/2.4.29 (Ubuntu)
Content-Type: text/html; charset=iso-8859-1

I am running this from a Windows machine. It IS on the same network as the server. Curling from outside the network may change that. I can check later today when I have access to a non-university computer.

If you get a 404 Not Found result (when you include the slash, or more, after acme-challenge) from within the same network as the server, but a connection reset by peer from outside the network, you'll know where to look: somewhere between your network and the location where curl gets the connection reset by peer.
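The reasoning above can be written down as a small decision helper. This is only an illustration with a hypothetical name: you pass the outcome observed from inside the network and the outcome observed from outside (an HTTP status such as 404, or the word "reset"), and it prints where to look next.

```shell
#!/bin/sh
# Hypothetical helper encoding the inside/outside comparison.
# Arguments: result seen from inside the network, result seen from outside.
# Each result is an HTTP status code (e.g. 404) or the word "reset".
locate_reset() {
  inside=$1
  outside=$2
  if [ "$inside" = "reset" ]; then
    echo "reset also seen inside: check the server and the local network"
  elif [ "$outside" = "reset" ]; then
    echo "reset only from outside: look between your network and the outside client"
  else
    echo "no reset observed: the challenge path is reachable from both sides"
  fi
}
```

For example, `locate_reset 404 reset` describes the situation in this thread if the outside test reproduces the reset.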
