Curl: TCP connection reset by peer

Isn’t your curl too old? That’s version 7.29.0, released Feb 6 2013:

https://curl.haxx.se/docs/releases.html

@JuergenAuer - we’re using CentOS 7, which is a binary “clone” of RHEL 7. Those Enterprise Linuxes tend to stay on old but well-tested versions for compatibility/stability reasons, and all important security fixes get backported to those old versions (see: https://access.redhat.com/solutions/64838).
So that’s “normal” in this case :wink:
Newer versions come with CentOS 8, which was released just a few days ago … :slight_smile:

I also tried a manually compiled 7.65 version, which threw the same error - so I’m more or less ruling out curl as the source of the problem :wink:


I will add that the same problem (random connection errors) also occurs when using cURL built with HTTP/2 support (the nghttp2 lib):

curl 7.65.1 (x86_64-pc-linux-gnu) libcurl/7.65.1 NSS/3.44 zlib/1.2.7 brotli/1.0.7 nghttp2/1.31.1
Release-Date: 2019-06-05
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS brotli HTTP2 HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL

I’ve already tried the default version of cURL on CentOS 6/7 systems, and the newer 7.65.1 both with and without HTTP/2 support. Is there anything I/we can do to help track down the issue? Thanks.


I will ask someone on the SRE team to investigate.


thx a lot … hope they can help track it down! :wink:


Hi @futureweb -

Thanks for bringing this to our attention. I’ve started investigating this internally and will provide any updates here.

Is there any more detail you can provide from the error messages? Perhaps try running the command with curl -vvv to see more details.
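
For anyone wanting to capture that output, here is a minimal sketch of such a verbose call. The production ACME v2 /directory URL is used purely as an example; swap in whichever endpoint is failing for you:

# Print connection, TLS handshake and HTTP details for a single request
curl -vvv https://acme-v02.api.letsencrypt.org/directory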


I’m seeing about 1k ECONNRESET-type errors per hour. We’re provisioning around 25k certificates per day, and we are still getting a lot of successful requests.

The last one was on Oct 2nd at 17:26:10 UTC. We’re requesting from Google Cloud.


Just a quick update. I’ve been reviewing logs and data and I’m getting close to a root cause. Thanks for your patience while we continue investigating.


@jillian - I was out of the office yesterday, but as you’re getting close to the root cause I guess you don’t need any more verbose logs anymore?
If we can do anything further to help you hunt down this nasty thing, just drop me a line! :wink:

The problem doesn’t seem to happen only with curl: I’ve been getting errors from my daily certbot cron for the last 3 days:

(timezone in the log below is UTC)

2019-10-05 00:45:09,509:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-05 00:45:25,065:DEBUG:certbot.log:Exiting abnormally:
An unexpected error occurred:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 417, in wrap_socket
    cnx.do_handshake()
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1426, in do_handshake
    self._raise_ssl_error(self._ssl, result)
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1166, in _raise_ssl_error
    raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 594, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 350, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 837, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 323, in connect
    ssl_context=context)
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 424, in wrap_socket
    raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 624, in urlopen
    raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)

During handling of the above exception, another exception occurred:

requests.exceptions.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)
Please see the logfiles in /var/log/letsencrypt for more details.

Same errors happened at the following timestamps (still UTC):

2019-10-03 00:45:08,350:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-03 00:45:23,950:DEBUG:certbot.log:Exiting abnormally:
<same error as above>
2019-10-04 00:45:09,660:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-04 00:45:25,413:DEBUG:certbot.log:Exiting abnormally
<same error as above>

@jillian - got some more errors over the weekend - here’s the info:
Source IP: 83.65.246.198, Timezone: MESZ (UTC+2)

Date		Time		v1 Endpoint		Error
06.10.2019	14:35		/acme/new-authz		TCP connection reset by peer
07.10.2019	00:19		/directory		TCP connection reset by peer
07.10.2019	03:19		/acme/new-authz		TCP connection reset by peer

Date		Time		v2 Endpoint		Error
07.10.2019	10:44		/acme/new-acct		TCP connection reset by peer
07.10.2019	13:35		/acme/new-nonce		TCP connection reset by peer
07.10.2019	16:24		/directory		TCP connection reset by peer
07.10.2019	23:35		/acme/new-acct		TCP connection reset by peer
08.10.2019	03:15		/acme/new-nonce		TCP connection reset by peer
08.10.2019	04:57		/acme/chall-v3/681374162/Utss7g		TCP connection reset by peer
08.10.2019	08:36		/acme/authz-v3/683535874		TCP connection reset by peer
08.10.2019	08:57		/acme/chall-v3/683734935/dB4U8A		TCP connection reset by peer
08.10.2019	09:17		/acme/new-acct		TCP connection reset by peer
08.10.2019	12:23		/acme/authz-v3/685829208		TCP connection reset by peer
08.10.2019	13:01		/acme/new-nonce		TCP connection reset by peer
08.10.2019	13:10		/directory		TCP connection reset by peer
08.10.2019	14:07		/acme/new-nonce		TCP connection reset by peer
08.10.2019	14:40		/directory		TCP connection reset by peer
08.10.2019	15:13		/acme/authz-v3/687506321		TCP connection reset by peer
08.10.2019	15:16		/acme/new-nonce		TCP connection reset by peer
08.10.2019	17:08		/acme/new-order		TCP connection reset by peer
08.10.2019	17:15		/acme/new-nonce		TCP connection reset by peer
08.10.2019	18:09		/acme/chall-v3/689037489/Ij7g3g		TCP connection reset by peer
08.10.2019	19:35		/acme/new-order		TCP connection reset by peer
08.10.2019	23:05		/acme/new-nonce		TCP connection reset by peer
09.10.2019	00:25		/directory		TCP connection reset by peer
09.10.2019	00:32		/acme/new-acct		TCP connection reset by peer
09.10.2019	00:34		/acme/chall-v3/692785848/7JJlFw		TCP connection reset by peer
09.10.2019	03:01		/directory		TCP connection reset by peer
09.10.2019	04:50		/acme/new-order		TCP connection reset by peer
09.10.2019	06:33		/acme/chall-v3/696489572/WgutoA		TCP connection reset by peer
09.10.2019	08:55		/acme/new-acct		TCP connection reset by peer
09.10.2019	09:00		/directory		TCP connection reset by peer
09.10.2019	09:40		/acme/authz-v3/698269473		TCP connection reset by peer
09.10.2019	12:20		/acme/new-acct		TCP connection reset by peer
09.10.2019	12:54		/acme/new-nonce		TCP connection reset by peer
09.10.2019	14:56		/acme/chall-v3/701278204/O1uALw		TCP connection reset by peer
09.10.2019	15:30		/acme/new-order		TCP connection reset by peer
09.10.2019	19:40		/acme/new-nonce		TCP connection reset by peer

I’ll update this post as new errors occur … :wink:


FWIW, I am also seeing this error when using an Ansible playbook to update. In our case, we have a proxy server involved, so the logs would be… messy. I’ll see what I can get into a useful format.


We have been getting the following error in roughly 1-2% of renewals for the last couple of days. Not sure if this is the same error:

Attempting to renew cert (www.somedomain.sk) from /etc/letsencrypt/renewal/www.somedomain.sk.conf produced an unexpected error: HTTPSConnectionPool(host=‘acme-v02.api.letsencrypt.org’, port=443): Max retries exceeded with url: /directory (Caused by SSLError(SSLError(“bad handshake: SysCallError(104, ‘ECONNRESET’)”,),)). Skipping. All renewal attempts failed. The following certs could not be renewed: /etc/letsencrypt/live/www.somedomain.sk/fullchain.pem (failure)


Just had a quick look at what percentage of our requests fail …

On 08.10.2019 we had 70 requests in total and 17 of them failed … so roughly 1 in 4 requests (about 24%) hits this failure … more than I initially thought … :-/


Thank you everyone for all the details and information! It’s really appreciated and helps us understand the scope. We’re still investigating the root cause as a high-priority issue.


Hi folks!

Thanks for all the help troubleshooting and patience!

We have found a couple of places that were causing at least some of the errors and fixed them:

  • Added additional frontend proxy capacity.
  • Lowered our frontend proxy keepalive timeout to be less than our firewall session timeout.

Let us know if you are still seeing problems.


Hey @andygabby,
thx for the feedback & hopefully the solution to this problem! :wink:
No errors so far today … if any errors occur I will report back!
thx, bye from sunny Austria
Andreas

Update 10.10. - 18:05 (MESZ - UTC+2): Still no errors so far
Update 11.10. - 09:52 (MESZ - UTC+2): Still no errors so far
Update 14.10. - 23:19 (MESZ - UTC+2): Still no errors so far



I was able to work around this problem by adding a bigger random delay to our Let’s Encrypt handlers, so the cron jobs move further away from the full-hour mark.

I’m sure you want to fix this on your side too, but it may be something for others to think about to ease the burden on your infrastructure.
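
For anyone wanting to do the same, here is a minimal sketch of that kind of jitter, assuming a bash cron entry that wraps certbot; the one-hour window and the certbot command are just examples, adjust them to your own client and schedule:

# Sleep a random number of seconds (up to an hour here) before renewing,
# so the request doesn't land exactly on the full-hour mark.
sleep $((RANDOM % 3600)) && certbot renew --quiet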


@mhaecker - our renewals happen at completely random times, spread over all 1,440 minutes of the day :wink:
I didn’t notice any big correlation with “heavy load” times (see the error log here: Curl: TCP connection reset by peer).
But since the changes LE made, the errors are gone - not a single one on our side … so I guess they got the prob sorted! :wink: