Curl: TCP connection reset by peer

Hey there,
for the past few days we have been getting random “Curl: TCP connection reset by peer” errors when trying to renew certs. We use our own PHP client integrated into our CMS (based on https://github.com/analogic/lescript).
A second attempt a few seconds later succeeds without problems …
Requests are sent from servers located in Austria (e.g. from 83.65.246.198).
As there are several thousand certs hosted on this server, there are lots of renewals per day … 99.9% succeed … but some trigger this error … not a big deal as we automatically try again later … but just wondering …
No network issues observed on our end / in our datacenter.
So could there be an issue on the LE side? Is anything known? Are others getting such errors too?
thx, bye from Austria
Andreas Schnederle-Wagner
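
The "try again a few seconds later" behaviour described in this post can be sketched roughly as below. This is only a minimal illustration in Python, not the code of the PHP client mentioned above: `call_with_retry`, its parameters, and the doubling delay are all assumptions made for the example, and a real client would catch whatever exception its HTTP library raises for a reset connection.

```python
import time


def call_with_retry(request, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call request(); on ConnectionResetError, wait and try again.

    Hypothetical helper for illustration only. The delay doubles after
    each failure (base_delay, 2 * base_delay, ...). The final failure is
    re-raised so the caller can schedule a later renewal attempt.
    """
    for attempt in range(attempts):
        try:
            return request()
        except ConnectionResetError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter is just a convenience so the helper can be exercised without actually waiting.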

Hi @futureweb

there was a change:

Let's Encrypt now supports HTTP/2.

Hey @JuergenAuer,
thx for the information - seems like it’s related to this … I saw this error for the very first time on 25.09 at 02:22, and since then it has happened 14 times across somewhere between a few dozen and a few hundred renewals …
It just happens randomly …

Can I do something to help debug this issue?

FWIW, someone else had a similar report a few days ago:

https://community.letsencrypt.org/t/curl-error-to-directory-endpoint/103027

Some more information regarding this issue:

Using ACME V1 Endpoint: https://acme-v01.api.letsencrypt.org
Host OS: CentOS Linux release 7.6.1810 (Core)
IPv4 - no IPv6
#curl -V
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.36 zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets
I also tested a manually compiled current version of curl … it triggered the same error.

Is it possible that LE’s new CDN can’t keep up with all the incoming requests and starts denying new connections?
Could someone on the LE infrastructure side have a look into this? (@cpu ? ;-))

thx, bye from sunny Austria
Andreas

Isn't your curl too old? That's from

7.29.0 Feb 6 2013

https://curl.haxx.se/docs/releases.html

@JuergenAuer - we are using CentOS 7, which is a binary “clone” of RHEL 7. Those enterprise Linux distributions tend to stay on old but well-tested versions for compatibility/stability reasons, but all important security fixes get backported to those old versions (see: https://access.redhat.com/solutions/64838).
So that’s “normal” in this case.
Newer versions are coming with CentOS 8, which was released just a few days ago …

I also tried a manually compiled 7.65 version, which threw the same error - so I’m more or less ruling out curl as the source of the problem.

I will ask someone on the SRE team to investigate.

thx a lot ... hope they can help track it down!

Hi @futureweb -

Thanks for bringing this to our attention. I’ve started investigating this internally and will provide any updates here.

Is there any more detail you can provide from the error messages? Perhaps you could run the command with curl -vvv to see more details.

I’m seeing about 1k ECONNRESET-type errors per hour. We’re provisioning around 25k certificates per day. We are still getting a lot of successful requests.

Last one was at Oct 2nd, 17:26:10 UTC. We’re requesting from Google Cloud.

Just a quick update. I’ve been reviewing logs and data and I’m getting close to a root cause. Thanks for your patience while we continue investigating.

@jillian I was out of the office yesterday, but as you are getting close to the root cause, I guess you don’t need any more verbose logs?
If we can do anything further to help you hunt down this nasty thing, just drop me a line!

The problem doesn’t seem to only happen with curl: I’ve been having errors from my daily certbot cron for the last 3 days:

(timezone in the log below is UTC)

2019-10-05 00:45:09,509:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-05 00:45:25,065:DEBUG:certbot.log:Exiting abnormally:
An unexpected error occurred:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 417, in wrap_socket
    cnx.do_handshake()
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1426, in do_handshake
    self._raise_ssl_error(self._ssl, result)
  File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1166, in _raise_ssl_error
    raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 594, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 350, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 837, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 323, in connect
    ssl_context=context)
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 424, in wrap_socket
    raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 624, in urlopen
    raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)

During handling of the above exception, another exception occurred:

requests.exceptions.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)
Please see the logfiles in /var/log/letsencrypt for more details.

Same errors happened at the following timestamps (still UTC):

2019-10-03 00:45:08,350:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-03 00:45:23,950:DEBUG:certbot.log:Exiting abnormally:
<same error as above>
2019-10-04 00:45:09,660:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-04 00:45:25,413:DEBUG:certbot.log:Exiting abnormally
<same error as above>

@jillian - got some more errors over the weekend - here is the info:
Source IP: 83.65.246.198, Timezone: CEST (UTC+2)

Date		Time		v1 Endpoint		Error
06.10.2019	14:35		/acme/new-authz		TCP connection reset by peer
07.10.2019	00:19		/directory		TCP connection reset by peer
07.10.2019	03:19		/acme/new-authz		TCP connection reset by peer

Date		Time		v2 Endpoint		Error
07.10.2019	10:44		/acme/new-acct		TCP connection reset by peer
07.10.2019	13:35		/acme/new-nonce		TCP connection reset by peer
07.10.2019	16:24		/directory		TCP connection reset by peer
07.10.2019	23:35		/acme/new-acct		TCP connection reset by peer
08.10.2019	03:15		/acme/new-nonce		TCP connection reset by peer
08.10.2019	04:57		/acme/chall-v3/681374162/Utss7g		TCP connection reset by peer
08.10.2019	08:36		/acme/authz-v3/683535874		TCP connection reset by peer
08.10.2019	08:57		/acme/chall-v3/683734935/dB4U8A		TCP connection reset by peer
08.10.2019	09:17		/acme/new-acct		TCP connection reset by peer
08.10.2019	12:23		/acme/authz-v3/685829208		TCP connection reset by peer
08.10.2019	13:01		/acme/new-nonce		TCP connection reset by peer
08.10.2019	13:10		/directory		TCP connection reset by peer
08.10.2019	14:07		/acme/new-nonce		TCP connection reset by peer
08.10.2019	14:40		/directory		TCP connection reset by peer
08.10.2019	15:13		/acme/authz-v3/687506321		TCP connection reset by peer
08.10.2019	15:16		/acme/new-nonce		TCP connection reset by peer
08.10.2019	17:08		/acme/new-order		TCP connection reset by peer
08.10.2019	17:15		/acme/new-nonce		TCP connection reset by peer
08.10.2019	18:09		/acme/chall-v3/689037489/Ij7g3g		TCP connection reset by peer
08.10.2019	19:35		/acme/new-order		TCP connection reset by peer
08.10.2019	23:05		/acme/new-nonce		TCP connection reset by peer
09.10.2019	00:25		/directory		TCP connection reset by peer
09.10.2019	00:32		/acme/new-acct		TCP connection reset by peer
09.10.2019	00:34		/acme/chall-v3/692785848/7JJlFw		TCP connection reset by peer
09.10.2019	03:01		/directory		TCP connection reset by peer
09.10.2019	04:50		/acme/new-order		TCP connection reset by peer
09.10.2019	06:33		/acme/chall-v3/696489572/WgutoA		TCP connection reset by peer
09.10.2019	08:55		/acme/new-acct		TCP connection reset by peer
09.10.2019	09:00		/directory		TCP connection reset by peer
09.10.2019	09:40		/acme/authz-v3/698269473		TCP connection reset by peer
09.10.2019	12:20		/acme/new-acct		TCP connection reset by peer
09.10.2019	12:54		/acme/new-nonce		TCP connection reset by peer
09.10.2019	14:56		/acme/chall-v3/701278204/O1uALw		TCP connection reset by peer
09.10.2019	15:30		/acme/new-order		TCP connection reset by peer
09.10.2019	19:40		/acme/new-nonce		TCP connection reset by peer
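
Since the times in the tables above are local (UTC+2) while the other logs in this thread are in UTC, the rows need to be shifted before they can be compared. A minimal sketch of that conversion in Python follows; `to_utc` is a hypothetical helper, and the fixed two-hour offset is taken from the timezone stated above rather than from a timezone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset as stated in the post (UTC+2); an assumption for this sketch.
LOCAL_TZ = timezone(timedelta(hours=2))


def to_utc(date_str, time_str):
    """Convert a 'DD.MM.YYYY' date and 'HH:MM' local time to UTC."""
    local = datetime.strptime(f"{date_str} {time_str}", "%d.%m.%Y %H:%M")
    return local.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)


# Second row of the v1 table: 07.10.2019 00:19 local time
print(to_utc("07.10.2019", "00:19").isoformat())  # 2019-10-06T22:19:00+00:00
```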

I’ll update this post as new errors occur …

FWIW, I am also seeing this error when using an Ansible playbook to update. In our case, we have a proxy server involved, so the logs would be… messy. I’ll see what I can get into a useful format.

We have been getting the following error in some 1-2% of renewals for a couple of days now. Not sure if this is the same error:

Attempting to renew cert (www.somedomain.sk) from /etc/letsencrypt/renewal/www.somedomain.sk.conf produced an unexpected error: HTTPSConnectionPool(host='acme-v02.api.letsencrypt.org', port=443): Max retries exceeded with url: /directory (Caused by SSLError(SSLError("bad handshake: SysCallError(104, 'ECONNRESET')",),)). Skipping. All renewal attempts failed. The following certs could not be renewed: /etc/letsencrypt/live/www.somedomain.sk/fullchain.pem (failure)

Just had a quick look at what percentage of the requests fail …

On 08.10.2019 we had 70 requests in total, 17 of them failed … so about 1/4 of all requests fail … more than I initially thought … :-/
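
For reference, the arithmetic behind the "about 1/4" estimate, using only the two numbers given above (the function name is just for illustration):

```python
def failure_rate(failed, total):
    """Fraction of requests that failed."""
    return failed / total


# 17 failed requests out of 70 on 08.10.2019
print(f"{failure_rate(17, 70):.1%}")  # 24.3%
```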

Thank you everyone for all the details and information! It’s really appreciated and helps us understand the scope. We’re still investigating the root cause as a high-priority issue.

Hi folks!

Thanks for all the help troubleshooting and patience!

We have found a couple places that were causing at least some errors and fixed them.

  • Added additional frontend proxy capacity.
  • Lowered our frontend proxy keepalive timeout to be less than our firewall session timeout.
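
As a rough illustration of the second fix: if a proxy keeps idle connections open longer than a stateful firewall remembers them, a connection reused in that gap may hit a firewall that no longer knows the session, which is one plausible way to end up with a reset. The sketch below only encodes that relationship; the function names and the timeout values are hypothetical and are not Let's Encrypt's actual configuration.

```python
def keepalive_is_safe(proxy_keepalive_s, firewall_session_s):
    """True if the proxy closes idle connections before the firewall forgets them."""
    return proxy_keepalive_s < firewall_session_s


def risky_window_s(proxy_keepalive_s, firewall_session_s):
    """Seconds during which an idle connection is still kept open by the
    proxy although the firewall may already have dropped its session."""
    return max(0, proxy_keepalive_s - firewall_session_s)


# Hypothetical values, for illustration only.
print(keepalive_is_safe(600, 300), risky_window_s(600, 300))  # False 300
print(keepalive_is_safe(240, 300), risky_window_s(240, 300))  # True 0
```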

Let us know if you are still seeing problems.
