Hey there,
for a few days now we have been getting random “Curl: TCP connection reset by peer” errors when trying to renew certs. We have our own PHP client integrated into our CMS (based on https://github.com/analogic/lescript).
A second attempt a few seconds later succeeds without problems …
Requests are sent from servers located in Austria (i.e. from 83.65.246.198).
As there are several thousand certs hosted on this server, there are lots of renewals per day … 99.9% succeed … but some trigger this error … not a big deal as we automatically try again later … but I’m just wondering …
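(Just to illustrate the “try again later” part: our real client is PHP/curl based on lescript, so the following is only a rough Python sketch of the retry idea - the function name, retry count and delay are made up and not taken from our actual code.)

import time
import requests

# ACME v1 directory endpoint we talk to (same endpoint our PHP client uses)
DIRECTORY_URL = "https://acme-v01.api.letsencrypt.org/directory"

def fetch_directory(retries=3, delay=10):
    # Retry a few times on transient connection resets - in practice a
    # second attempt a few seconds later succeeds without problems.
    last_err = None
    for _ in range(retries):
        try:
            resp = requests.get(DIRECTORY_URL, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.ConnectionError as err:
            # "TCP connection reset by peer" / ECONNRESET lands here
            last_err = err
            time.sleep(delay)
    raise last_err

if __name__ == "__main__":
    print(fetch_directory())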
No network issues observed on our end / in our datacenter.
So could there be an issue on the LE side? Anything known? Are others getting such errors too?
thx, bye from Austria
Andreas Schnederle-Wagner
Hey @JuergenAuer,
thx for the information - seems like it’s related to this … I saw this error for the first time ever on 25.09 at 02:22; since then it has happened 14 times across some dozens/hundreds of renewals …
It happens just randomly …
Can I do something to help debug this issue?
FWIW, someone else had a similar report a few days ago:
https://community.letsencrypt.org/t/curl-error-to-directory-endpoint/103027
Some more information regarding this issue:
Using the ACME v1 endpoint: https://acme-v01.api.letsencrypt.org
Host OS: CentOS Linux release 7.6.1810 (Core)
IPv4 - no IPv6
#curl -V
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.36 zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets
I also tested a manually compiled current version of curl … it triggered the same error.
Could it be that LE’s new CDN can’t keep up with all the incoming requests and starts denying new connections?
Is there someone on the LE infrastructure side who could have a look into this? (@cpu? ;-))
thx, bye from sunny Austria
Andreas
@JuergenAuer - we are using CentOS 7, which is a binary “clone” of RHEL 7. Those enterprise Linuxes tend to stay on old but well-tested versions for compatibility/stability reasons, but all important security fixes get backported to those old versions (see: https://access.redhat.com/solutions/64838).
So that’s “normal” in this case.
Newer versions are coming with CentOS 8, which was released just a few days ago …
I also tried it with a manually compiled curl 7.65, which threw the same error - so I’m more or less ruling out curl as the source of the problem.
I will ask someone on the SRE team to investigate.
thx a lot ... hope they can help track it down!
Hi @futureweb -
Thanks for bringing this to our attention. I’ve started investigating this internally and will provide any updates here.
Is there any more detail you can provide from the error messages? Perhaps try running the command with curl -vvv to see more details.
I’m seeing about 1k ECONNRESET-type errors per hour. We’re provisioning around 25k certificates per day. We are still getting a lot of successful requests.
The last one was at Oct 2nd, 17:26:10 UTC. We’re requesting from Google Cloud.
Just a quick update. I’ve been reviewing logs and data and I’m getting close to a root cause. Thanks for your patience while we continue investigating.
@jillian - I was out of the office yesterday, but since you are getting close to the root cause I guess you don’t need any more verbose logs anymore?
If we can do anything further to help you hunt down this nasty thing, just drop me a line!
The problem doesn’t seem to happen only with curl: I’ve been getting errors from my daily certbot cron for the last 3 days:
(timezone in the log below is UTC)
2019-10-05 00:45:09,509:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-05 00:45:25,065:DEBUG:certbot.log:Exiting abnormally:
An unexpected error occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 417, in wrap_socket
cnx.do_handshake()
File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1426, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "/usr/lib/python3/dist-packages/OpenSSL/SSL.py", line 1166, in _raise_ssl_error
raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 594, in urlopen
chunked=chunked)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 350, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 837, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 323, in connect
ssl_context=context)
File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3/dist-packages/urllib3/contrib/pyopenssl.py", line 424, in wrap_socket
raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 423, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 624, in urlopen
raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)
During handling of the above exception, another exception occurred:
requests.exceptions.SSLError: ("bad handshake: SysCallError(104, 'ECONNRESET')",)
Please see the logfiles in /var/log/letsencrypt for more details.
Same errors happened at the following timestamps (still UTC):
2019-10-03 00:45:08,350:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-03 00:45:23,950:DEBUG:certbot.log:Exiting abnormally:
<same error as above>
2019-10-04 00:45:09,660:DEBUG:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
2019-10-04 00:45:25,413:DEBUG:certbot.log:Exiting abnormally
<same error as above>
@jillian - got some more errors over the weekend - here’s the info:
Source IP: 83.65.246.198, Timezone: CEST (UTC+2)
Date Time v1 Endpoint Error
06.10.2019 14:35 /acme/new-authz TCP connection reset by peer
07.10.2019 00:19 /directory TCP connection reset by peer
07.10.2019 03:19 /acme/new-authz TCP connection reset by peer
Date Time v2 Endpoint Error
07.10.2019 10:44 /acme/new-acct TCP connection reset by peer
07.10.2019 13:35 /acme/new-nonce TCP connection reset by peer
07.10.2019 16:24 /directory TCP connection reset by peer
07.10.2019 23:35 /acme/new-acct TCP connection reset by peer
08.10.2019 03:15 /acme/new-nonce TCP connection reset by peer
08.10.2019 04:57 /acme/chall-v3/681374162/Utss7g TCP connection reset by peer
08.10.2019 08:36 /acme/authz-v3/683535874 TCP connection reset by peer
08.10.2019 08:57 /acme/chall-v3/683734935/dB4U8A TCP connection reset by peer
08.10.2019 09:17 /acme/new-acct TCP connection reset by peer
08.10.2019 12:23 /acme/authz-v3/685829208 TCP connection reset by peer
08.10.2019 13:01 /acme/new-nonce TCP connection reset by peer
08.10.2019 13:10 /directory TCP connection reset by peer
08.10.2019 14:07 /acme/new-nonce TCP connection reset by peer
08.10.2019 14:40 /directory TCP connection reset by peer
08.10.2019 15:13 /acme/authz-v3/687506321 TCP connection reset by peer
08.10.2019 15:16 /acme/new-nonce TCP connection reset by peer
08.10.2019 17:08 /acme/new-order TCP connection reset by peer
08.10.2019 17:15 /acme/new-nonce TCP connection reset by peer
08.10.2019 18:09 /acme/chall-v3/689037489/Ij7g3g TCP connection reset by peer
08.10.2019 19:35 /acme/new-order TCP connection reset by peer
08.10.2019 23:05 /acme/new-nonce TCP connection reset by peer
09.10.2019 00:25 /directory TCP connection reset by peer
09.10.2019 00:32 /acme/new-acct TCP connection reset by peer
09.10.2019 00:34 /acme/chall-v3/692785848/7JJlFw TCP connection reset by peer
09.10.2019 03:01 /directory TCP connection reset by peer
09.10.2019 04:50 /acme/new-order TCP connection reset by peer
09.10.2019 06:33 /acme/chall-v3/696489572/WgutoA TCP connection reset by peer
09.10.2019 08:55 /acme/new-acct TCP connection reset by peer
09.10.2019 09:00 /directory TCP connection reset by peer
09.10.2019 09:40 /acme/authz-v3/698269473 TCP connection reset by peer
09.10.2019 12:20 /acme/new-acct TCP connection reset by peer
09.10.2019 12:54 /acme/new-nonce TCP connection reset by peer
09.10.2019 14:56 /acme/chall-v3/701278204/O1uALw TCP connection reset by peer
09.10.2019 15:30 /acme/new-order TCP connection reset by peer
09.10.2019 19:40 /acme/new-nonce TCP connection reset by peer
I’ll update this post as new errors occur …
FWIW, I am also seeing this error when using an Ansible playbook to update. In our case, we have a proxy server involved, so the logs would be… messy. I’ll see what I can get into a useful format.
We have been getting the following error in some 1-2% of renewals for the last couple of days. Not sure if this is the same error:
Attempting to renew cert (www.somedomain.sk) from /etc/letsencrypt/renewal/www.somedomain.sk.conf produced an unexpected error: HTTPSConnectionPool(host='acme-v02.api.letsencrypt.org', port=443): Max retries exceeded with url: /directory (Caused by SSLError(SSLError("bad handshake: SysCallError(104, 'ECONNRESET')",),)). Skipping. All renewal attempts failed. The following certs could not be renewed: /etc/letsencrypt/live/www.somedomain.sk/fullchain.pem (failure)
Just had a quick look at what percentage of requests fail …
On 08.10.2019 we had 70 requests in total and 17 of them failed … so roughly a quarter (≈24%) of all requests hit this error … more than I initially thought … :-/
Thank you everyone for all the details and information! It’s really appreciated and helpful for us to know more about the scope. We’re still investigating the root cause as a high-priority issue.
Hi folks!
Thanks for all the help troubleshooting and patience!
We have found a couple of places that were causing at least some of the errors and fixed them:
- Added additional frontend proxy capacity.
- Lowered our frontend proxy keepalive timeout to be less than our firewall session timeout, so idle keepalive connections are now closed by the proxy before the firewall can drop its session state (which can show up on the client side as a connection reset).
Let us know if you are still seeing problems.