Connection dropping on Finalize

Hi,

This is an odd one: I have a user who is having problems on one server (Windows Server 2019, using Certify The Web) when performing ACME orders with Let's Encrypt. Normal communication with the API works fine until they reach the Finalize step, at which point the connection abruptly drops:

System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host

Here is an excerpt from the debug log. You can see the POST to get the current status of the Order works fine:

2022-11-14 14:59:49.735 +01:00 [DBG] Http Request: Method: POST, RequestUri: 'https://acme-v02.api.letsencrypt.org/acme/order/801624272/142965326657', Version: 1.1, Content: System.Net.Http.StringContent, Headers:
{
  User-Agent: Certify/5.6.8.0
  User-Agent: (Windows; Microsoft Windows NT 10.0.17763.0)
  User-Agent: Certes/2.4.0.0
  User-Agent: .NET/4.0.30319.42000
  Content-Type: application/jose+json
}
---
---
2022-11-14 14:59:50.060 +01:00 [DBG] Http Response: StatusCode: 200, ReasonPhrase: 'OK', Version: 1.1, Content: System.Net.Http.StreamContent, Headers:
{
  Connection: keep-alive
  Link: <https://acme-v02.api.letsencrypt.org/directory>;rel="index"
  Replay-Nonce: 327CnFxulwQ9Z9s4qoZjV6pDp-n7_TtyO0l_2akSVR7bH7U
  X-Frame-Options: DENY
  Strict-Transport-Security: max-age=604800
  Cache-Control: public, no-cache, max-age=0
  Date: Mon, 14 Nov 2022 13:59:50 GMT
  Server: nginx
  Content-Length: 334
  Content-Type: application/json
}
2022-11-14 14:59:50.060 +01:00 [DBG] {
  "status": "ready",
  "expires": "2022-11-18T06:53:15Z",
  "identifiers": [
    {
      "type": "dns",
      "value": "torntwig.se"
    }
  ],
  "authorizations": [
    "https://acme-v02.api.letsencrypt.org/acme/authz-v3/174856301377"
  ],
  "finalize": "https://acme-v02.api.letsencrypt.org/acme/finalize/801624272/142965326657"
}
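The order above has `status: "ready"`, which is the state in which RFC 8555 has the client send its CSR to the order's `finalize` URL, so the client appears to be doing the right thing at this point. A minimal Python sketch of that check, assuming the order JSON has already been fetched; the helper name `finalize_url_if_ready` is hypothetical and not part of Certes or Certify The Web:

```python
import json
from typing import Optional


def finalize_url_if_ready(order_json: str) -> Optional[str]:
    """Return the order's finalize URL if the order is ready, else None.

    Hypothetical helper: an ACME order moves to "ready" once all of its
    authorizations are valid, and only then should the CSR be POSTed
    to the "finalize" URL.
    """
    order = json.loads(order_json)
    if order.get("status") != "ready":
        return None
    return order.get("finalize")
```

Fed the order body from the log, this would return the `/acme/finalize/801624272/142965326657` URL that the next request goes to.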

However, the follow-up POST to /acme/finalize kills the connection at the transport level:

2022-11-14 14:59:50.075 +01:00 [DBG] Http Request: Method: POST, RequestUri: 'https://acme-v02.api.letsencrypt.org/acme/finalize/801624272/142965326657', Version: 1.1, Content: System.Net.Http.StringContent, Headers:
{
  User-Agent: Certify/5.6.8.0
  User-Agent: (Windows; Microsoft Windows NT 10.0.17763.0)
  User-Agent: Certes/2.4.0.0
  User-Agent: .NET/4.0.30319.42000
  Content-Type: application/jose+json
}
2022-11-14 14:59:50.075 +01:00 [DBG] {"protected":"eyJhbGciOiJFUzI1NiIsImtpZCI6Imh0dHBzOi8vYWNtZS12MDIuYXBpLmxldHNlbmNyeXB0Lm9yZy9hY21lL2FjY3QvODAxNjI0MjcyIiwibm9uY2UiOiIzMjdDbkZ4dWx3UTlaOXM0cW9aalY2cERwLW43X1R0eU8wbF8yYWtTVlI3Ykg3VSIsInVybCI6Imh0dHBzOi8vYWNtZS12MDIuYXBpLmxldHNlbmNyeXB0Lm9yZy9hY21lL2ZpbmFsaXplLzgwMTYyNDI3Mi8xNDI5NjUzMjY2NTcifQ","payload":"eyJjc3IiOiJNSUlDbkRDQ0FZUUNBUUF3RmpFVU1CSUdBMVVFQXd3TGRHOXliblIzYVdjdWMyVXdnZ0VpTUEwR0NTcUdTSWIzRFFFQkFRVUFBNElCRHdBd2dnRUtBb0lCQVFDLWpCdlVFVERMRkFITHh6X0RfTGVnaUxoTHc4cGF5Y2kyMDBEckFhcEpiNlIzeU5ILTJyMW9YZWJvYkpVOEN5Y0wwYXFuRFJvVFBFOTFLNlRNblBrZ015QVZQTUhyZm1sZ0tfbnlmOEZxbzBtQ2hYQUtJRl9DUTFFeVFtVGVVVE1FRjdxbnlxSDJGNW1DRGRWd284RHdRZ1k2MTNHTjBZRDdGMHQta0xnX0pqeXpvd3JEM09RZ3djSzItdk5MVjFGbTB0b09LTG4zNXczV3RSenF3cXBGY2daQzhMb3hlX2xzQlJwd2lES1hWNmZZU0ZEbWFCZVJaU0ZFbVR5bHFxT1BGYTZVSUxFNGRzNURNZXVhRFpaUE9GY0pWUDFoZTJ1eUNES05LUTFiZWtjQm5ZZnpnbnNEYmY4dTFrR3ZPcFhfZkpCaUFYcXdsT1YwQWlYSjA2b1BBZ01CQUFHZ1FUQV9CZ2txaGtpRzl3MEJDUTR4TWpBd01Ba0dBMVVkRXdRQ01BQXdDd1lEVlIwUEJBUURBZ1dnTUJZR0ExVWRFUVFQTUEyQ0MzUnZjbTUwZDJsbkxuTmxNQTBHQ1NxR1NJYjNEUUVCQ3dVQUE0SUJBUUJGQWNfSUtub1ExNV9ZbThjdTN0YnNIb1pyUVZTYUluX00zRTlmTHdvbWU3TnFXQzh5b1pvS1Z3Rkk3dVVmUTNsUnViZFVpWlpSb0traXNaZFR1VmpmWWtGMFJLT29LcGRGbUpfa0ltajhfTWp5SmNvaVRJRUpjSC0xaWhROEJHeEp6azJEWTlMVFVTX2x3Vjd1QTluUDVWdVpsMUFBaEE3TEo5cklRcHZRNEFHOVF3YkZDSWlFaDBNSVcwWUhiZlRiMWtHSXI1RDJpb1QtWkVYenZJR2s4Yk83em80YWRXWTdfWl9zbHVyMUxpeXltZEhwUGI3RmpvbW5GbTJKUnA0U2ZpdXBTOVo2MmVpcFRZNTctVTRuT3FrS05EODJnTmdDc0NkcGY0dU9WUVBnbWI2V1luREd2REhBVTFDbUhZZVRJN3JCR1dQRG0zMkVXMWhEeDBMdCJ9","signature":"[redacted]"}
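To check what this request actually carries, the `protected` and `payload` fields of the flattened JWS can be base64url-decoded. A small Python sketch, assuming the logged body is available as a string; the function names are hypothetical and nothing is signature-verified:

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode unpadded base64url text, as used in JWS (RFC 7515)."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_flattened_jws(body: str) -> dict:
    """Decode the protected header and payload of a flattened JWS body.

    The signature is left as-is: nothing is verified here. An empty
    payload (an ACME POST-as-GET, like the order status request) gives None.
    """
    jws = json.loads(body)
    payload = jws.get("payload", "")
    return {
        "protected": json.loads(b64url_decode(jws["protected"])),
        "payload": json.loads(b64url_decode(payload)) if payload else None,
    }
```

Applied to the body logged above, the protected header should show `alg` ES256, the account `kid`, the finalize `url`, and a `nonce` matching the `Replay-Nonce` from the preceding 200 response, while the payload is a `{"csr": ...}` object. That suggests the request is well-formed at the ACME level before it ever leaves the client.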

No HTTP error is returned from the server; it appears to be a failure at the TCP or TLS level:

2022-11-14 14:59:51.130 +01:00 [ERR] Certificate request process failed: System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.

Any thoughts?

The user says there is no firewall or malware blocking communication on their side.

Are there any known conditions where LE will just drop a connection other than when an IP is blocked? For instance, could it be a rate limit or some other kind of protection?
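For comparison, ACME-level refusals, including Let's Encrypt rate limits, normally come back as an HTTP error response carrying a problem document (rate limits use HTTP 429 with type `urn:ietf:params:acme:error:rateLimited`), whereas the finalize call here gets no response at all. A minimal Python sketch that separates these cases; `classify_failure` and its category names are hypothetical:

```python
import json
from typing import Optional


def classify_failure(status: Optional[int], body: Optional[bytes]) -> str:
    """Classify the outcome of an ACME request.

    status is None when no HTTP response was received at all (for example
    the connection was reset), which is what the finalize log shows.
    """
    if status is None:
        return "transport"  # TCP/TLS-level failure: the server never answered
    if status < 400:
        return "ok"
    try:
        problem = json.loads(body or b"{}")
    except ValueError:  # not JSON, e.g. an HTML error page from a proxy
        problem = {}
    if not isinstance(problem, dict):
        problem = {}
    ptype = problem.get("type", "")
    if status == 429 or ptype == "urn:ietf:params:acme:error:rateLimited":
        return "rate_limited"
    if ptype.startswith("urn:ietf:params:acme:error:"):
        return "acme_error"
    return "http_error"
```

Under this split, the finalize failure lands in "transport", which is why a rate limit seems an unlikely explanation unless some layer between client and server is turning a response into a reset.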

MitM?

They have other servers that work, so it could be specific to the domain they are trying to get a cert for, a problem with the key, or perhaps something issuance-related simply falls over.

Other FQDNs from that same server?
OR
Other servers?

In any case, I'd like to see their output of something like:
openssl s_client -connect acme-v02.api.letsencrypt.org:443
[or whatever the Windows equivalent is - LOL]
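As a portable stand-in for `openssl s_client`, Python's standard `ssl` module can perform the same handshake and report the negotiated TLS version, cipher and certificate subject. A minimal sketch; the function name and result shape are assumptions, and it sends no ACME request:

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Connect, complete a TLS handshake and report what was negotiated.

    Roughly what `openssl s_client -connect host:port` shows, minus the
    certificate chain dump. Failures are returned rather than raised.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                cert = tls.getpeercert() or {}
                return {
                    "ok": True,
                    "version": tls.version(),
                    "cipher": tls.cipher()[0],
                    "subject": dict(rdn[0] for rdn in cert.get("subject", ())),
                }
    except OSError as exc:  # includes ssl.SSLError, resets and timeouts
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

For example, `probe_tls("acme-v02.api.letsencrypt.org")`. A clean result would not by itself rule anything out, since the earlier API calls over the same path already succeed; it mainly confirms the TLS setup.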

I don't have access to their server to run diagnostics myself, but they communicate OK with the API up until the call to Finalize. They have TLS 1.2, etc.

I'll check with them regarding openssl and confirm again that there is no outgoing proxy. I'll also check with them regarding Windows Updates, as there was a recent bug involving TLS Session Tickets (clutching at straws with that one, though!).

While we won't learn the reason, could one create a new order reusing the cached authorizations and test whether the new order also fails to finalize?

As far as I'm aware the user has tried repeatedly and it always fails on Finalize.

My wild guess would be some kind of MTU issue, where a packet for that request is just over some limit, causing some router to try to fragment the packet in a way that some other router or firewall or the like doesn't accept. It might be helpful to see whether they can get a packet capture of the failure on their router/firewall, but I'm not sure exactly what one would look for in it, and it's a bit of a long shot.

And if there isn't, can they try adding one, just as a debugging step, to see whether it recurs when coming from a different server? They might also want to try over both IPv4 and IPv6 connectivity, assuming they're living near the year 2022 and have IPv6 available.
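Testing IPv4 and IPv6 separately starts with knowing which addresses the API hostname resolves to in each family. A minimal Python sketch using `socket.getaddrinfo`; the helper name is hypothetical. A connection can then be forced over one family by connecting to one of the listed addresses (for TLS, still passing the real hostname as `server_hostname`):

```python
import socket


def addresses_by_family(host: str, port: int = 443) -> dict:
    """Resolve host and group its distinct addresses into IPv4 and IPv6."""
    result = {"ipv4": [], "ipv6": []}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return result  # name did not resolve at all
    for family, _type, _proto, _canon, sockaddr in infos:
        if family == socket.AF_INET:
            key = "ipv4"
        elif family == socket.AF_INET6:
            key = "ipv6"
        else:
            continue
        if sockaddr[0] not in result[key]:
            result[key].append(sockaddr[0])
    return result
```

For example, `addresses_by_family("acme-v02.api.letsencrypt.org")`. An empty `ipv6` list on the affected server would mean the IPv6 comparison isn't available there.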

Thanks. I remember there was talk of something changing in validation on a similar topic, but I can't find the original thread.

The changes, as I understand them, were for the validation connections from LE to the DNS/web server, so I wouldn't think they'd affect anything related to the API endpoint. But here are the announcements for the change, if they're helpful:

I think test-ipv6.com does an MTU/MSS test as part of its checks, Cloudflare has a v4 test and a v6 test, and I'm guessing there are more MTU test sites out there too that might help.
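The MTU theory can be made a little more concrete with header arithmetic. The Python sketch below computes the TCP maximum segment size for a given MTU and how many segments a request body needs. It assumes minimal headers (20-byte IPv4 or 40-byte IPv6, 20-byte TCP) and ignores TCP options and TLS record overhead, so real payload room is somewhat smaller; the function names are hypothetical:

```python
IPV4_HEADER = 20  # bytes, no IP options
IPV6_HEADER = 40  # bytes, fixed header, no extension headers
TCP_HEADER = 20   # bytes, no TCP options (timestamps would add 12 more)


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def segments_needed(payload_bytes: int, mtu: int, ipv6: bool = False) -> int:
    """TCP segments needed to carry payload_bytes (ceiling division)."""
    mss = mss_for_mtu(mtu, ipv6)
    return max(1, -(-payload_bytes // mss))


def needs_full_size_packet(payload_bytes: int, mtu: int, ipv6: bool = False) -> bool:
    """True if at least one segment would fill the MTU completely."""
    return payload_bytes >= mss_for_mtu(mtu, ipv6)
```

For a rough sense of scale: the order-status POST-as-GET has an empty payload, so its JWS body is a few hundred bytes, while the finalize body also wraps a base64url-encoded 2048-bit RSA CSR and comes to roughly 1.6 KB by a rough count of the log above. At a 1500-byte MTU (1460-byte MSS) only the latter needs a full-size segment, which is consistent with, though not proof of, the idea that finalize is the first request large enough to hit a path MTU problem.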

Thanks, the IPv4 test passed, but I think they are probably going to give up and use a different CA. Obviously we tried to troubleshoot this a bit before posting here, so they'll probably be getting a little fatigued. It's worth keeping an eye out for similar problems, but I'm confident it's specific to Let's Encrypt service infrastructure.

I'd be interested in hearing if that resolves "this problem" (as yet not clearly defined).
Please keep us informed.

Yes, other CAs like ZeroSSL work fine. In case I wasn't clear, the specific problem is the underlying TCP connection dropping on the POST to the /acme/finalize/ endpoint. Obviously it only affects this user, otherwise we'd know about it, but I thought it worth checking whether anyone had encountered this before. To really dig into it we'd need Wireshark captures etc., and we're not going to go to that level in this instance. If it becomes a common occurrence, then we will.

Does Certify The Web open a new HTTPS connection for each ACME API command, or does it reuse the HTTPS connection for later API calls?

@bruncsak Hi, it reuses a single .NET HttpClient instance per CA account, which in turn reuses a pool of HTTPS connections, but that's mostly handled internally by the .NET Framework.

[I should also add that the user believes this had been working for a couple of years and only recently ran into the problem.]
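The distinction @bruncsak is asking about, one persistent connection reused across ACME calls versus a fresh connection per call, can be illustrated with Python's `http.client`. This is a hypothetical sketch, not Certify The Web's or .NET's code: `AcmePoster` is an invented name, and it returns the client-side TCP port so reuse is visible. Sending the finalize over a fresh connection might be one cheap way to rule a stale pooled connection in or out:

```python
import http.client
import json
from typing import Optional, Tuple


class AcmePoster:
    """Hypothetical helper: POSTs JSON over one kept-alive connection
    (reuse=True, loosely like a pooled HttpClient) or over a brand-new
    connection for every request (reuse=False)."""

    def __init__(self, host: str, port: int = 443, reuse: bool = True,
                 use_tls: bool = True, timeout: float = 30.0) -> None:
        self.host, self.port = host, port
        self.reuse, self.use_tls, self.timeout = reuse, use_tls, timeout
        self._conn: Optional[http.client.HTTPConnection] = None

    def _connect(self) -> http.client.HTTPConnection:
        cls = http.client.HTTPSConnection if self.use_tls else http.client.HTTPConnection
        return cls(self.host, self.port, timeout=self.timeout)

    def post(self, path: str, body: dict) -> Tuple[int, bytes, int]:
        """Send one POST; return (status, response body, local TCP port)."""
        conn = self._conn if (self.reuse and self._conn is not None) else self._connect()
        try:
            conn.request("POST", path, body=json.dumps(body).encode(),
                         headers={"Content-Type": "application/jose+json"})
            local_port = conn.sock.getsockname()[1]  # identifies the TCP connection used
            resp = conn.getresponse()
            status, data = resp.status, resp.read()
        except Exception:
            conn.close()
            self._conn = None  # never hand a failed connection to the next call
            raise
        if self.reuse:
            self._conn = conn
        else:
            conn.close()
        return status, data, local_port

    def close(self) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

With `reuse=True`, consecutive calls report the same local port; with `reuse=False`, each call performs its own TCP and TLS handshake.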

Then it may be a bug introduced by a recent software component upgrade. That's hard to figure out without a packet capture, or at least without a detailed trace log of the .NET Framework's internals.
