When I checked the Let's Encrypt setup with the command "sudo certbot renew --dry-run",
I received the following output for 2-4 out of 28 certificates:
Failed to renew certificate FAILEDDOMAIN.TLD with error:
HTTPSConnectionPool(host='acme-staging-v02.api.letsencrypt.org', port=443):
Max retries exceeded with url: /acme/cert/2b02404155dc830f9010c89f84281ebe8ab7
(Caused by SSLError(SSLError(1,
'[SSL: SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2635)')))
Web server: Apache/2.4.52 (Ubuntu)
Operating system web server: VPS, Ubuntu 22.04.4 LTS
Server provider: Strato
Login to a root shell: Yes
No control panel
certbot 2.11.0
Note: The certificates that fail vary randomly from run to run! It therefore cannot be related to the renewal settings of any particular certificate.
If I run the command again immediately afterwards, I always get a success message. All 28 certificates are currently valid, but the behaviour of the dry run seems rather strange to me.
ping -4 acme-staging-v02.api.letsencrypt.org
PING (172.65.46.172) 56(84) bytes of data.
64 bytes from 172.65.46.172 (172.65.46.172): icmp_seq=1 ttl=59 time=1.22 ms
64 bytes from 172.65.46.172 (172.65.46.172): icmp_seq=2 ttl=59 time=1.19 ms
64 bytes from 172.65.46.172 (172.65.46.172): icmp_seq=3 ttl=59 time=1.09 ms
64 bytes from 172.65.46.172 (172.65.46.172): icmp_seq=4 ttl=59 time=1.08 ms
64 bytes from 172.65.46.172 (172.65.46.172): icmp_seq=5 ttl=59 time=1.40 ms
64 bytes from 172.65.46.172 (172.65.46.172): icmp_seq=6 ttl=59 time=1.17 ms
64 bytes from 172.65.46.172 (172.65.46.172): icmp_seq=7 ttl=59 time=1.14 ms
64 bytes from 172.65.46.172 (172.65.46.172): icmp_seq=8 ttl=59 time=1.25 ms
^C
--- ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7010ms
rtt min/avg/max/mdev = 1.075/1.192/1.399/0.095 ms
I also performed a longer run:
--- ping statistics ---
207 packets transmitted, 207 received, 0% packet loss, time 206249ms
rtt min/avg/max/mdev = 1.027/1.165/1.751/0.086 ms
"Failed to renew certificate software-theband.com with error: HTTPSConnectionPool(host='acme-staging-v02.api.letsencrypt.org', port=443): Max retries exceeded with url: /directory (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2635)')))"
Note that the certificates actually affected by the error vary randomly; they are different on each run!
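To make that run-to-run variation easy to compare, one could scrape the failing names out of each run's output and diff the lists. A minimal sketch (the domain names and log lines below are fabricated for illustration; this is plain text matching on certbot's output, not a certbot API):

```python
import re

# Fabricated sample in the shape certbot prints for a failed renewal;
# in practice you would feed it the real output of "certbot renew --dry-run".
sample_output = """\
Failed to renew certificate example-one.tld with error: HTTPSConnectionPool(...)
Certificate not yet due for renewal
Failed to renew certificate example-two.tld with error: HTTPSConnectionPool(...)
"""

def failed_domains(text: str) -> list[str]:
    """Return the domains named on 'Failed to renew certificate ...' lines."""
    return re.findall(r"^Failed to renew certificate (\S+) with error:", text, re.M)

print(failed_domains(sample_output))  # ['example-one.tld', 'example-two.tld']
```

Running this against the saved output of two consecutive dry runs would show whether any domain fails repeatedly or whether the failures really are uniformly random.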
The error message 'SSLV3_ALERT_BAD_RECORD_MAC' seems to indicate a problem with the integrity of the data transferred between the client and the server: it occurs when the Message Authentication Code (MAC) of a TLS record fails verification.
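For context on what that alert means (an illustration of the principle, not certbot's actual code): TLS protects every record with a MAC or AEAD tag, and if even a single bit of the record changes in transit, verification on the receiving side fails and the peer responds with bad_record_mac. Sketched with HMAC:

```python
import hashlib
import hmac

key = b"session-key"  # stands in for the TLS record-protection key
record = b"some TLS record payload"

# Sender computes a MAC over the record.
tag = hmac.new(key, record, hashlib.sha256).digest()

# Intact record: the receiver's MAC matches -> record accepted.
assert hmac.compare_digest(hmac.new(key, record, hashlib.sha256).digest(), tag)

# Flip a single bit "in transit": verification fails -> bad_record_mac alert.
corrupted = bytes([record[0] ^ 0x01]) + record[1:]
assert not hmac.compare_digest(hmac.new(key, corrupted, hashlib.sha256).digest(), tag)
```

This is why intermittent bad_record_mac errors usually point at something between the endpoints (or faulty hardware on one of them) flipping bits, rather than at a configuration problem.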
So I did some more SSL testing:
We store tar.gz backup archives on a different node every night.
I downloaded a 1.3 GB archive with no errors. By the way, we use SFTP extensively without any problems.
Maybe you could uninstall certbot, check that no other versions remain, and then reinstall certbot.
OR
Try using another ACME client (e.g. acme.sh) to see whether the problem persists.
Hard to tell, but this could be caused by filesystem corruption or a hardware problem. Perhaps you could take a packet capture (pcap) with tcpdump during a failed renewal, so we can look for an obvious problem?
Yeah, if it's intermittent and inconsistent, it may be networking hardware, a firewall, or something similar corrupting or dropping packets.
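If you do capture a failed renewal, the thing to look for in the pcap is a TLS record of content type 21 (alert) and, when the alert is still in plaintext (sent before encryption starts), a description byte of 20 = bad_record_mac; alerts sent after the handshake are encrypted, so often only the record type is visible. As a hypothetical helper for eyeballing this (not a full pcap parser; it assumes you export the reassembled raw TCP payload, e.g. via Wireshark's "Follow TCP Stream"):

```python
ALERT = 21  # TLS record content type for alerts
DESCRIPTIONS = {20: "bad_record_mac", 40: "handshake_failure"}

def find_alert_records(stream: bytes):
    """Scan a reassembled TLS byte stream for plaintext alert records.

    Returns (offset, level, description) tuples; level 2 means fatal.
    Record header layout: type(1) version(2) length(2).
    """
    hits = []
    i = 0
    while i + 5 <= len(stream):
        ctype = stream[i]
        version_major = stream[i + 1]
        length = int.from_bytes(stream[i + 3 : i + 5], "big")
        if ctype == ALERT and version_major == 3 and i + 5 + length <= len(stream):
            body = stream[i + 5 : i + 5 + length]
            if len(body) == 2:  # plaintext alert: level byte + description byte
                hits.append((i, body[0], DESCRIPTIONS.get(body[1], body[1])))
            i += 5 + length
        else:
            i += 1  # not a record boundary we recognize; slide forward
    return hits

# Fabricated example: a fatal (2) bad_record_mac (20) alert record, TLS 1.2 framing.
sample = b"\x15\x03\x03\x00\x02\x02\x14"
print(find_alert_records(sample))  # [(0, 2, 'bad_record_mac')]
```

In practice Wireshark's TLS dissector does this for you; the sketch just shows which bytes matter when reading the capture by hand.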
Let's Encrypt does not terminate TLS at Cloudflare (they use E2EE via Cloudflare Spectrum) and does not have 0-RTT enabled. The issue you linked is related to a (partially incompatible) TLS middlebox, and so far we haven't seen evidence that a middlebox is actually involved. Given that this is a VPS running at Strato, it's unlikely as they don't employ middleboxes.
The snap version of certbot is 2.11.0, which seems to be the latest. I have read that the snap release ships with its own Python libraries.
openssl is: "openssl/jammy-updates,jammy-security,now 3.0.2-0ubuntu1.18 amd64", which seems to be the latest for Ubuntu 22.04.4 LTS.
A random memory error seems unlikely to me: if RAM were failing, I would expect errors in other places, not only during 'certbot renew --dry-run'. Since that command currently produces the error on every run, the memory fault would have to occur very frequently.
The randomness of which certificates are affected also does not suggest a defective SSD to me.
Now I ran tcpdump during 'certbot renew --dry-run' and stopped the capture after the first error occurred: