Our Client
We issue hundreds of challenges per day on hundreds of domains, using the ancient certbot client known as letsencrypt (ACME v1). We typically put 70 to 100 domains in a single SAN certificate.
We are beginning our process of upgrading to ACME V2 (latest certbot) right now.
Quick Context
Given our extreme number of domains, networking or DNS issues are inevitable. Our system watches for error messages from letsencrypt like the following:
urn:acme:error:dns :: DNS problem: NXDOMAIN looking up A for my.domain.com - check that a DNS record exists for this domain
This format of colon-separated messages allows our system to reliably parse the cause of failure and take highly specialized error handling action based on various failures, such as removing a problematic domain from the attempt.
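To make the parsing concrete, here is a minimal sketch of the kind of logic we mean. It is not our production code and not part of certbot: the names `parse_acme_error`, `AcmeError`, and both regular expressions are hypothetical, and the pattern assumes the message keeps the "type :: detail" shape shown above.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical patterns, assumed from the single sample message in this post.
ACME_ERROR_RE = re.compile(r"(?P<type>urn:acme:error:\w+) :: (?P<detail>.*)")
DOMAIN_RE = re.compile(r"looking up \w+ for (?P<domain>\S+)")


class AcmeError(NamedTuple):
    error_type: str        # e.g. "urn:acme:error:dns"
    detail: str            # human-readable text after the " :: " separator
    domain: Optional[str]  # domain named in the detail, if one was found


def parse_acme_error(line: str) -> Optional[AcmeError]:
    """Return the structured error, or None if the line is not in ACME format."""
    match = ACME_ERROR_RE.search(line)
    if match is None:
        return None
    detail = match.group("detail")
    domain_match = DOMAIN_RE.search(detail)
    domain = domain_match.group("domain") if domain_match else None
    return AcmeError(match.group("type"), detail, domain)
```

With the sample message above, this yields the error type `urn:acme:error:dns` and the domain `my.domain.com`, which is what lets us drop that one domain and retry the rest. A line that does not follow the format returns `None`, so there is nothing to act on.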
The Problem
Within the last couple of months, we began getting this error intermittently, about once a day, on any given cert attempt:
SSLError: EOF occurred in violation of protocol (_ssl.c:590)
Simply retrying resolves it: running the exact same command with the same domains succeeds. It would seem this problem is not within our client.
It doesn’t give information we can write useful error handling for (such as removing the problematic domain from the domain list). This makes me wonder if the issue is a deeper, uncaught/unhandled error in your system.
The Questions
- What does this error mean? What might it indicate?
- Should we expect to continue getting this after upgrading to ACME v2 certbot? We won’t be using DNS challenges.
Special note: We have the exact same problem with another intermittent error: The server experienced an internal error :: Failed to get registration by key. Again, retrying solves it.
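Since retrying resolves both errors, the stopgap we are considering is a bounded retry with backoff, roughly as sketched below. Everything here is an assumption for illustration: `run_with_retries`, `is_transient`, and `TRANSIENT_MARKERS` are hypothetical names, and `attempt` stands in for whatever function runs the client and returns a success flag plus its captured output.

```python
import time
from typing import Callable, Sequence, Tuple

# Substrings of the two intermittent errors quoted in this post.
TRANSIENT_MARKERS = (
    "EOF occurred in violation of protocol",
    "Failed to get registration by key",
)


def is_transient(output: str, markers: Sequence[str] = TRANSIENT_MARKERS) -> bool:
    """True if the output contains one of the known retry-safe messages."""
    return any(marker in output for marker in markers)


def run_with_retries(
    attempt: Callable[[], Tuple[bool, str]],
    max_attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, str]:
    """Call attempt() until it succeeds, fails non-transiently, or runs out of tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    ok, output = False, ""
    for n in range(1, max_attempts + 1):
        ok, output = attempt()
        # Stop on success, on an error we should handle another way, or on the last try.
        if ok or not is_transient(output) or n == max_attempts:
            break
        sleep(base_delay * 2 ** (n - 1))  # exponential backoff: 5s, 10s, 20s, ...
    return ok, output
```

Only the two known messages trigger a retry; anything else (for example a urn:acme:error:dns failure) is returned immediately so the existing per-domain handling can deal with it. This hides the symptom but does not answer the questions above.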