Intermittent ConnectionRefusedError in CI pipeline when using Certbot

My domain is:
http://pki.example.com:8080/acme/directory (domain exists for CI purposes only); see the commit "Provide user friendly error message when trying to parse invalid JSON" (dogtagpki/pki@f8ea9ad on GitHub)

I ran this command:
docker exec client certbot register \
    --server http://pki.example.com:8080/acme/directory \
    --email user1@example.com \
    --agree-tos \
    --non-interactive

It produced this output:
On success:

2021-10-20 11:05:10,979:DEBUG:certbot._internal.main:certbot version: 1.20.0
2021-10-20 11:05:10,979:DEBUG:certbot._internal.main:Location of certbot entry point: /usr/bin/certbot
2021-10-20 11:05:10,980:DEBUG:certbot._internal.main:Arguments: ['--server', 'http://pki.example.com:8080/acme/directory', '--non-interactive']
2021-10-20 11:05:10,980:DEBUG:certbot._internal.main:Discovered plugins: PluginsRegistry(PluginEntryPoint#manual,PluginEntryPoint#null,PluginEntryPoint#standalone,PluginEntryPoint#webroot)
2021-10-20 11:05:10,993:DEBUG:certbot._internal.log:Root logging level set at 30
2021-10-20 11:05:11,007:DEBUG:acme.client:Sending GET request to http://pki.example.com:8080/acme/directory.
2021-10-20 11:05:11,008:DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): pki.example.com:8080
2021-10-20 11:05:11,024:DEBUG:urllib3.connectionpool:http://pki.example.com:8080 "GET /acme/directory HTTP/1.1" 200 398
2021-10-20 11:05:11,025:DEBUG:acme.client:Received response:
HTTP 200
Content-Type: application/json
Content-Length: 398
Date: Wed, 20 Oct 2021 11:05:11 GMT
Keep-Alive: timeout=20
Connection: keep-alive

On fail:

2021-10-20 10:36:41,877:DEBUG:certbot._internal.main:certbot version: 1.20.0
2021-10-20 10:36:41,877:DEBUG:certbot._internal.main:Location of certbot entry point: /usr/bin/certbot
2021-10-20 10:36:41,877:DEBUG:certbot._internal.main:Arguments: ['--server', 'http://pki.example.com:8080/acme/directory', '--email', 'user1@example.com', '--agree-tos', '--non-interactive']
2021-10-20 10:36:41,878:DEBUG:certbot._internal.main:Discovered plugins: PluginsRegistry(PluginEntryPoint#manual,PluginEntryPoint#null,PluginEntryPoint#standalone,PluginEntryPoint#webroot)
2021-10-20 10:36:41,893:DEBUG:certbot._internal.log:Root logging level set at 30
2021-10-20 10:36:41,964:DEBUG:acme.client:Sending GET request to http://pki.example.com:8080/acme/directory.
2021-10-20 10:36:41,967:DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): pki.example.com:8080
2021-10-20 10:36:41,977:DEBUG:certbot._internal.log:Exiting abnormally:
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

That is, when all the args are passed it fails, but when some args are missing it succeeds!

My web server is (include version):
Apache Tomcat/9.0.45

The operating system my web server runs on is (include version):
docker container of Fedora 34

My hosting provider, if applicable, is:
GitHub

I can login to a root shell on my machine (yes or no, or I don't know):
don't know

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):
certbot version: 1.20.0

FWIW I can't tell if this is an issue with GitHub Actions, with Docker, or with Certbot - so I'm starting my investigation at the bottom and working my way up - any help appreciated, thanks!

Well, the error message is very clear to me personally: "Connection refused" says the server isn't running on that port. So I think it's highly unlikely that this is an issue with certbot itself.

@ckelley That's interesting. I do not have anything specific.

I (also) cannot imagine how the different args you show could change the connection result. It is interesting, though, that one series succeeds while the other, nearly identical, series fails.

I can only suggest replacing the certbot command in your "verify certbot" step in the GitHub Actions workflow with a plain curl call like:
curl -I http://pki.example.com:8080/acme/directory

That makes the same request that is failing, just with curl instead of the Python code that Certbot uses. If that also fails, it clearly points to an environmental issue in the GitHub setup.
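
If curl is not available in the container, roughly the same check can be sketched in Python using only the standard library. This is a minimal sketch, not anything Certbot does internally: the function name `diagnose` and its return strings are made up for illustration. It resolves the name first and then attempts a TCP connection, so a DNS failure can be told apart from a refused connection.

```python
import socket


def diagnose(host, port, timeout=3.0):
    """Return "dns-failure", "refused", "unreachable" or "ok" (hypothetical labels)."""
    try:
        # Step 1: name resolution only, no connection yet.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    result = "unreachable"
    # Step 2: try each resolved address until one accepts the connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except ConnectionRefusedError:
            # Host answered, but nothing is listening on the port (Errno 111 on Linux).
            result = "refused"
        except OSError:
            # Timeouts, unreachable networks and similar errors.
            result = "unreachable"
    return result
```

Calling `diagnose("pki.example.com", 8080)` from the client container would then show whether the name resolves at all before any ACME request is made.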

Good idea of poking it directly with curl. I have done so:

curl: (6) Could not resolve host: pki.example.com

so that reduces the problem space to the CI env or something funky with establishing the server in the container.

No issues with certbot then, thanks for the help!

How do you resolve pki.example.com (or whatever the real name is)?
[I suspect that you might be using multiple DNS servers, and some might know the name while others might not.]

There are only about 10-15 seconds between creating your test ACME server and trying to connect. Could your test server just be a little slow to start up? A literal race condition :) Maybe try injecting a 1 minute delay before running the client tests.
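
As an alternative to a fixed delay, the CI step could poll until the server accepts connections. The following is a minimal sketch under that assumption; `wait_for_port` and its parameters are hypothetical names, and an open port does not by itself prove that Tomcat has finished deploying the ACME application.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True on success and False once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves the name and connects in one call.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers ConnectionRefusedError, timeouts and DNS failures
            # (socket.gaierror is a subclass of OSError).
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test script could call `wait_for_port("pki.example.com", 8080)` and exit with an error if it returns False, so the client tests start as soon as the server is reachable rather than after a worst-case wait.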

Yeah I think you're right, I'm building in a timeout!
