Certbot, nginx, and systemd on Ubuntu 20.04.2

idontusenumbers · September 10, 2021, 4:41am

When running unattended, certbot seems to kill nginx as part of the renewal process and then fails to start it again. The renewal fails because the challenge can't be served, and nginx remains dead.

journalctl for snap.certbot.renew.service:

Sep 09 19:26:01 tanager systemd[1]: Starting Service for snap application certbot.renew...
Sep 09 19:30:21 tanager certbot.renew[121392]: Failed to renew certificate www.campcomputer.com with error: Some challenges have failed.
Sep 09 19:30:21 tanager certbot.renew[121392]: The following renewals failed:
Sep 09 19:30:21 tanager certbot.renew[121392]:   /etc/letsencrypt/live/www.campcomputer.com/fullchain.pem (failure)
Sep 09 19:30:21 tanager certbot.renew[121392]: 1 renew failure(s), 0 parse failure(s)
Sep 09 19:30:21 tanager systemd[1]: snap.certbot.renew.service: Main process exited, code=exited, status=1/FAILURE
Sep 09 19:30:21 tanager systemd[1]: snap.certbot.renew.service: Failed with result 'exit-code'.
Sep 09 19:30:21 tanager systemd[1]: Failed to start Service for snap application certbot.renew.

journalctl for nginx.service:

Sep 09 19:30:19 tanager systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Killing process 126706 (nginx) with signal SIGKILL.
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Killing process 126707 (nginx) with signal SIGKILL.
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Killing process 126706 (nginx) with signal SIGKILL.
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Killing process 126707 (nginx) with signal SIGKILL.
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Failed with result 'core-dump'.
Sep 09 19:43:20 tanager systemd[1]: nginx.service: Unit cannot be reloaded because it is inactive.
Sep 09 19:43:29 tanager systemd[1]: Starting A high performance web server and a reverse proxy server...
Sep 09 19:43:29 tanager systemd[1]: Started A high performance web server and a reverse proxy server.

My alerting tells me the site is down, so I log in and restart nginx, and the site comes back up:

sudo systemctl restart nginx

I was previously on an earlier Ubuntu LTS running the non-snap certbot. I followed the certbot install instructions after upgrading to 20.04, and I believe I fully uninstalled the obsolete certbot before these errors started.

I can't recall perfectly, but I regularly have to log in, manually kick certbot and nginx around with systemctl restarts, and eventually certificates renew and nginx is back in business, but it's certainly not something I enjoy doing.

Any idea what's going on and how to fix it?

Domain: campcomputer.com (among others)

My web server is (include version): nginx version: nginx/1.18.0 (Ubuntu)

The operating system my web server runs on is (include version): Ubuntu 20.04.2

My hosting provider, if applicable, is: linode VPS

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): certbot 1.19.0

idontusenumbers · September 10, 2021, 5:33am

Seems certbot just ran again, killing nginx, but at least this time it renewed successfully. I suspect this means the next time it runs, at least for a month, it won't kill nginx.

Sep 10 00:01:23 tanager systemd[1]: Starting Service for snap application certbot.renew...
Sep 10 00:05:43 tanager systemd[1]: snap.certbot.renew.service: Succeeded.
Sep 10 00:05:43 tanager systemd[1]: Finished Service for snap application certbot.renew.

Sep 10 00:05:43 tanager systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Killing process 470659 (nginx) with signal SIGKILL.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Killing process 470660 (nginx) with signal SIGKILL.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Killing process 470659 (nginx) with signal SIGKILL.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Killing process 470660 (nginx) with signal SIGKILL.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Failed with result 'core-dump'.

Though I had to manually restart nginx again =/

_az · September 10, 2021, 6:52am

nginx hitting a segmentation fault is really, really bad news. It's unlikely to have anything to do with Certbot, except for the fact that Certbot tries to reload/restart nginx.

There are some previous threads which come to mind as being suspiciously similar. Take a look at this one and try the posted advice.

Osiris · September 10, 2021, 7:40am

Could be my lack of knowledge of nginx as well as systemd, but I'm not seeing a segfault in the logs. Are you inferring it from the SIGKILL signals, which the nginx service presumably sent to child processes in response to a segfault somewhere else in the main nginx process (or perhaps in a different child process)? Just curious here.

_az · September 10, 2021, 7:57am

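Sure, it's this:

Sep 09 19:30:19 tanager systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV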

Osiris · September 10, 2021, 8:15am

Aaah yes, I'm just blind, apologies..
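
For anyone else scanning a long journal for the same thing: the crash shows up in the nginx.service log as code=dumped, status=11/SEGV, followed by Failed with result 'core-dump'. A minimal sketch of a filter that pulls out only those lines is below. The function name crash_lines is made up for illustration, and the patterns are taken from the log lines posted in this thread, not from any exhaustive list of systemd messages.

```shell
# crash_lines: read journal text on stdin and print only the lines
# that record a crash (core dump / segfault) of the unit.
# The patterns mirror the messages quoted in this thread.
crash_lines() {
  grep -E "code=dumped|status=[0-9]+/SEGV|Failed with result 'core-dump'"
}

# Example usage (not run here; needs a host with systemd and nginx):
#   journalctl -u nginx.service --no-pager | crash_lines
```

If the filter prints nothing, nginx probably stopped for some other reason and the full journal is worth reading.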

idontusenumbers · September 10, 2021, 3:59pm

The segfault happens every time certbot runs. I suspect it's part of the certbot renew process.

How might I resolve the issue? I'm not familiar with diagnosing segfaults outside the context of my own code.

rg305 · September 10, 2021, 4:24pm

Avoid using --nginx for authentication.
I'd switch to --webroot authentication.
[and ensure you are not stopping/restarting nginx in the renewal process]

rg305 · September 10, 2021, 4:31pm

@idontusenumbers
hmm...
If you don't use numbers...
What did you replace all the zeros and ones inside your computer with?
LOL

idontusenumbers · September 10, 2021, 5:49pm

The username actually originated with AIM; I was making fun of my friends with numbered suffixes on their screen names while trying to show the superiority of ICQ not requiring unique names. =)

How can I "ensure you are not stopping/restarting nginx in the renewal process"? Do you mean verify in the logs it's no longer killing nginx?

Using --webroot might be rough because there are many domains on the system, each configured differently in nginx/apache.

rg305 · September 10, 2021, 6:01pm

Check the corresponding renewal.conf and the global /etc/letsencrypt/cli.ini file to ensure there are no integrated scripts to do so.
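
One way to run that check is sketched below, assuming the default layout where per-certificate settings live in /etc/letsencrypt/renewal/*.conf. The helper name find_hooks is made up for illustration. It looks for hook settings (spelled with underscores in renewal files and usually with hyphens in cli.ini) and for the authenticator/installer lines that show which plugin each certificate uses.

```shell
# find_hooks FILE...: print any hook, authenticator, or installer
# settings found in the given certbot config files, prefixed with
# file name and line number. Returns non-zero if nothing matches.
find_hooks() {
  grep -HnE '^[[:space:]]*((pre|post|renew|deploy)[-_]hook|authenticator|installer)[[:space:]]*=' "$@" 2>/dev/null
}

# Default locations are an assumption; sudo may be needed to read them.
if [ -d /etc/letsencrypt ]; then
  find_hooks /etc/letsencrypt/cli.ini /etc/letsencrypt/renewal/*.conf \
    || echo "no hook or plugin settings found"
fi
```

Any pre_hook or post_hook line that stops or restarts nginx is worth a closer look.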

You only need to do it once per domain.
And you should have a couple of months to complete that.

OR
You could try a different version of certbot
OR
Another ACME client - like: acme.sh

_az · September 10, 2021, 8:05pm

Did you try disabling the perl module in nginx, as described in the post I linked to earlier?

Certbot really does not do anything special with nginx besides reload it. I encourage you to try the workaround before abandoning --nginx or Certbot.

idontusenumbers · September 10, 2021, 8:32pm

Oh my, I didn't see the link; the word wrapping, colors, etc. tricked my brain. Thanks for pointing it out again. I disabled the perl module, restarted nginx, and did a certbot --force-renew on a domain. I verified the cert was updated through Edge (by looking at the expiration date before and after I forced the renew). The renew seems to have worked and I didn't have to clean up nginx after the renewal. I suspect this worked but it will be hard to tell until after a few auto renews happen. Thanks!
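
The same before/after expiry check can be done from a shell instead of a browser. This is a rough sketch, assuming openssl is installed. The function names are made up for illustration, and the certificate path is the one shown in the renewal log earlier in the thread.

```shell
# cert_enddate FILE: print the notAfter (expiry) date of the first
# certificate in a PEM file, e.g. certbot's fullchain.pem.
cert_enddate() {
  openssl x509 -noout -enddate -in "$1"
}

# served_enddate HOST: print the expiry date of the certificate the
# server actually presents on port 443 (requires network access).
served_enddate() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
}

# Example usage (not run here; reading /etc/letsencrypt/live needs root):
#   cert_enddate /etc/letsencrypt/live/www.campcomputer.com/fullchain.pem
#   certbot renew --cert-name www.campcomputer.com --force-renewal
#   cert_enddate /etc/letsencrypt/live/www.campcomputer.com/fullchain.pem
#   served_enddate www.campcomputer.com
```

If the file shows a later date after the renewal but the served certificate does not, nginx probably did not reload, which points back at the crash.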

system · October 10, 2021, 8:33pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
