Certbot, nginx, and systemd on Ubuntu 20.04.2

When it runs unattended, certbot seems to kill nginx as part of the renewal process and then fail to start it again; the renewal fails because the challenge doesn't work, and nginx stays down.

journalctl for snap.certbot.renew.service:

Sep 09 19:26:01 tanager systemd[1]: Starting Service for snap application certbot.renew...
Sep 09 19:30:21 tanager certbot.renew[121392]: Failed to renew certificate www.campcomputer.com with error: Some challenges have failed.
Sep 09 19:30:21 tanager certbot.renew[121392]: The following renewals failed:
Sep 09 19:30:21 tanager certbot.renew[121392]:   /etc/letsencrypt/live/www.campcomputer.com/fullchain.pem (failure)
Sep 09 19:30:21 tanager certbot.renew[121392]: 1 renew failure(s), 0 parse failure(s)
Sep 09 19:30:21 tanager systemd[1]: snap.certbot.renew.service: Main process exited, code=exited, status=1/FAILURE
Sep 09 19:30:21 tanager systemd[1]: snap.certbot.renew.service: Failed with result 'exit-code'.
Sep 09 19:30:21 tanager systemd[1]: Failed to start Service for snap application certbot.renew.

journalctl for nginx.service:

Sep 09 19:30:19 tanager systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Killing process 126706 (nginx) with signal SIGKILL.
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Killing process 126707 (nginx) with signal SIGKILL.
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Killing process 126706 (nginx) with signal SIGKILL.
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Killing process 126707 (nginx) with signal SIGKILL.
Sep 09 19:30:19 tanager systemd[1]: nginx.service: Failed with result 'core-dump'.
Sep 09 19:43:20 tanager systemd[1]: nginx.service: Unit cannot be reloaded because it is inactive.
Sep 09 19:43:29 tanager systemd[1]: Starting A high performance web server and a reverse proxy server...
Sep 09 19:43:29 tanager systemd[1]: Started A high performance web server and a reverse proxy server.

My alerting tells me the site is down, so I log in and restart nginx, and the site comes back up:

sudo systemctl restart nginx

I was previously on an earlier Ubuntu LTS running the non-snap certbot. After upgrading to 20.04 I followed the certbot install instructions, and I believe I fully uninstalled the obsolete certbot before these errors started.

I can't recall perfectly, but I regularly have to log in, manually kick certbot and nginx around with systemctl restarts, and eventually certificates renew and nginx is back in business, but it's certainly not something I enjoy doing.

Any idea what's going on and how to fix it?


Domain: campcomputer.com (among others)

My web server is (include version): nginx version: nginx/1.18.0 (Ubuntu)

The operating system my web server runs on is (include version): Ubuntu 20.04.2

My hosting provider, if applicable, is: linode VPS

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): certbot 1.19.0

1 Like

Seems certbot just ran again and killed nginx, but at least this time it renewed successfully. I suspect this means it won't kill nginx the next time it runs, at least for a month.

Sep 10 00:01:23 tanager systemd[1]: Starting Service for snap application certbot.renew...
Sep 10 00:05:43 tanager systemd[1]: snap.certbot.renew.service: Succeeded.
Sep 10 00:05:43 tanager systemd[1]: Finished Service for snap application certbot.renew.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Killing process 470659 (nginx) with signal SIGKILL.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Killing process 470660 (nginx) with signal SIGKILL.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Killing process 470659 (nginx) with signal SIGKILL.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Killing process 470660 (nginx) with signal SIGKILL.
Sep 10 00:05:43 tanager systemd[1]: nginx.service: Failed with result 'core-dump'.

Though I had to manually restart nginx again =/

1 Like

nginx hitting a segmentation fault is really, really bad news. It's unlikely to have anything to do with Certbot, except for the fact that Certbot tries to reload/restart nginx.

There are some previous threads which come to mind as being suspiciously similar. Take a look at this one and try the posted advice.

3 Likes

Could be my lack of knowledge of nginx as well as systemd, but I'm not seeing a segfault in the logs. Do you infer it from the SIGKILL signals, which are probably sent to child processes in response to a segfault somewhere else within the main nginx process (or perhaps in a different child process)? Just curious here.

2 Likes

Sure, it's this: nginx.service: Main process exited, code=dumped, status=11/SEGV
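For reference, systemd never prints the word "segfault": "code=dumped, status=11/SEGV" means the main process was terminated by signal 11 (SIGSEGV) and dumped core, and "Failed with result 'core-dump'" is systemd's summary of the same event. A minimal bash sketch for pulling just those lines out of the journal might look like this; the function name is hypothetical, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the journal lines that record an nginx crash.
# "code=dumped, status=11/SEGV" = terminated by signal 11 (SIGSEGV) with a core dump;
# "Failed with result 'core-dump'" is systemd's summary of the same event.
nginx_crash_lines() {
  grep -E "status=11/SEGV|result 'core-dump'"
}

# Usage:
#   journalctl -u nginx.service --no-pager | nginx_crash_lines
```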

2 Likes

Aaah yes, I'm just blind, apologies..

2 Likes

The segfault happens every time certbot runs. I suspect it's part of the certbot renew process.

How might I resolve the issue? I'm not familiar with diagnosing segfault outside the context of my own code.
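One place to start, sketched here in bash and assuming Ubuntu's packaged nginx, is listing which dynamic modules nginx loads: optional modules are a plausible source of crashes, and later in this thread disabling the perl module turned out to be the fix. "nginx -T" dumps the full effective configuration, including every load_module line; the helper name below is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "nginx -T" output on stdin and print each dynamic
# module it loads, one path per line.
loaded_nginx_modules() {
  # Each "load_module <path>;" directive names one shared object; strip the ";".
  grep -E '^[[:space:]]*load_module[[:space:]]' | awk '{ sub(/;$/, "", $2); print $2 }'
}

# Usage from a root shell (so every included file is readable):
#   nginx -T 2>/dev/null | loaded_nginx_modules
#   ls -l /etc/nginx/modules-enabled/
```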

1 Like

Avoid using --nginx for authentication.
I'd switch to --webroot authentication.
[and ensure you are not stopping/restarting nginx in the renewal process]
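A hedged sketch of that switch for one certificate: the bash helper below only prints a certbot command rather than running it, so it can be reviewed first. The helper name is hypothetical, the webroot path is an assumption (it must be the directory nginx actually serves for that domain), and --dry-run keeps the first attempt against the staging server. Each -w applies to the -d options that follow it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a certbot command that re-authorizes one
# certificate with the webroot plugin instead of --nginx.
webroot_switch_cmd() {
  local cert_name=$1 webroot=$2
  shift 2
  local args=(certbot certonly --webroot -w "$webroot" --cert-name "$cert_name")
  local domain
  for domain in "$@"; do
    args+=(-d "$domain")   # every -d after this -w uses the same webroot
  done
  args+=(--dry-run)        # staging test only; drop it to switch for real
  echo "${args[*]}"
}

# Usage (the webroot path is an assumption):
#   webroot_switch_cmd www.campcomputer.com /var/www/campcomputer www.campcomputer.com
```

Once the dry run passes, running the same command without --dry-run should record the webroot authenticator in that certificate's renewal configuration, so later unattended renewals no longer go through the nginx plugin. nginx still needs a reload (not a stop or restart) to serve a renewed certificate.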

1 Like

@idontusenumbers
hmm...
If you don't use numbers...
What did you replace all the zeros and ones inside your computer with?
LOL

1 Like

The username actually originated with AIM; I was making fun of my friends with numbered suffixes on their screen names while trying to show the superiority of ICQ not requiring unique names. =)

How can I "ensure you are not stopping/restarting nginx in the renewal process"?

Do you mean verifying in the logs that it's no longer killing nginx?

Using --webroot might be rough because there are many domains on the system, each configured differently in nginx/apache.
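One common way around per-site differences is a single shared challenge directory: every server block includes the same location, and certbot gets the same -w for every domain. A bash sketch follows; the helper name and the paths /etc/nginx/snippets/acme-challenge.conf and /var/www/letsencrypt are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an nginx snippet that answers every domain's
# HTTP-01 challenge from one shared directory, regardless of how each site's
# own root or proxying is set up.
write_acme_snippet() {
  local snippet=$1 webroot=$2
  mkdir -p "$webroot"
  cat > "$snippet" <<EOF
# Shared Let's Encrypt HTTP-01 challenge location (generated).
location ^~ /.well-known/acme-challenge/ {
    root $webroot;
    default_type "text/plain";
}
EOF
}

# Usage from a root shell (paths are assumptions):
#   write_acme_snippet /etc/nginx/snippets/acme-challenge.conf /var/www/letsencrypt
#   # add to each server block:  include snippets/acme-challenge.conf;
#   nginx -t && systemctl reload nginx
#   certbot certonly --webroot -w /var/www/letsencrypt -d example.com --dry-run
```

Certbot's webroot plugin writes challenge files under <webroot>/.well-known/acme-challenge/, so the root directive maps those requests straight to the shared directory. If a site redirects all HTTP traffic to HTTPS at the server level, the validation request follows the redirect, so the HTTPS server block needs the include as well. Sites handled by apache should be covered only if nginx is the server answering on port 80.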

2 Likes

Check the certificate's renewal configuration file (in /etc/letsencrypt/renewal/) and the global /etc/letsencrypt/cli.ini file to ensure there are no hooks or scripts that stop or restart nginx.

You only need to do it once per certificate.
And you should have a couple of months to complete that.
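A quick way to run that check, sketched in bash (the helper name is hypothetical): it greps the per-certificate files in /etc/letsencrypt/renewal/ and the global cli.ini for hook settings, then lists the renewal-hooks directories, whose executables certbot runs on renewal without any config line. Renewal files spell the keys with underscores (post_hook, renew_hook), while cli.ini uses the command-line spelling with hyphens (post-hook).

```shell
#!/usr/bin/env bash
# Hypothetical helper: show every place certbot could run a command around renewal.
find_certbot_hooks() {
  local dir=${1:-/etc/letsencrypt}
  # Hook settings: "post_hook = ..." in renewal/*.conf, "post-hook = ..." in cli.ini.
  grep -HE '^[[:space:]]*(pre|post|renew|deploy)[-_]hook[[:space:]]*=' \
    "$dir"/renewal/*.conf "$dir/cli.ini" 2>/dev/null || true
  # Executables in these directories run on renewal without any config line.
  local hookdir
  for hookdir in "$dir/renewal-hooks/pre" "$dir/renewal-hooks/deploy" "$dir/renewal-hooks/post"; do
    if [ -d "$hookdir" ]; then
      ls -l "$hookdir"
    fi
  done
}

# Usage from a root shell:
#   find_certbot_hooks
# Anything that stops or restarts nginx is worth removing or turning into a reload.
```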

OR
You could try a different version of certbot
OR
Another ACME client - like: acme.sh

1 Like

Did you try disabling the perl module in nginx, as described in the post I linked to earlier?

Certbot really does not do anything special with nginx besides reload it. I encourage you to try the workaround before abandoning --nginx or Certbot.
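On Ubuntu's packaged nginx, dynamic modules are enabled through files in /etc/nginx/modules-enabled/ that nginx.conf pulls in with a *.conf include. A bash sketch of disabling the perl module that way, assuming that layout; it is not necessarily the exact procedure from the linked post. The helper name is hypothetical, and because the exact file name may differ it matches any entry with "perl" in its name, renaming it so the include glob no longer picks it up, which is easy to undo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disable nginx's dynamic perl module on Ubuntu by renaming
# its entry in modules-enabled so the "*.conf" include in nginx.conf skips it.
# Renaming (not deleting) keeps the change reversible.
disable_nginx_perl_module() {
  local enabled_dir=${1:-/etc/nginx/modules-enabled}
  local entry status=1
  for entry in "$enabled_dir"/*perl*.conf; do
    # Skip the literal pattern when nothing matched (a dangling symlink still counts).
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    mv -- "$entry" "$entry.disabled"
    echo "disabled: $entry"
    status=0
  done
  return "$status"   # 1 means no perl entry was found
}

# Usage from a root shell, then validate and restart as the poster did:
#   disable_nginx_perl_module
#   nginx -t && systemctl restart nginx
# To undo: rename the .disabled file back and restart nginx again.
```

If the configuration still uses perl directives, nginx -t should fail at that point, which is a useful check before restarting.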

1 Like

Oh my, I didn't see the link; the word wrapping, colors, etc. tricked my brain. Thanks for pointing it out again. I disabled the perl module, restarted nginx, and ran certbot --force-renew on one domain. I verified the certificate was updated through Edge (by comparing the expiration date before and after the forced renewal). The renewal seems to have worked, and I didn't have to clean up nginx afterwards. I suspect this fixed it, but it will be hard to tell until a few automatic renewals have happened. Thanks!
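Besides checking in a browser, the expiry date can be read from the command line. A bash sketch (helper names are hypothetical) that reads the notAfter date both from the PEM file certbot manages and from the certificate nginx is actually serving; comparing the two also confirms that nginx picked up the renewed certificate.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the "notAfter=" expiry line of a certificate.
cert_file_expiry() {
  # Reads the first certificate in the file (the leaf, for fullchain.pem).
  openssl x509 -noout -enddate -in "$1"
}

served_cert_expiry() {
  local host=$1
  # -servername sends SNI so nginx selects the matching server block.
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
}

# Usage from a root shell; the two dates should match after a renewal and reload:
#   cert_file_expiry /etc/letsencrypt/live/www.campcomputer.com/fullchain.pem
#   served_cert_expiry www.campcomputer.com
# Related commands: "certbot certificates" lists every certificate's expiry,
# "systemctl list-timers snap.certbot.renew.timer" shows the next unattended run,
# and "certbot renew --cert-name www.campcomputer.com --force-renewal" forces one
# (sparingly: forced renewals count toward Let's Encrypt rate limits).
```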

2 Likes
