"certbot renew" improperly restarts nginx

Hi All,

I have 2 servers running nginx 1.10.3 with certbot ‘0.19.0’ and ‘0.21.1’. Cert renewal always fails on both of them with the error: ‘Job for nginx.service failed because the control process exited with error code. See “systemctl status nginx.service” and “journalctl -xe” for details.’

On both servers, according to their logs, certbot does its job and updates the certs, but it always fails to restart nginx. I tried using certbot with and without pre- and post-hooks; no difference.

What I see in the logs is:

  1. certbot stops nginx using ‘service nginx stop’;
  2. updates certs;
  3. some mystery here;
  4. tries to start nginx using ‘service nginx start’, and fails here.

Nginx logs show that ports 80 and 443 were already bound. It looks like at step 3 something starts nginx using ‘nginx -c /etc/nginx/nginx.conf’, whereas the command used by the service is ‘/usr/sbin/nginx -g daemon on; master_process on;’. Nothing in the logs shows what started it or when. I only see that the PIDs of the weird nginx fall between those of the stopped nginx and the one that certbot failed to start in the post-hook.
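For anyone trying to confirm the same symptom, here is a minimal sketch of the check described above: it filters `ps` output for nginx master processes whose command line does not match the one the service uses. The function name `stray_nginx_masters` is hypothetical, and the "expected" command line (`/usr/sbin/nginx -g ...`) is an assumption taken from the `ps` output in this post; other distributions may start nginx differently.

```shell
# Hypothetical helper: reads `ps -eo pid,args`-style lines on stdin and prints
# the PIDs of nginx master processes that were NOT started with the command
# line the service is assumed to use (/usr/sbin/nginx -g ...).
stray_nginx_masters() {
    awk '$2 == "nginx:" && $3 == "master" && $4 == "process" &&
         index($0, "/usr/sbin/nginx -g") == 0 { print $1 }'
}

# Usage on a live system (prints nothing if no stray master is found):
#   ps -eo pid,args | stray_nginx_masters
```

Matching on the individual fields rather than on a plain substring keeps the filter from reporting its own `awk` process.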

Manually killing the weird instance of nginx with ‘killall nginx’ and then starting it normally with ‘service nginx start’ fixes things.

Any ideas of what can be wrong there?

Thanks

P.S. My logs

===== Renewal output

ubuntu:~$ sudo certbot renew --force-renewal

Saving debug log to /var/log/letsencrypt/letsencrypt.log

-------------------------------------------------------------------------------
Processing /etc/letsencrypt/renewal/www.foo-bar.com.conf
-------------------------------------------------------------------------------
Plugins selected: Authenticator nginx, Installer nginx
Starting new HTTPS connection (1): acme-v01.api.letsencrypt.org
Running pre-hook command: service nginx stop
Renewing an existing certificate
Performing the following challenges:
tls-sni-01 challenge for www.foo-bar.com
nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
Waiting for verification...
Cleaning up challenges

-------------------------------------------------------------------------------
new certificate deployed with reload of nginx server; fullchain is
/etc/letsencrypt/live/www.foo-bar.com/fullchain.pem
-------------------------------------------------------------------------------

The following certs were successfully renewed:
  /etc/letsencrypt/live/www.foo-bar.com/fullchain.pem (success)
-------------------------------------------------------------------------------
Running post-hook command: service nginx start
Hook command "service nginx start" returned error code 1
Error output from service:
Job for nginx.service failed because the control process exited with error code. See "systemctl status nginx.service" and "journalctl -xe" for details.

1 renew failure(s), 0 parse failure(s)


===== Journalctl output

ubuntu:~$ journalctl -xe

Mar 11 18:31:08 ip sudo[12112]:   ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/certbot renew --force-renewal
Mar 11 18:31:08 ip sudo[12112]: pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)
Mar 11 18:31:10 ip systemd[1]: Stopping A high performance web server and a reverse proxy server...
-- Subject: Unit nginx.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit nginx.service has begun shutting down.
Mar 11 18:31:10 ip systemd[1]: Stopped A high performance web server and a reverse proxy server.

-- Subject: Unit nginx.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit nginx.service has finished shutting down.
Mar 11 18:31:18 ip systemd[1]: Starting A high performance web server and a reverse proxy server...
-- Subject: Unit nginx.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit nginx.service has begun starting up.
Mar 11 18:31:18 ip nginx[12202]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Mar 11 18:31:18 ip nginx[12202]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
...
Mar 11 18:31:20 ip nginx[12202]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Mar 11 18:31:20 ip nginx[12202]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Mar 11 18:31:21 ip nginx[12202]: nginx: [emerg] still could not bind()
Mar 11 18:31:21 ip systemd[1]: nginx.service: Control process exited, code=exited status=1
Mar 11 18:31:21 ip systemd[1]: Failed to start A high performance web server and a reverse proxy server.
-- Subject: Unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit nginx.service has failed.
-- 
-- The result is failed.
Mar 11 18:31:21 ip systemd[1]: nginx.service: Unit entered failed state.
Mar 11 18:31:21 ip systemd[1]: nginx.service: Failed with result 'exit-code'.


====== Weird leftover nginx

ubuntu:~$ ps aux | grep nginx
root      2461  0.0  0.0 126124  1468 ?        Ss   19:14   0:00 nginx: master process nginx -c /etc/nginx/nginx.conf
www-data  2462  0.0  0.3 126628  6520 ?        S    19:14   0:00 nginx: worker process


====== Properly started nginx

ubuntu:~$ killall nginx
ubuntu:~$ sudo service nginx start
ubuntu:~$ ps aux | grep nginx
root      4996  0.0  0.0 126128  1468 ?        Ss   19:33   0:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data  4997  0.0  0.1 126484  3264 ?        S    19:33   0:00 nginx: worker process

The only way Certbot would restart nginx is if you told it to on the command line the first time you issued that certificate, because it definitely doesn't do it by default.

Running pre-hook command: service nginx stop

If you look in /etc/letsencrypt/renewal/www.foo-bar.com.conf, you will probably see the pre-hook and post-hook properties present.

You may have picked up that method of running Certbot from this post:

but it's no longer necessary.

right, and the pre- and post- hooks are there

post_hook = service nginx start
pre_hook = service nginx stop

I don’t mind having them there. I even like them.

The problem I mentioned is that something else starts nginx (not with ‘service nginx start’ but with ‘nginx -c /etc/nginx/nginx.conf’) before the post-hook runs, which causes certbot to fail on the post-hook and report an error.

My apologies, I skimmed the post too quickly.

It does look like this Certbot bug, which relates to how the nginx certificate installer works while nginx is stopped, on some Linux distributions:

If Certbot's Nginx plugin has to start Nginx, it does so by using the nginx command directly rather than going through systemctl or service. In the couple systems I looked at where service doesn't invoke systemctl, everything should still work fine

A potential workaround appears to be to comment out the installer in the .conf file.
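As a rough sketch of that workaround, the snippet below comments out the installer line in a renewal config and keeps a backup copy. The helper name `disable_installer` is hypothetical, and it assumes the file contains a line of the form `installer = nginx`, which has not been confirmed for this particular setup. Check the file by hand before and after running it.

```shell
# Hypothetical helper: comment out the "installer" line in a certbot renewal
# config, keeping the original as <file>.bak.
# Assumption: the file has a line that starts with "installer =".
disable_installer() {
    conf="$1"
    cp -p "$conf" "$conf.bak"
    # "&" re-inserts the matched text, so only a leading "# " is added.
    sed 's/^installer[[:space:]]*=/# &/' "$conf.bak" > "$conf"
}

# Usage (path taken from the renewal output earlier in this thread):
#   sudo sh -c '. ./disable_installer.sh; disable_installer /etc/letsencrypt/renewal/www.foo-bar.com.conf'
```

The file name `disable_installer.sh` in the usage line is likewise just an example. Running `sudo certbot renew --dry-run` afterwards is a reasonable way to see whether the renewal still goes through.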

Again, you shouldn't need to stop nginx unless you are using the standalone authenticator on port 80, so avoiding stopping nginx at all may still be the optimal solution.

I see, lemme play with that.
Thanks a lot!
