Hi All,
I have 2 servers running nginx 1.10.3, one with certbot 0.19.0 and the other with 0.21.1. Cert renewal always fails on both of them with the error: ‘Job for nginx.service failed because the control process exited with error code. See “systemctl status nginx.service” and “journalctl -xe” for details.’
On both servers, according to the logs, certbot does its job and updates the certs, but it always fails to restart nginx. I tried running certbot with and without pre- and post-hooks; it makes no difference.
What I see in the logs is:
- certbot stops nginx using ‘service nginx stop’;
- updates certs;
- some mystery here;
- tries to start nginx using ‘service nginx start’, and fails here.
The nginx logs show that ports 80 and 443 were already bound. It looks like at step 3 something starts nginx with ‘nginx -c /etc/nginx/nginx.conf’, whereas the command the service normally uses is ‘/usr/sbin/nginx -g daemon on; master_process on;’. Nothing in the logs shows what started it or when. All I can see is that the PIDs of the weird nginx fall between those of the stopped nginx and those of the one that certbot failed to start in the post-hook.
Manually killing the weird instance of nginx with ‘killall nginx’ and then starting nginx normally with ‘service nginx start’ fixes things.
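In case it helps anyone reproduce this, here is a rough sketch of how the two kinds of master process can be told apart from `ps aux` output. The function name `classify_nginx_masters` is made up for illustration, and the patterns only assume the two command lines shown in my logs below; other setups may start nginx differently.

```shell
# Hypothetical helper (name invented for illustration): reads `ps aux`-style
# lines on stdin and labels each nginx master process by how it was launched.
classify_nginx_masters() {
  awk '
    /nginx: master process/ {
      # $2 is the PID column in `ps aux` output.
      if ($0 ~ /master process nginx -c /) {
        print $2, "stray"      # launched as: nginx -c /etc/nginx/nginx.conf
      } else if ($0 ~ /master process \/usr\/sbin\/nginx -g /) {
        print $2, "service"    # launched with the command the service uses
      } else {
        print $2, "unknown"    # some other command line
      }
    }
  '
}

# Usage on a live system:
#   ps aux | classify_nginx_masters
```

If the output shows a "stray" master right after a failed renewal, that is the instance holding ports 80 and 443.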
Any ideas what could be wrong here?
Thanks
P.S. My logs
===== Renewal output
ubuntu:~$ sudo certbot renew --force-renewal
Saving debug log to /var/log/letsencrypt/letsencrypt.log
-------------------------------------------------------------------------------
Processing /etc/letsencrypt/renewal/www.foo-bar.com.conf
-------------------------------------------------------------------------------
Plugins selected: Authenticator nginx, Installer nginx
Starting new HTTPS connection (1): acme-v01.api.letsencrypt.org
Running pre-hook command: service nginx stop
Renewing an existing certificate
Performing the following challenges:
tls-sni-01 challenge for www.foo-bar.com
nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
Waiting for verification...
Cleaning up challenges
-------------------------------------------------------------------------------
new certificate deployed with reload of nginx server; fullchain is
/etc/letsencrypt/live/www.foo-bar.com/fullchain.pem
-------------------------------------------------------------------------------
The following certs were successfully renewed:
/etc/letsencrypt/live/www.foo-bar.com/fullchain.pem (success)
-------------------------------------------------------------------------------
Running post-hook command: service nginx start
Hook command "service nginx start" returned error code 1
Error output from service:
Job for nginx.service failed because the control process exited with error code. See "systemctl status nginx.service" and "journalctl -xe" for details.
1 renew failure(s), 0 parse failure(s)
===== Journalctl output
ubuntu:~$ journalctl -xe
Mar 11 18:31:08 ip sudo[12112]: ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/certbot renew --force-renewal
Mar 11 18:31:08 ip sudo[12112]: pam_unix(sudo:session): session opened for user root by ubuntu(uid=0)
Mar 11 18:31:10 ip systemd[1]: Stopping A high performance web server and a reverse proxy server...
-- Subject: Unit nginx.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nginx.service has begun shutting down.
Mar 11 18:31:10 ip systemd[1]: Stopped A high performance web server and a reverse proxy server.
-- Subject: Unit nginx.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nginx.service has finished shutting down.
Mar 11 18:31:18 ip systemd[1]: Starting A high performance web server and a reverse proxy server...
-- Subject: Unit nginx.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nginx.service has begun starting up.
Mar 11 18:31:18 ip nginx[12202]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Mar 11 18:31:18 ip nginx[12202]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
...
Mar 11 18:31:20 ip nginx[12202]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Mar 11 18:31:20 ip nginx[12202]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Mar 11 18:31:21 ip nginx[12202]: nginx: [emerg] still could not bind()
Mar 11 18:31:21 ip systemd[1]: nginx.service: Control process exited, code=exited status=1
Mar 11 18:31:21 ip systemd[1]: Failed to start A high performance web server and a reverse proxy server.
-- Subject: Unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nginx.service has failed.
--
-- The result is failed.
Mar 11 18:31:21 ip systemd[1]: nginx.service: Unit entered failed state.
Mar 11 18:31:21 ip systemd[1]: nginx.service: Failed with result 'exit-code'.
====== Weird leftover nginx
ubuntu:~$ ps aux | grep nginx
root 2461 0.0 0.0 126124 1468 ? Ss 19:14 0:00 nginx: master process nginx -c /etc/nginx/nginx.conf
www-data 2462 0.0 0.3 126628 6520 ? S 19:14 0:00 nginx: worker process
====== Properly started nginx
ubuntu:~$ killall nginx
ubuntu:~$ sudo service nginx start
ubuntu:~$ ps aux | grep nginx
root 4996 0.0 0.0 126128 1468 ? Ss 19:33 0:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data 4997 0.0 0.1 126484 3264 ? S 19:33 0:00 nginx: worker process