I had an issue in production with my Certbot renewal process.
When I re-ran the same renewal with --dry-run, it succeeded, so it may have been a fluke. But the failure took down the server, so I want to describe it here in case anyone sees any red flags.
I’m using certbot 0.19.0 on Ubuntu 16.04.3, with Nginx as a web server.
Issue: When I ran the Certbot renewal process with hooks to stop and start Nginx, and there were multiple cert renewals pending, the Certbot process hung after the last renewal.
(After I manually recovered and restarted the server, I ran the process again, and it renewed the last single remaining cert and finished OK.)
Here is the command I use:
/root/certbot/certbot-auto renew --pre-hook "systemctl stop nginx" --post-hook "systemctl start nginx"
Certbot’s call to the pre-hook worked, Nginx stopped, and the first few renewals worked as well.
Then, after the last renewal, the certbot process hung… Nginx was not running. Ctrl-C would not exit the certbot process. I waited several minutes before using ctrl-\ to exit out.
Here is the Certbot command output, then the errors I saw in the Nginx log, then the LetsEncrypt log:
Processing /etc/letsencrypt/renewal/v2.app.mydomain.com.conf
Cert is due for renewal, auto-renewing…
Plugins selected: Authenticator standalone, Installer None
Running pre-hook command: systemctl stop nginx
Renewing an existing certificate
Performing the following challenges:
tls-sni-01 challenge for v2.app.mydomain.com
Waiting for verification…
Cleaning up challenges
new certificate deployed without reload, fullchain is
/etc/letsencrypt/live/v2.app.mydomain.com/fullchain.pem
Processing /etc/letsencrypt/renewal/v2.tv.mydomain.com.conf
Cert not yet due for renewal
Processing /etc/letsencrypt/renewal/league.mydomain.com.conf
Cert is due for renewal, auto-renewing…
Plugins selected: Authenticator standalone, Installer None
Pre-hook command already run, skipping: systemctl stop nginx
Renewing an existing certificate
Performing the following challenges:
tls-sni-01 challenge for league.mydomain.com
Waiting for verification…
Cleaning up challenges
new certificate deployed without reload, fullchain is
/etc/letsencrypt/live/league.mydomain.com/fullchain.pem
Processing /etc/letsencrypt/renewal/api.mydomain.com.conf
Cert is due for renewal, auto-renewing…
Plugins selected: Authenticator standalone, Installer None
Pre-hook command already run, skipping: systemctl stop nginx
Renewing an existing certificate
Performing the following challenges:
tls-sni-01 challenge for api.mydomain.com
Waiting for verification…
Cleaning up challenges
[Here is where it was stuck just hanging]
The last api.mydomain.com cert was NOT renewed at this point. Nginx was shut down.
Here is what I saw in the Nginx error log:
2017/12/01 11:56:22 [emerg] 27565#27565: bind() to 0.0.0.0:443 failed (98: Address already in use)
2017/12/01 11:56:22 [emerg] 27565#27565: bind() to 0.0.0.0:443 failed (98: Address already in use)
2017/12/01 11:56:22 [emerg] 27565#27565: bind() to 0.0.0.0:443 failed (98: Address already in use)
2017/12/01 11:56:22 [emerg] 27565#27565: bind() to 0.0.0.0:443 failed (98: Address already in use)
2017/12/01 11:56:22 [emerg] 27565#27565: bind() to 0.0.0.0:443 failed (98: Address already in use)
2017/12/01 11:56:22 [emerg] 27565#27565: still could not bind()
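Error 98 (Address already in use) means some other process still held port 443 when Nginx tried to bind to it. A minimal sketch for finding that process while the hang is happening, assuming the ss tool from iproute2 is installed. The helper name listeners_on_port is hypothetical and only used here for illustration:

```shell
#!/bin/sh
# Hypothetical helper: filter `ss -ltnp` style output (read from stdin)
# down to the LISTEN sockets whose local address ends in the given port.
listeners_on_port() {
    # $1 is the port number; column 4 of `ss -ltn` output is Local Address:Port.
    awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$")'
}

# Usage (run as root so that -p can show the owning process):
#   ss -ltnp | listeners_on_port 443
```

If this prints a line while Nginx is down, the process named in the last column is the one holding the port.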
These are the last few lines of the letsencrypt log:
2017-12-01 17:17:54,812:DEBUG:certbot.storage:Writing chain to /etc/letsencrypt/archive/api.dartconnect.com/chain2.pem.
2017-12-01 17:17:54,812:DEBUG:certbot.storage:Writing full chain to /etc/letsencrypt/archive/api.dartconnect.com/fullchain2.pem.
2017-12-01 17:17:56,806:DEBUG:certbot.storage:Writing new config /etc/letsencrypt/renewal/api.dartconnect.com.conf.new.
2017-12-01 17:17:56,809:INFO:certbot.hooks:Running deploy-hook command: systemctl start nginx
2017-12-01 17:17:56,867:DEBUG:certbot.renewal:no renewal failures
I am not sure whether there is anywhere else I should look, or what precautions I should take. I am now nervous about scheduling this renewal process, since this issue would cause a full system failure on the next renewal if it were to happen again.
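One possible precaution for a scheduled run, as a rough sketch rather than a tested fix: wrap the renewal in a time limit and always try to start Nginx afterwards, so a hang cannot leave the server down indefinitely. The function name run_guarded and the 600-second limit are assumptions for illustration. timeout is the GNU coreutils command, and the certbot-auto command in the usage comment is the same one shown earlier.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under a time limit, then always run a
# recovery command, whether the first command succeeded, failed, or hung.
#
# run_guarded LIMIT_SECONDS RECOVERY_CMD COMMAND [ARGS...]
run_guarded() {
    limit="$1"
    recovery="$2"
    shift 2
    status=0
    # timeout sends TERM after $limit seconds, and KILL 10 seconds later
    # if the process ignores TERM (as a process that ignores Ctrl-C might).
    timeout --kill-after=10 "$limit" "$@" || status=$?
    # 124 = timed out; 137 = the command had to be killed with KILL.
    if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
        echo "command exceeded ${limit}s and was stopped" >&2
    fi
    # Run the recovery step regardless of how the command ended.
    sh -c "$recovery" || echo "recovery command failed" >&2
    return "$status"
}

# Usage (assumed values; starting an already running Nginx is harmless):
#   run_guarded 600 "systemctl start nginx" \
#       /root/certbot/certbot-auto renew \
#       --pre-hook "systemctl stop nginx" --post-hook "systemctl start nginx"
```

The wrapper returns the exit status of the renewal, so a cron job or monitoring script can still tell a timeout (124) apart from a normal failure.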
Thanks so much for any clues, hints, or suggestions!
-Sean