I'm seeing some very strange behaviour that I don't understand at all, and I hope you can help me:
I am using certbot in a --standalone setup with a cronjob based renew script which has a --pre-hook and --post-hook to stop and start my services if it's really time to renew the certificates.
All this worked quite well for years now on my Ubuntu 20 LTS machines, but...
...now I have updated from the old certbot-auto to the snap-based certbot (v. 2.6.0) setup. I changed my executable from ./certbot-auto to /snap/bin/certbot and left everything else as it was.
It works quite well except for one really strange and blocking thing:
All processes (tomcat, proprietary executables) which are started by the --post-hook script (it's a bash script) are killed when the session that was executing the renew process (e.g. my SSH session when I tested it to find all this out) is closed.
How to reproduce:
Start my renewal, wait for pre-hook, wait for renewal, wait for post-hook... now everything works, tomcat is running... now close the SSH session (exit) and then... the tomcat process is killed.
I have already tried calling the post-hook script with nohup/disown and & (background), but that does not help either.
Now I'm running out of ideas, do you have one? Thanks in advance.
I tried it, but it had no effect. And it's also strange that the behaviour changed from certbot-auto to the current version. Maybe I should give screen a try for testing purposes.
Inside my post-hook script tomcat and other services and executables are started regularly - not with any service daemon, e.g.:
tomcat/bin/startup.sh
And... maybe this helps to understand the circumstances... when I call this post-hook script manually from my terminal, everything works normally. I can close the SSH session without any problems, everything is still running.
Also, why can't you use the web server with --webroot ?
Too complicated to explain. It's a very complicated environment consisting of several services with NGINX, Tomcat and so on. The standalone approach is the most stable one we have found after years of testing: All services are stopped, the cert is renewed, everything is started up again. I had no problems with this since 2017 with the old certbot-auto.
I don't have any ideas for troubleshooting, as this sort of stuff is really hard to do without system access.
I do have a workaround though: install mosh, and use that instead of ssh for now. mosh will persist the connection across disconnects - it basically daemonizes your terminal session on the server and just runs an emulator locally.
This discussion is going in the wrong direction. I know that my scripts are not perfect and I also know that they should be started as services. Let's call them sins of my youth... I know better now, but I also don't want to change the script completely (due to other projects that are waiting). Nevertheless, it's strange that all started processes are killed. It's also not a problem of SSH, that was only an example; it also happens (and first occurred) when the renewal is started as a cron job. As soon as the cron job finishes, the services that have been started are killed. Is there a way to "nohup" the whole post-hook or another workaround?
What are the differences between the "old" post-hook of certbot-auto and the one that is called by the current certbot?
So, let me understand, you want certbot to stay running to use that post-hook?
The easy way is to run everything inside screen and then detach (Ctrl-a d).
That's not going to work very well, tho. Let the system keep your services running (is Ubuntu 20 too early for systemd? Probably not.), and let certbot restart them.
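If the services are (or become) systemd units, the post-hook could shrink to something like the sketch below. The unit names nginx and tomcat9 and the DRY_RUN switch are assumptions for illustration, not something taken from the setup described above.

```shell
#!/usr/bin/env bash
# Hypothetical post-hook: restart services through systemd instead of
# launching them directly from the hook, so systemd owns the processes.
# Unit names are passed as arguments; which units exist is an assumption.

restart_services() {
    local svc
    for svc in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: only print what would be executed.
            echo "systemctl restart $svc"
        else
            systemctl restart "$svc" || echo "failed to restart $svc" >&2
        fi
    done
}

# Restart every unit named on the command line (does nothing without arguments).
restart_services "$@"
```

It would then be wired in with something like --post-hook "/path/to/restart-services.sh nginx tomcat9", where the script path is a placeholder.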
I think you may want to use the nfqueue authenticator. It can run in front of a working web server without turning it off, by intercepting the challenge messages in the firewall. You won't need a post-hook to restart the web server with this.
Thank you for this input. It gave me the idea that (as @rg305 also suggested) --webroot may indeed be an option for me. I could redirect the /.well-known/acme-challenge/* folder via NGINX to a real path, and thus the post-hook would only need to perform an nginx -s reload to inform NGINX about the updated certificate.
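A rough sketch of that --webroot idea, in case it helps: the location block maps the challenge path to a real directory, and certbot is pointed at the same directory. The path /var/www/letsencrypt, the domain example.com and both function names are placeholders made up for this example, not values from the setup above.

```shell
#!/usr/bin/env bash
# Sketch only: prints the two pieces needed for --webroot, it changes nothing.

# Print an NGINX location block that serves ACME challenges from a real directory.
print_acme_location() {
    local webroot="$1"
    cat <<EOF
location ^~ /.well-known/acme-challenge/ {
    root $webroot;
    default_type "text/plain";
}
EOF
}

# Print the matching certbot call; certbot writes the challenge files below
# <webroot>/.well-known/acme-challenge/, which is what the block above serves.
print_certbot_command() {
    local webroot="$1" domain="$2"
    echo "/snap/bin/certbot certonly --webroot -w $webroot -d $domain"
}

print_acme_location "${WEBROOT:-/var/www/letsencrypt}"
print_certbot_command "${WEBROOT:-/var/www/letsencrypt}" "${DOMAIN:-example.com}"
```

The location block would go into the server block that answers port 80 for the domain; after that, renewals no longer need any service to be stopped.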
Use deploy-hook instead of post-hook. The former only runs when the certificate(s) actually have been renewed, while the latter runs after each certbot invocation, which is pretty suboptimal.
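For illustration, a minimal deploy-hook sketch. The function name and the RELOAD_CMD override are assumptions added for this example; RENEWED_LINEAGE and RENEWED_DOMAINS are the variables certbot sets when it runs a deploy hook.

```shell
#!/usr/bin/env bash
# Hypothetical deploy hook: reload NGINX after a certificate was actually renewed.

reload_after_renewal() {
    # RELOAD_CMD is an assumption of this sketch: it lets the hook be
    # dry-run, e.g. with RELOAD_CMD="echo reloaded".
    local reload_cmd="${RELOAD_CMD:-nginx -s reload}"
    echo "renewed: ${RENEWED_DOMAINS:-unknown} (${RENEWED_LINEAGE:-unknown})"
    # Intentionally unquoted so the command and its arguments are split.
    $reload_cmd
}

# certbot exports RENEWED_LINEAGE only for deploy hooks, i.e. after a
# successful renewal; without it the script does nothing.
if [ -n "${RENEWED_LINEAGE:-}" ]; then
    reload_after_renewal
fi
```

It could be passed as certbot renew --deploy-hook /path/to/reload-nginx.sh (the path is a placeholder), or dropped into /etc/letsencrypt/renewal-hooks/deploy/ as an executable file.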
You don't need to have certbot do the nginx reload in a hook. You could just set up a normal cron job to run once a day to do that. Sure, most of the time the daily reload won't do much but it is not disruptive either.
@Nekit Right, if I don't need the pre-hook anymore I can also switch to the deploy-hook as a nice optimisation. Up to now the combination of pre-hook and post-hook was necessary, since I stopped some services in the pre-hook which then needed to be restarted by the post-hook (also on unsuccessful renewals).
@MikeMcQ I think it's cleaner to do the NGINX reload in the deploy-hook since, as you say, it is otherwise performed every day but is necessary only every second month.