Processes started by --post-hook commands are killed after closing the SSH session?!

I'm seeing very strange behaviour that I don't understand at all, and I hope you can help me:

I am using certbot in a --standalone setup with a cron-job-based renewal script that uses a --pre-hook and a --post-hook to stop and start my services when it's actually time to renew the certificates.
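For readers unfamiliar with this kind of setup, a cron-driven renewal wrapper might look roughly like the minimal sketch below. The hook paths (`/opt/renew/stop-services.sh`, `/opt/renew/start-services.sh`) and the function name `renew_certs` are hypothetical placeholders, not the poster's actual script. `certbot renew` reads the authenticator (here `standalone`) from the renewal configuration saved when the certificate was first issued, and it only runs the pre- and post-hook when a renewal is actually attempted.

```shell
#!/usr/bin/env bash
# Sketch of a cron-driven renewal wrapper (assumed layout, not the poster's).
# CERTBOT can be overridden from the environment, e.g. for a dry test.
CERTBOT="${CERTBOT:-/snap/bin/certbot}"

# Hypothetical hook scripts: stop the services before the standalone
# authenticator needs port 80, and start them again afterwards.
PRE_HOOK="/opt/renew/stop-services.sh"
POST_HOOK="/opt/renew/start-services.sh"

renew_certs() {
  # certbot only calls the hooks if a certificate is actually due.
  "$CERTBOT" renew --quiet \
    --pre-hook "$PRE_HOOK" \
    --post-hook "$POST_HOOK"
}
```

Ending the script with a call to `renew_certs` and running it from a daily cron entry reproduces the setup described above.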

All of this has worked well for years on my Ubuntu 20.04 LTS machines, but...

...now I have switched from the old certbot-auto to the snap-based certbot (v2.6.0). I changed my executable from ./certbot-auto to /snap/bin/certbot and left everything else as it was.

It works well except for one really strange and blocking problem:

All processes (Tomcat, proprietary executables) started by the --post-hook script (a bash script) are killed as soon as the session that ran the renewal is closed (for example, the SSH session I used while testing to track this down). :man_shrugging:

How to reproduce:

Start the renewal, wait for the pre-hook, wait for the renewal, wait for the post-hook... everything works, Tomcat is running... then close the SSH session (exit) and... the Tomcat process is killed. :scream:

I have already tried calling the post-hook script with nohup/disown and & (background), but that doesn't help either.

I'm running out of ideas. Do you have any? Thanks in advance.

Hello @afiller, welcome to the Let's Encrypt community. :slightly_smiling_face:

Try using the nohup command so that the terminal session can end without killing off all of its child processes.
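Inside the post-hook, that suggestion amounts to something like the minimal sketch below (the helper name `start_detached` is hypothetical). `nohup` makes the child ignore SIGHUP, and redirecting all three standard streams keeps it from holding on to the hook's terminal or pipes. The original poster reports that this had no effect in the snap setup, which suggests the processes were not being killed by a hangup signal alone.

```shell
#!/usr/bin/env bash
# Start a command detached from the caller's terminal (hypothetical helper).
# nohup makes the child ignore SIGHUP; the redirects stop it from keeping
# the hook's stdin/stdout/stderr open. Prints the child's PID.
start_detached() {
  nohup "$@" </dev/null >/dev/null 2>&1 &
  echo "$!"
}
```

In the post-hook this would be used as `start_detached tomcat/bin/startup.sh`.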

I tried it, but it had no effect. It's also strange that the behaviour changed between certbot-auto and the current version. Maybe I should give screen a try for testing purposes.

How is the tomcat process normally started?

Also, why can't you use the web server with --webroot?

Inside my post-hook script, Tomcat and the other services and executables are started directly, not through any service manager, e.g.:

	tomcat/bin/startup.sh

And... maybe this helps explain the circumstances... when I call the post-hook script manually from my terminal, everything works normally: I can close the SSH session without any problems and everything keeps running.
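Since the hook behaves differently when run by hand than when run by certbot, one way to narrow this down is to compare which control group the started process ends up in for each case. A minimal sketch, assuming a Linux system with `/proc` and `pgrep`; `show_cgroup` is a hypothetical helper name. If the hook-started Tomcat turned out to sit in a cgroup belonging to the certbot snap rather than to the SSH session or cron, that would be a plausible explanation for the different behaviour, although this thread never confirmed the cause.

```shell
#!/usr/bin/env bash
# Print the cgroup membership of a process (Linux only), for comparing
# a hook-started service with one started by hand.
show_cgroup() {
  local pid="$1"
  if [ ! -r "/proc/$pid/cgroup" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  cat "/proc/$pid/cgroup"
}
```

For Tomcat, the PID can be looked up by its main class, e.g. `show_cgroup "$(pgrep -f org.apache.catalina.startup.Bootstrap | head -n 1)"`, once after a manual start and once after a certbot run.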

Also, why can't you use the web server with --webroot?

It's too complicated to explain in detail: it's a very complex environment consisting of several services with NGINX, Tomcat and so on. After testing over the years, standalone mode turned out to be the most stable approach for us: all services are stopped, the certificate is renewed, and everything starts up again. I had no problems with this from 2017 onward with the old certbot-auto.

I don't have any ideas for troubleshooting, as this sort of stuff is really hard to do without system access.

I do have a workaround, though: install mosh and use it instead of ssh for now. mosh keeps the session alive across disconnects; it basically daemonizes your terminal session on the server and just runs a terminal emulator locally.

Those should all be daemonized in that script.

This discussion is going in the wrong direction. :rofl: I know that my scripts are not perfect and that these programs should be started as proper services. Let's call them sins of my youth... I know better now, but I don't want to rewrite the script completely right now (other projects are waiting). Nevertheless, it's strange that all started processes are killed. It's also not an SSH problem; that was only an example. It also happens (and first occurred) when the renewal is started as a cron job: as soon as the cron job finishes, the services that were started are killed. Is there a way to "nohup" the whole post-hook, or another workaround?

What are the differences between how the "old" certbot-auto and the current certbot call the post-hook?

So, let me understand, you want certbot to stay running to use that post-hook?

The easy way is to run everything inside screen and then detach (Ctrl-a d).
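With `screen`, the renewal can also be started in a detached session from the outset, which normally survives the SSH logout. A minimal sketch: `screen -dmS` creates a named session without attaching to it, the session name `certbot-renew` is an arbitrary choice, and `renew_in_screen` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Run the renewal inside a detached, named screen session (sketch).
CERTBOT="${CERTBOT:-/snap/bin/certbot}"

renew_in_screen() {
  # -d -m: create the session but do not attach; -S: session name.
  # Extra arguments are passed through to "certbot renew".
  screen -dmS certbot-renew "$CERTBOT" renew "$@"
}
```

`screen -r certbot-renew` reattaches to the session while certbot is still running.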

That's not going to work very well, though. Let the system keep your services running (is Ubuntu 20 too early for systemd? No: Ubuntu has shipped systemd since 15.04), and let certbot restart them.
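Following that advice, the post-hook shrinks to asking systemd to (re)start units that manage the services. A minimal sketch, assuming units named `nginx.service` and `tomcat.service` exist; these names are placeholders, since in the original setup Tomcat is started directly via `startup.sh`, and `restart_services` is a hypothetical helper name. Because systemd starts the processes itself, they do not belong to the hook's session.

```shell
#!/usr/bin/env bash
# Post-hook sketch: ask systemd to (re)start the services instead of
# starting them from this script. The unit names are placeholders.
SERVICES="nginx.service tomcat.service"

restart_services() {
  local unit rc=0
  for unit in $SERVICES; do
    # systemd starts the service process itself, so it is not a child
    # of this hook and does not live in the hook's session.
    if ! systemctl restart "$unit"; then
      echo "failed to restart $unit" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A matching pre-hook would use `systemctl stop` on the same units.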

I think you may want to use the nfqueue authenticator: it intercepts the challenge request in the firewall, so it can work in front of a running web server without shutting it down. With it, you won't need a post-hook to restart the web server.

Thank you for this input. It made me realize that (as @rg305 also suggested) --webroot may indeed be an option for me. I could map the /.well-known/acme-challenge/* path via NGINX to a real directory, so the post-hook would only need to run nginx -s reload to make NGINX pick up the updated certificate.

I'll try and report back.

Use --deploy-hook instead of --post-hook. The former only runs when a certificate has actually been renewed, while the latter runs after every renewal attempt, even an unsuccessful one, which is pretty suboptimal here.
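A deploy hook also receives information about the certificate it was called for: certbot sets `RENEWED_LINEAGE` (the live directory, e.g. `/etc/letsencrypt/live/example.com`) and `RENEWED_DOMAINS` (a space-separated list of the renewed domains). A minimal sketch of a hook that logs the renewal and reloads NGINX; the function name `deploy_hook` and the log tag `certbot-deploy` are arbitrary choices.

```shell
#!/usr/bin/env bash
# Deploy-hook sketch: certbot runs it once per successfully renewed
# certificate and exports RENEWED_LINEAGE and RENEWED_DOMAINS to it.
deploy_hook() {
  if [ -z "${RENEWED_LINEAGE:-}" ]; then
    echo "RENEWED_LINEAGE is not set; not running as a deploy hook?" >&2
    return 1
  fi
  # Record which certificate was renewed (logger writes to syslog).
  logger -t certbot-deploy "renewed ${RENEWED_DOMAINS:-unknown} in ${RENEWED_LINEAGE}"
  # Ask the running NGINX master process to reload its configuration
  # and certificate files.
  nginx -s reload
}
```

Saved as an executable script that ends with a call to `deploy_hook`, it can be passed via `--deploy-hook` or placed in `/etc/letsencrypt/renewal-hooks/deploy/`, where `certbot renew` picks it up automatically.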

You don't need to have certbot do the nginx reload in a hook. You could just set up a normal cron job that does it once a day. Sure, most of the time the daily reload won't do much, but it isn't disruptive either.
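That alternative is a single crontab entry. The minimal sketch below appends it to the current user's crontab only if it is not already there; the time (04:00), the `/usr/sbin/nginx` path and the helper name `add_daily_reload` are assumptions. It has to be installed for a user allowed to signal the NGINX master process, typically root.

```shell
#!/usr/bin/env bash
# Install a daily "nginx -s reload" cron entry, idempotently (sketch).
RELOAD_ENTRY='0 4 * * * /usr/sbin/nginx -s reload'

add_daily_reload() {
  local current
  # "crontab -l" fails when no crontab exists yet; treat that as empty.
  current="$(crontab -l 2>/dev/null || true)"
  if printf '%s\n' "$current" | grep -qxF "$RELOAD_ENTRY"; then
    return 0  # entry already installed
  fi
  # Write back the existing entries plus the new one.
  {
    [ -n "$current" ] && printf '%s\n' "$current"
    printf '%s\n' "$RELOAD_ENTRY"
  } | crontab -
}
```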

@Nekit Right, if I no longer need the pre-hook, I can also switch to the deploy-hook as a nice optimisation. Until now the combination of pre- and post-hook was necessary, since I stopped some services in the pre-hook that then had to be restarted by the post-hook (even after unsuccessful renewals).

@MikeMcQ I think it's cleaner to do the NGINX reload in the deploy-hook since, as you say, it would otherwise be performed every day but is only needed every other month. :slight_smile:

So... thank you for all your input! :slightly_smiling_face:

I finally resolved it by using webroot and a deploy-hook.

For the webroot, I created an HTTP exception in NGINX:

	server {
		listen 80 default_server;
		listen [::]:80 default_server;
		server_name _;

		location /.well-known/acme-challenge {
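			# "ip" must be defined by a limit_req_zone directive in the http block.
			limit_req zone=ip burst=20 nodelay;
			limit_req_log_level warn;
			limit_req_status 503;
			# Challenge files are served from /web-server/certs/web/.well-known/acme-challenge/
			root /web-server/certs/web;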
		}

		location / {
			return 301 https://$host$request_uri;
		}
	}

The deploy hook is quite simple: nginx -s reload
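With this in place, the renewal no longer needs the pre- and post-hooks. A minimal sketch of a matching invocation, assuming the webroot path from the NGINX block above and the snap binary; the exact flags the poster used are not shown in the thread, and `renew_webroot` is a hypothetical helper name. Passing `--webroot` on the command line should override the `standalone` authenticator stored for the certificate. Running it with `--dry-run` first exercises the webroot challenge against the staging server without saving any certificate, and certbot does not call `--deploy-hook` commands during a dry run.

```shell
#!/usr/bin/env bash
# Renewal via webroot + deploy hook (sketch; flags as assumed above).
CERTBOT="${CERTBOT:-/snap/bin/certbot}"
WEBROOT="/web-server/certs/web"   # matches "root" in the NGINX location

renew_webroot() {
  # Extra arguments (e.g. --dry-run) are passed through to certbot.
  "$CERTBOT" renew \
    --webroot --webroot-path "$WEBROOT" \
    --deploy-hook "nginx -s reload" \
    "$@"
}
```

A first `renew_webroot --dry-run` checks that NGINX serves the challenge files; a plain `renew_webroot` from cron then does the real renewals.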

I just tested it and it works. :partying_face: So I was able to bury an ugly part of old script history. :rofl:
