Bug: installing with certbot impedes further nginx conf changes without reboot

NP
I just thought that might show something that might be currently overlooked.
Almost as if... nginx is being started with a specific pid file path somewhere and differently elsewhere.


Ok. That info was very helpful, @dvo. The key to the problem is that nginx fails with a segmentation violation (SEGV) at 09:25:34. Sadly, I do not know the cause, but I continue to believe it has something to do with your nginx packaging. I did not fully research the process.txt you provided, but maybe someone else will notice something.

Here is a timeline of key events.

09:20:17 service nginx status (systemd) shows:
         Main PID: 1339
         Others: worker:1342, passenger core:1329, passenger watchdog:1326
09:25:29 certbot started (per LE log)
09:25:34 systemd nginx.service status=11/SEGV main process exited (per service nginx status)
         kills pid 1443 (but why that one? that pid was not seen in the ps display or service status just before certbot)
09:25:37 certbot reload fails due to missing pidfile (per LE log)
         was: nginx -c /etc/nginx/nginx.conf -s reload (per certbot code)
         pidfile missing as systemd deleted nginx.pid as result of SEGV cleanup 
         After a failed reload certbot tries to start nginx directly
         certbot issues: nginx -c /etc/nginx/nginx.conf for that
         (this is known by looking at configurator.py certbot code - start not shown in LE log)
09:25:xx commands after certbot complete
         /run/nginx.pid timestamp matches 09:25:37 direct start of nginx by certbot (not with systemd)
         ps -eF display also shows 09:25 start time
         Main PID 1478, worker:1501
         confirms certbot direct start now in effect
         ps -eF grep was not set up to show passenger processes, so their status is not known
09:33:44 service nginx restart fails
         this is expected since last nginx start was direct, not with systemd
09:34??  next commands after restart fails
         /run/nginx.pid not found 
         as expected since systemd removed it after failed restart at 09:33:44
         nginx -s reload fails due to missing /run/nginx.pid

As _az noted, certbot starts nginx directly using a command like nginx -c /etc/nginx/nginx.conf. Mixing a direct start with systemd causes problems as described earlier.
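To tell which situation you are in at any moment, one option is to compare the PID systemd tracks with the PID in the pidfile. This is only a rough sketch: `nginx_owner_check` is a made-up helper name, and the optional arguments exist so it can be tried with sample values instead of a live service.

```shell
# Report whether the nginx master named in the pidfile is the one systemd tracks.
# Hypothetical helper; both arguments are optional:
#   $1 = pidfile path (default /run/nginx.pid)
#   $2 = PID systemd reports as MainPID (default: asked from systemctl)
nginx_owner_check() {
    pidfile="${1:-/run/nginx.pid}"
    systemd_pid="${2:-$(systemctl show -p MainPID --value nginx 2>/dev/null)}"

    if [ ! -r "$pidfile" ]; then
        echo "no pidfile at $pidfile"
        return 2
    fi
    file_pid="$(tr -d '[:space:]' < "$pidfile")"

    # systemd reports MainPID=0 when it has no running main process for the unit.
    if [ -n "$systemd_pid" ] && [ "$systemd_pid" != "0" ] && [ "$systemd_pid" = "$file_pid" ]; then
        echo "nginx master $file_pid is managed by systemd"
    else
        echo "nginx master $file_pid was not started by systemd (MainPID=${systemd_pid:-unknown})"
        return 1
    fi
}
```

Run with no arguments, a return code of 1 would match the state at 09:25:37 in the timeline (pidfile written by a direct start), and 2 would match the state after the failed restart (no pidfile at all).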

A very puzzling item is that at 09:25:34 systemd killed pid 1443 while certbot was running. I do not understand why that pid was killed. It was not shown in any prior display, even the ones right before certbot started. The kill happened 3 seconds before certbot started nginx directly, which got pid 1478.

I saw no evidence that mixed pidfile locations were a problem, nor any problems with the paths to certbot itself or its config. That said, I wouldn't mind seeing the results of this:

echo $PATH
which -a nginx

I don't expect any surprises but ...

Unless someone sees a problem in your process.txt packaging, I think your better way forward is to use certbot --webroot and avoid the nginx plug-in. This means having to set up the ssl definitions yourself, but it seems you could do this once in your default template. You would be able to reload / restart nginx using systemd and avoid some problems. Also, certbot would not be modifying your nginx.conf on the fly, so it is less likely to cause integration problems. That's what I have so far. Hope this helps.
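For reference, a webroot request could look roughly like the sketch below. The helper name `webroot_request`, the webroot path and the domain are placeholders and not values from this thread. It only prints the command unless RUN=1 is set, so nothing is executed by accident.

```shell
# Print (or, with RUN=1, execute) a certbot webroot request.
# Hypothetical helper: $1 = webroot directory, $2 = domain (both placeholders).
webroot_request() {
    webroot="$1"
    domain="$2"
    # certonly obtains the certificate only and leaves the nginx config untouched.
    # --deploy-hook runs after a successful issuance or renewal, here reloading via systemd.
    set -- certbot certonly --webroot -w "$webroot" -d "$domain" \
        --deploy-hook "systemctl reload nginx"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        # Note: quoting around the hook is lost in this printed form.
        printf '%s\n' "$*"
    fi
}

# Example: webroot_request /var/www/html example.com
```

This assumes nginx already serves /.well-known/acme-challenge/ from that webroot directory over plain HTTP. Because the reload goes through systemctl, systemd stays the only thing that starts or signals nginx.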

$ echo $PATH
/home/jerdvo/.rbenv/plugins/ruby-build/bin:/home/jerdvo/.rbenv/shims:/home/jerdvo/.rbenv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

$ which -a nginx
/usr/sbin/nginx
/sbin/nginx

Thank you, the situation is clarified.
Until we know what certbot touches improperly and that gets fixed, certbot --webroot is the only useful possibility in production mode.
Rebooting could be acceptable in development, which also provides the syntax for the .conf files.

We know that certbot might start nginx directly which causes problems in systemd distros like yours (and mine :)). And, _az promised to look at that.

But, this "feels" to me like an inherent issue related to certbot modifying your nginx.conf files while nginx is running. Does passenger or any other monitoring system auto-restart nginx when it detects a change to nginx.conf? Something like that would explain all the facts - especially the troubling unknown pid in the systemd segv error message.

If such an auto-restart is happening, certbot --webroot (or a different acme client) is your only option, as certbot --nginx will always update the nginx.conf files, for both new issuance and renewal.


@rg305

grep: /etc/shadow: Permission denied
grep: /etc/ufw/before6.rules: Permission denied
grep: /etc/ufw/after.init: Permission denied
grep: /etc/ufw/user.rules: Permission denied
grep: /etc/ufw/after.rules: Permission denied
grep: /etc/ufw/after6.rules: Permission denied
grep: /etc/ufw/before.rules: Permission denied
grep: /etc/ufw/before.init: Permission denied
grep: /etc/ufw/user6.rules: Permission denied
grep: /etc/gshadow-: Permission denied
grep: /etc/ssh/ssh_host_ed25519_key: Permission denied
grep: /etc/ssh/ssh_host_rsa_key: Permission denied
grep: /etc/ssh/ssh_host_ecdsa_key: Permission denied
grep: /etc/ssh/ssh_host_dsa_key: Permission denied
grep: /etc/gshadow: Permission denied
grep: /etc/polkit-1/localauthority: Permission denied
/etc/rc2.d/S01nginx:# Try to extract nginx pidfile
/etc/rc2.d/S01nginx:    PID=/run/nginx.pid
/etc/rc4.d/S01nginx:# Try to extract nginx pidfile
/etc/rc4.d/S01nginx:    PID=/run/nginx.pid
grep: /etc/iscsi/iscsid.conf: Permission denied
grep: /etc/iscsi/initiatorname.iscsi: Permission denied
grep: /etc/letsencrypt/keys: Permission denied
grep: /etc/letsencrypt/archive: Permission denied
grep: /etc/letsencrypt/live: Permission denied
grep: /etc/letsencrypt/accounts: Permission denied
grep: /etc/redis/redis.conf: Permission denied
grep: /etc/sudoers: Permission denied
grep: /etc/ssl/private: Permission denied
/etc/init.d/nginx:# Try to extract nginx pidfile
/etc/init.d/nginx:      PID=/run/nginx.pid
grep: /etc/at.deny: Permission denied
/etc/rc1.d/K01nginx:# Try to extract nginx pidfile
/etc/rc1.d/K01nginx:    PID=/run/nginx.pid
grep: /etc/.pwd.lock: Permission denied
/etc/nginx/nginx.conf:pid /run/nginx.pid;
grep: /etc/security/opasswd: Permission denied
/etc/rc6.d/K01nginx:# Try to extract nginx pidfile
/etc/rc6.d/K01nginx:    PID=/run/nginx.pid
grep: /etc/sudoers.d: Permission denied
/etc/rc3.d/S01nginx:# Try to extract nginx pidfile
/etc/rc3.d/S01nginx:    PID=/run/nginx.pid
/etc/rc5.d/S01nginx:# Try to extract nginx pidfile
/etc/rc5.d/S01nginx:    PID=/run/nginx.pid
/etc/systemd/system/multi-user.target.wants/nginx.service:PIDFile=/run/nginx.pid
/etc/systemd/system/multi-user.target.wants/nginx.service:ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
grep: /etc/shadow-: Permission denied
/etc/rc0.d/K01nginx:# Try to extract nginx pidfile
/etc/rc0.d/K01nginx:    PID=/run/nginx.pid

@MikeMcQ

I am not aware of such passenger behaviour; its role is related to the associated application (thus as sub-component of the nginx.conf file). However I am light years away from being an expert.

With the attached file though, the case should be replicable for capable hands.

LOL
Maybe better output with sudo

The three most important ones agree on the same location:

/etc/nginx/nginx.conf:pid /run/nginx.pid;
/etc/systemd/system/multi-user.target.wants/nginx.service:PIDFile=/run/nginx.pid
/etc/systemd/system/multi-user.target.wants/nginx.service:ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
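The exact grep command used above is not shown in the thread, so the following is only a guess at an equivalent. The helper name `find_pidfile_refs` is made up. The `-s` flag hides the "Permission denied" noise, and `-R` follows symlinks, which is assumed here because the rc*.d entries in the output are links.

```shell
# Recursively list every line under a directory that mentions nginx.pid.
# -R recurses and follows symlinks, -s suppresses messages about unreadable files.
# Hypothetical helper: $1 = directory to search (default /etc).
find_pidfile_refs() {
    grep -Rs 'nginx\.pid' "${1:-/etc}"
}

# Direct equivalent that also reads root-only files:
#   sudo grep -Rs 'nginx\.pid' /etc
```

Without sudo this still skips files the user cannot read; it just does so quietly.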

Yes, and under sudo, all the others agree

/etc/rc2.d/S01nginx:# Try to extract nginx pidfile
/etc/rc2.d/S01nginx:    PID=/run/nginx.pid
/etc/rc4.d/S01nginx:# Try to extract nginx pidfile
/etc/rc4.d/S01nginx:    PID=/run/nginx.pid
/etc/init.d/nginx:# Try to extract nginx pidfile
/etc/init.d/nginx:      PID=/run/nginx.pid
/etc/rc1.d/K01nginx:# Try to extract nginx pidfile
/etc/rc1.d/K01nginx:    PID=/run/nginx.pid
/etc/nginx/nginx.conf:pid /run/nginx.pid;
/etc/rc6.d/K01nginx:# Try to extract nginx pidfile
/etc/rc6.d/K01nginx:    PID=/run/nginx.pid
/etc/rc3.d/S01nginx:# Try to extract nginx pidfile
/etc/rc3.d/S01nginx:    PID=/run/nginx.pid
/etc/rc5.d/S01nginx:# Try to extract nginx pidfile
/etc/rc5.d/S01nginx:    PID=/run/nginx.pid
/etc/systemd/system/multi-user.target.wants/nginx.service:PIDFile=/run/nginx.pid
/etc/systemd/system/multi-user.target.wants/nginx.service:ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
/etc/rc0.d/K01nginx:# Try to extract nginx pidfile
/etc/rc0.d/K01nginx:    PID=/run/nginx.pid

The nginx segv fault is not easily reproduced. We would be seeing that a vast number of times per day if it was common.

Maybe look in /var/log/dmesg for clues (search for nginx and/or segfault)? If you don't see anything, upload the file and maybe we will spot something helpful.
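A quick way to run that search, as a sketch only (the helper name `scan_kernel_log` is made up):

```shell
# Case-insensitive search for nginx or segfault lines in a kernel log file.
# Prints the matching lines, or a short note when there are none.
# Hypothetical helper: $1 = log file (default /var/log/dmesg).
scan_kernel_log() {
    log="${1:-/var/log/dmesg}"
    grep -iE 'nginx|segfault' "$log" || echo "no nginx/segfault lines in $log"
}
```

If the crash happened after boot, the live kernel messages from `dmesg` or `journalctl -k` may be a better place to look than the saved file.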


Those strings do not appear in that log. I enclose the contents nonetheless dmesg.txt (47.6 KB)

Note: with the process file provided I consistently generate that error. At least 6 VPS instances now.

Just to be clear to anyone trying to keep up with this topic...
Which versions of certbot and nginx are you using?


$ nginx -v
nginx version: nginx/1.18.0 (Ubuntu)
$ certbot --version
certbot 1.20.0

OK now I'm curious - LOL

To round that off (so even I can put this in a lab):
Which version of Ubuntu?
Were both nginx and certbot installed from apt?
OR was certbot installed via snap?
[OR other... like either or both were compiled from source]


Ubuntu 20.04.
nginx installed via apt
certbot installed via snap (freshly; I even ensured that sudo apt-get remove certbot ran beforehand - it drew a blank)


Also see the process.txt from the earlier post #35 for other package details


Do you have the Perl module enabled in nginx? i.e. Is /etc/nginx/modules-enabled/50-mod-http-perl.conf present?

nginx's master process segfaulting would explain some things. We've had numerous other reports of that module causing segfaults on reload, on Ubuntu servers.

Try disabling it, if it's there.

If that doesn't help, it would be handy if you could gdb attach to the nginx master process before it crashes, and provide a backtrace of the segfault.
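In case it is useful, here is a rough sketch of a non-interactive attach. The helper name `gdb_attach_cmd` is made up, and it only prints the gdb command for the PID found in the pidfile rather than running it.

```shell
# Build a non-interactive gdb invocation that attaches to the nginx master,
# lets it continue, and prints a backtrace once it stops (for example on SIGSEGV).
# Hypothetical helper: $1 = pidfile path (default /run/nginx.pid).
gdb_attach_cmd() {
    pidfile="${1:-/run/nginx.pid}"
    pid="$(tr -d '[:space:]' < "$pidfile")" || return 1
    echo "gdb -p $pid -batch -ex continue -ex bt"
}
```

The printed command would need to be run as root in a second terminal before starting certbot, and the backtrace is likely to be much more readable with debug symbols for nginx installed.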


Best to have OP answer but this was in the letsencrypt log. Is that sufficient info?

2021-11-01 09:25:31,246:DEBUG:certbot.reverter:Creating backup of /etc/nginx/modules-enabled/50-mod-http-perl.conf


Yes, nicely spotted. Try removing that file @dvo, restarting nginx, then try the entire process again.
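If you would rather keep the file than remove it, commenting out its load_module line should have the same effect. A minimal sketch, with a made-up helper name `disable_load_module`:

```shell
# Comment out every active load_module line in the given file, keeping a .bak copy.
# Hypothetical helper: $1 = path to the module .conf file.
disable_load_module() {
    conf="$1"
    # "&" re-inserts the matched text, so the line becomes "# load_module ...".
    sed -i.bak 's/^[[:space:]]*load_module/# &/' "$conf"
}

# Example (as root):
#   disable_load_module /etc/nginx/modules-enabled/50-mod-http-perl.conf
#   nginx -t && systemctl restart nginx
```

One caveat: if that path is a symlink, editing in place may replace the link with a regular file, so check with ls -l first if that matters to you.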


Yes. I left the file in but disabled its only line:
# load_module modules/ngx_http_perl_module.so;

# added new server name
$ sudo service nginx restart
~$
$ ls -l /run/nginx.pid
-rw-r--r-- 1 root root 5 Nov  2 06:11 /run/nginx.pid
$ sudo ps -eF | grep -E "nginx|PID"
UID          PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
root        1137       1  0 26306  4668   0 06:11 ?        00:00:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data    1138    1137  0 26448  8828   0 06:11 ?        00:00:00 nginx: worker process
jerdvo      1159    1041  0  2039  2520   0 06:13 pts/0    00:00:00 grep --color=auto -E nginx|PID

$ sudo certbot --nginx -d testthree.fidely.club
#  [...]  Successfully received certificate. [...]  Deploying certificate
$ ls -l /run/nginx.pid
-rw-r--r-- 1 root root 5 Nov  2 06:17 /run/nginx.pid
$ sudo ps -eF | grep -E "nginx|PID"
UID          PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
root        1419       1  0 26506 14428   0 06:17 ?        00:00:00 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
www-data    1746    1419  0 26566  9316   0 06:24 ?        00:00:00 nginx: worker process
jerdvo      1825    1041  0  2039  2516   0 06:25 pts/0    00:00:00 grep --color=auto -E nginx|PID

$ sudo service nginx restart
~$

huzzah

So the letsencrypt.log still indicates a backup of that file, but now with the load_module directive disabled:

2021-11-02 06:24:37,789:DEBUG:certbot.reverter:Creating backup of /etc/nginx/modules-enabled/50-mod-http-cache-purge.conf
2021-11-02 06:24:37,789:DEBUG:certbot.reverter:Creating backup of /etc/nginx/modules-enabled/50-mod-http-perl.conf
2021-11-02 06:24:37,789:DEBUG:certbot.reverter:Creating backup of /etc/nginx/modules-enabled/50-mod-http-xslt-filter.conf

Further changes to conf files pass nginx tests and the service restarts.

