Bug: installing with certbot impedes further nginx conf changes without reboot

sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

sudo certbot --nginx -d test.fidely.club

Choosing option 2 (auto-redirect) runs as expected: the cert is issued and validated via ssllabs.com.

Make one change to the .conf file for the application under /etc/nginx/sites-enabled/...

$ sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

$ sudo service nginx restart
Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.

$ systemctl status nginx.service
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2021-10-29 06:47:26 UTC; 7min ago
       Docs: man:nginx(8)
    Process: 18175 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 18176 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=1/FAILURE)

This occurs consistently when editing conf files across 3 servers running Ubuntu 20.04. It has now also been reproduced on a fresh installation, with identical results.

A sudo reboot is necessary for the changes to take effect in nginx.

Exactly what change was made?
[I don't see how a reboot can "work" when a restart fails]

So after reboot, these work?:

Change: Removal of a blank line.

Yes, after reboot those work. Consistently.

Thus certbot touches something improperly, as this error appears only, but consistently, when certbot is run as certbot --nginx -d [...]

Can you show the "before" and "after"?
[upload the before file - this site may change the formatting]

server {

  server_name test.fidely.club;
  root /home/deploy/fidelity/current/public;

  passenger_enabled on;
  passenger_app_env development;

  location /cable {
    passenger_app_group_name myapp_websocket;
    passenger_force_max_concurrent_requests_per_process 0;
  }

  # Allow uploads up to 100MB in size
  client_max_body_size 100m;

  location ~ ^/(assets|packs) {
    expires max;
    gzip_static on;
  }

  location ~ /(wp-content|solr|jsonws|mifs|wp-includes|login.asp)/ {
      return 404;
  }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/test.fidely.club/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/test.fidely.club/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

}
server {
    if ($host = test.fidely.club) {
        return 301 https://$host$request_uri;
    } # managed by Certbot



  server_name test.fidely.club;
    listen 80;
    return 404; # managed by Certbot


}

In practice, there were two blank lines before listen 443 ssl; . One was removed.

Visually seeing the file (like from a picture) doesn't show why it fails nginx -t.
Please upload (click image) the HTTP file that certbot used to make this secured file from.

Also, this might show more relevant detail:

journalctl -xe

Oct 29 06:20:54 fidely systemd[17050]: Reached target Paths.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 15.
Oct 29 06:20:54 fidely systemd[17050]: Reached target Timers.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 3.
Oct 29 06:20:54 fidely systemd[17050]: Starting D-Bus User Message Bus Socket.
-- Subject: A start job for unit UNIT has begun execution
-- Defined-By: systemd
--
-- A start job for unit UNIT has begun execution.
--
-- The job identifier is 6.
Oct 29 06:20:54 fidely systemd[17050]: Listening on GnuPG network certificate management daemon.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 10.
Oct 29 06:20:54 fidely systemd[17050]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 14.
Oct 29 06:20:54 fidely systemd[17050]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 11.
Oct 29 06:20:54 fidely systemd[17050]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 13.
Oct 29 06:20:54 fidely systemd[17050]: Listening on GnuPG cryptographic agent and passphrase cache.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 9.
Oct 29 06:20:54 fidely systemd[17050]: Listening on debconf communication socket.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 12.
Oct 29 06:20:54 fidely systemd[17050]: Listening on REST API socket for snapd user session agent.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 8.
Oct 29 06:20:54 fidely systemd[17050]: Listening on D-Bus User Message Bus Socket.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 6.
Oct 29 06:20:54 fidely systemd[17050]: Reached target Sockets.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 5.
Oct 29 06:20:54 fidely systemd[17050]: Reached target Basic System.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 2.
Oct 29 06:20:54 fidely systemd[17050]: Reached target Main User Target.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
--
-- A start job for unit UNIT has finished successfully.
--
-- The job identifier is 1.
Oct 29 06:20:54 fidely systemd[17050]: Startup finished in 101ms.
-- Subject: User manager start-up is now complete
-- Defined-By: systemd
--
-- The user manager instance for user 1001 has been started. All services queued
-- for starting have been started. Note that other services might still be starting
-- up or be started at any later time.
--
-- Startup of the manager took 101747 microseconds.
Oct 29 06:20:55 fidely sshd[17154]: Received disconnect from X.X.X.X port 8021:11: disconnected by user
Oct 29 06:20:55 fidely sshd[17154]: Disconnected from user deploy X.X.X.X port 8021
Oct 29 06:21:24 fidely sudo[17223]: pam_unix(sudo:auth): authentication failure; logname=deploy uid=1001 euid=0 tty=/dev/pts/0 ruser=deploy rhost=  u>

test.txt (1.1 KB)

@dvo Just guessing ... Could it have anything to do with the passenger config? I do not know much about it but reading its docs I see this:

passenger_root path;

Refers to the location to the Passenger root directory, or to a location configuration file. This configuration option is essential to Passenger, and allows Passenger to locate its own data files.

But, I do not see this "essential" setting in your config. The nginx problem you describe is unusual so I am just looking for something unusual in your conf.

Could you try adding that setting or even removing passenger from the nginx conf just as a test to see if it is related to that?

https://www.phusionpassenger.com/library/config/nginx/reference/#application-loading

I will attempt to remove the reference to passenger, but I am quite certain that is not the source.

Why then would

sudo reboot
[...]
sudo nginx -t
sudo service nginx restart

then allow nginx to restart?

The only change that generated the error (and I tested this in isolation) is the invocation of sudo certbot --nginx -d [...]

As noted, I am just looking for something unusual. nginx does not usually behave this way after certbot --nginx.

As background, certbot issues an nginx -s reload after updating the config. This sends a signal to nginx to reload the conf. The reload will not show in the nginx.service status, except for the new PID of the worker process. Note that if you run sudo systemctl reload nginx instead, the reload does appear in the nginx service status (along with the new PID, of course).

It seems to me that the nginx state is getting "off" and that this only becomes apparent the second time a reload / restart is done. Another test is to run sudo nginx -s reload several times without using certbot, modifying the conf slightly each time as you did.

My guess is something about the passenger integration / install is causing it. Especially when I saw a key config item missing from it.

It's not really possible for nginx to require a reboot for changes to take effect. The issue is most likely due to a bug in the process controller script or process controller itself.

Try issuing a kill -HUP {nginx "master" process id}, which is how nginx does a graceful restart: the main process rereads the configuration files and starts new child processes, while each old child process exits after handling its own active requests. That should work, and would indicate to me the issue is with systemctl.

[I apologize to any offended by "master". I personally avoid that term in favor of "main" or "primary", but nginx has not yet updated its terminology to more inclusive words.]

You seem to be onto something. I attempted a second certificate on the same end point & was going to try the suggestion.

While the certificate got generated, a new error did arise. I find it preferable to communicate this before attempting the suggestion as it might provide better insight.

Rolling back to previous server configuration...
nginx: [alert] kill(1547, 1) failed (3: No such process)
Encountered exception during recovery:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/certbot/client.py", line 529, in deploy_certificate
    self.installer.restart()
  File "/usr/lib/python3/dist-packages/certbot_nginx/configurator.py", line 919, in restart
    nginx_restart(self.conf('ctl'), self.nginx_conf)
  File "/usr/lib/python3/dist-packages/certbot_nginx/configurator.py", line 1202, in nginx_restart
    raise errors.MisconfigurationError(
certbot.errors.MisconfigurationError: nginx restart failed:
b''
b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/certbot/error_handler.py", line 124, in _call_registered
    self.funcs[-1]()
  File "/usr/lib/python3/dist-packages/certbot/client.py", line 634, in _rollback_and_restart
    self.installer.restart()
  File "/usr/lib/python3/dist-packages/certbot_nginx/configurator.py", line 919, in restart
    nginx_restart(self.conf('ctl'), self.nginx_conf)
  File "/usr/lib/python3/dist-packages/certbot_nginx/configurator.py", line 1202, in nginx_restart
    raise errors.MisconfigurationError(
certbot.errors.MisconfigurationError: nginx restart failed:
b''
b''
nginx restart failed:
b''
b''
IMPORTANT NOTES:
 - An error occurred and we failed to restore your config and restart
   your server. Please post to
   https://community.letsencrypt.org/c/help with details about your
   configuration and this error you received.
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/testtwo.fidely.club/fullchain.pem
[...]

Note:

sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

That's interesting. Need to keep gathering facts. I would run sudo systemctl status nginx.service in between each command to watch for messages and keep track of the PIDs that are active (in the CGroup). Or, for a briefer display of PIDs, use something like: ps -eF | grep -E "nginx|PID"
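
A minimal sketch of that PID tracking, assuming the main process shows up in ps with the title "nginx: master process" (what nginx normally sets on Linux). The function name nginx_master_pids is made up for illustration. It only filters ps output, so it is safe to run between each step:

```shell
# Print the PID of every nginx master process found in
# `ps -eo pid,args` style input read from stdin.
# Assumption: the process title starts with "nginx: master".
nginx_master_pids() {
    awk '$2 == "nginx:" && $3 == "master" { print $1 }'
}

# Usage on a live host (not executed here):
#   ps -eo pid,args | nginx_master_pids
```

If the PID printed here changes, or disappears, between a certbot run and the next restart attempt, that narrows down which step loses track of the process.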

A quick search through this forum found a similar "no such process" error although no remedy was found. Maybe further clues though? Could your pid folder be damaged or odd in some way?
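
One way to check the pid file idea, as a sketch only. It assumes the pid file is /run/nginx.pid (check the pid directive in your own nginx.conf), and pidfile_state is a hypothetical helper name. The "kill(1547, 1) failed (3: No such process)" alert above is what I would expect to see if that file names a PID that no longer exists:

```shell
# Report whether the PID recorded in a pid file belongs to a live process.
# Prints "missing", "stale" or "alive"; returns 0 only for "alive".
# Note: kill -0 also fails when the process exists but belongs to another
# user, so run this as root when checking a root-owned nginx master.
pidfile_state() {
    pidfile=$1
    # No file, or an empty file: nothing recorded
    if [ ! -s "$pidfile" ]; then
        echo "missing"
        return 2
    fi
    pid=$(tr -d '[:space:]' < "$pidfile")
    # Anything that is not a plain number cannot be a valid PID
    case $pid in
        ''|*[!0-9]*) echo "stale"; return 1 ;;
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive"
        return 0
    fi
    echo "stale"
    return 1
}

# Usage on a live host (path is an assumption, not executed here):
#   sudo sh -c '. ./pidfile_state.sh; pidfile_state /run/nginx.pid'
```

A "stale" or "missing" result while ps still shows nginx processes would point at the pid file being out of sync with the running master.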

Update: @dvo Oh, forgot to include link to that other thread I mentioned.

The journal log seems like it no longer contains the nginx problem.
You'd have to run that soon after the problem returns.

As for the test.txt file - that is the HTTPS config.
I was looking for the HTTP config.

Also, I can't seem to find it, what version of certbot are you running?

'folder damaged or odd in some way'. I somehow doubt it, as my workflow is always the same from one VPS to another. But under Ubuntu 20.04, they all exhibit this same behaviour.

I will try spinning a new one up tomorrow and document the status between each step.

I suspect you are using certbot 1.9.0 (or lower) and that this can be fixed by upgrading certbot.
Show:
certbot --version

certbot 0.40.0
I only have that config, as http is then being forwarded to https.

lower than or equal to 1.9.0?
But why would sudo apt install certbot python3-certbot-nginx install by default (yesterday!) what appears to be a much lower version?
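
On the "lower than or equal" question: version numbers compare component by component, so 0.40.0 is lower than 1.9.0 (major version 0 against 1), even though 40 is larger than 9. A small sketch of that comparison, assuming GNU sort with the -V (version sort) option is available; version_le is a made-up helper name:

```shell
# Succeeds when version $1 is lower than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage (not executed here); assumes `certbot --version` prints a line
# like "certbot 0.40.0", as shown earlier in this thread:
#   installed=$(certbot --version | awk '{print $2}')
#   version_le "$installed" 1.9.0 && echo "certbot $installed is 1.9.0 or lower"
```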

That config contains:

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/test.fidely.club/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/test.fidely.club/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

So I would dare say it is NOT the HTTP config file requested.

As for:

Ubuntu has supported snap since... forever.
You should follow the recommended installation guide:
Certbot - Ubuntufocal Nginx (eff.org)
