Nginx server fails after certbot renew

Hello,

I'm having trouble with the nginx service after running certbot renew:

sudo certbot renew --dry-run
Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/office.projectcloud.site.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert not due for renewal, but simulating renewal for dry run
Plugins selected: Authenticator nginx, Installer nginx
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for office.projectcloud.site
Waiting for verification...
Cleaning up challenges
nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed with reload of nginx server; fullchain is
/etc/letsencrypt/live/office.projectcloud.site/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
** DRY RUN: simulating 'certbot renew' close to cert expiry
**          (The test certificates below have not been saved.)

Congratulations, all renewals succeeded. The following certs have been renewed:
  /etc/letsencrypt/live/office.projectcloud.site/fullchain.pem (success)
** DRY RUN: simulating 'certbot renew' close to cert expiry
**          (The test certificates above have not been saved.)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

After that, the status of the nginx service is:

sudo systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: failed (Result: core-dump) since Wed 2021-01-13 15:14:28 CET; 32s ago
       Docs: man:nginx(8)
    Process: 689 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 755 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
   Main PID: 827 (code=dumped, signal=SEGV)
      Tasks: 0 (limit: 4654)
     Memory: 11.3M
     CGroup: /system.slice/nginx.service

Jan 13 15:12:25 onlyoffice-VirtualBox systemd[1]: Starting A high performance web server and a reverse proxy server...
Jan 13 15:12:26 onlyoffice-VirtualBox nginx[689]: nginx: [warn] "ssl_stapling" ignored, host not found in OCSP responder "r3.o.lencr.org" in the certificate "/etc/letsencrypt/live/office.projectcloud.site/ful>
Jan 13 15:12:27 onlyoffice-VirtualBox nginx[755]: nginx: [warn] "ssl_stapling" ignored, host not found in OCSP responder "r3.o.lencr.org" in the certificate "/etc/letsencrypt/live/office.projectcloud.site/ful>
Jan 13 15:12:27 onlyoffice-VirtualBox systemd[1]: Started A high performance web server and a reverse proxy server.
Jan 13 15:14:28 onlyoffice-VirtualBox systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
Jan 13 15:14:28 onlyoffice-VirtualBox systemd[1]: nginx.service: Killing process 2117 (nginx) with signal SIGKILL.
Jan 13 15:14:28 onlyoffice-VirtualBox systemd[1]: nginx.service: Killing process 2118 (nginx) with signal SIGKILL.
Jan 13 15:14:28 onlyoffice-VirtualBox systemd[1]: nginx.service: Killing process 2117 (nginx) with signal SIGKILL.
Jan 13 15:14:28 onlyoffice-VirtualBox systemd[1]: nginx.service: Killing process 2118 (nginx) with signal SIGKILL.

Jan 13 15:14:28 onlyoffice-VirtualBox systemd[1]: nginx.service: Failed with result 'core-dump'.

Can someone please help me troubleshoot these errors? I'm using Ubuntu 20.04 and have configured OnlyOffice Document Server. I can't start the server anymore; I have to restart the machine to get the nginx server running again. My problem began after I ran the certbot renew check.

Thank you, I would appreciate any help.


A segmentation fault is very unusual to see. It usually means that there is a major bug in nginx or that something unusual is happening in the environment.

Are you doing anything like serving files or configuring nginx through a VirtualBox shared folder?

Unfortunately, this might be a bit tricky to debug. If there's nothing in /var/log/nginx/error.log at the time of the crash, you might need to increase nginx's logging verbosity to try to catch it next time.
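A minimal sketch for raising the verbosity, assuming the stock Debian/Ubuntu nginx layout used in this thread: nginx only writes `debug`-level messages when the binary was built with `--with-debug`, and `nginx -V` prints its build flags on stderr. The helper name `suggest_error_log_level` is hypothetical; it only suggests a level for the existing `error_log /var/log/nginx/error.log;` line.

```shell
# Hypothetical helper: suggest an error_log level from `nginx -V` output on stdin.
# (nginx -V writes its build information to stderr, so pipe it with 2>&1.)
suggest_error_log_level() {
  if grep -q -- '--with-debug'; then
    # Debug logging is compiled in, so the most verbose level is available.
    echo debug
  else
    # Without --with-debug, debug messages are not compiled in,
    # so "info" is the most verbose useful level.
    echo info
  fi
}

# Example on the server:
#   level=$(nginx -V 2>&1 | suggest_error_log_level)
#   # edit /etc/nginx/nginx.conf:  error_log /var/log/nginx/error.log <level>;
#   sudo nginx -t && sudo systemctl restart nginx
```

A full restart is used in the example rather than a reload, since reloads are exactly what appears to trigger the crash here.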


No, I'm not using shared folders.

/var/log/nginx/error.log

2021/01/14 00:11:39 [info] 26776#26776: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:63
2021/01/14 00:11:43 [notice] 26781#26781: signal process started
Out of memory!
2021/01/14 00:11:43 [alert] 850#850: perl_parse() failed: 1
2021/01/14 00:11:46 [notice] 26782#26782: signal process started
2021/01/14 00:17:06 [warn] 695#695: "ssl_stapling" ignored, host not found in OCSP responder "r3.o.lencr.org" in the certificate "/etc/letsencrypt/live/office.projectc>
2021/01/14 00:17:06 [info] 695#695: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:63
2021/01/14 00:17:07 [warn] 782#782: "ssl_stapling" ignored, host not found in OCSP responder "r3.o.lencr.org" in the certificate "/etc/letsencrypt/live/office.projectc>
2021/01/14 00:20:23 [info] 2172#2172: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:63
2021/01/14 00:20:26 [notice] 2175#2175: signal process started
2021/01/14 00:20:38 [notice] 2211#2211: signal process started
2021/01/14 00:20:41 [notice] 2213#2213: signal process started
2021/01/14 00:20:41 [error] 2213#2213: open() "/run/nginx.pid" failed (2: No such file or directory)

That could do it.

How much memory is assigned to this VM and how much is on the host?
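For the guest side of that question, a small sketch assuming a Linux guest with /proc mounted (`free -h` shows the same figure interactively); the host's total has to be checked on the host or in VirtualBox. The function name is hypothetical.

```shell
# Hypothetical helper: print total memory in GiB from a meminfo-style file.
# MemTotal in /proc/meminfo is reported in kB (units of 1024 bytes).
mem_total_gib() {
  local file="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }' "$file"
}

# Example inside the VM:
#   echo "VM memory: $(mem_total_gib) GiB"
```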

The only thing is, the timestamps of that event and the segfault don't match up.

I don't understand why nginx crashes only after I run "certbot renew --dry-run".

I also see more errors like these:
onlyoffice@onlyoffice-VirtualBox:~$ journalctl -xe
Jan 14 00:27:28 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Jan 14 00:27:28 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Jan 14 00:27:28 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Jan 14 00:27:28 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Jan 14 00:27:28 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Jan 14 00:27:28 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Jan 14 00:27:29 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Jan 14 00:27:29 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Jan 14 00:27:29 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Jan 14 00:27:29 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Jan 14 00:27:29 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Jan 14 00:27:29 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Jan 14 00:27:29 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Jan 14 00:27:29 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Jan 14 00:27:30 onlyoffice-VirtualBox nginx[2633]: nginx: [emerg] still could not bind()
Jan 14 00:27:30 onlyoffice-VirtualBox systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit nginx.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Jan 14 00:27:30 onlyoffice-VirtualBox systemd[1]: nginx.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit nginx.service has entered the 'failed' state with result 'exit-code'.
Jan 14 00:27:30 onlyoffice-VirtualBox systemd[1]: Failed to start A high performance web server and a reverse proxy server.
-- Subject: A start job for unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit nginx.service has finished with a failure.
--
-- The job identifier is 1300 and the job result is failed.
Jan 14 00:27:30 onlyoffice-VirtualBox sudo[2629]: pam_unix(sudo:session): session closed for user root
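`bind() ... failed (98: Address already in use)` means something was still listening on ports 80 and 443 when systemd tried to start nginx, quite possibly another nginx instance. A sketch for finding out which processes hold those ports, assuming iproute2's `ss` (present on Ubuntu 20.04); `listeners_from_ss` is a hypothetical helper that turns the `users:((...))` column into name/PID pairs.

```shell
# Hypothetical helper: read `ss -ltnp` output on stdin and print one
# "process-name pid" pair per listening owner, deduplicated.
# ss shows owners as users:(("nginx",pid=2117,fd=6),("nginx",pid=2118,fd=6)).
listeners_from_ss() {
  grep -o '"[^"]*",pid=[0-9]*' \
    | sed 's/^"\([^"]*\)",pid=\([0-9]*\)$/\1 \2/' \
    | sort -u
}

# Example (root is needed to see other users' processes):
#   sudo ss -ltnp '( sport = :80 or sport = :443 )' | listeners_from_ss
```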

The VM had 4 GB of memory; I changed it to 8 GB, but the problem is the same.

OK, let's put the segfault to the side for now.

Certbot uses this command to restart/reload nginx:

nginx -c /etc/nginx/nginx.conf -s reload

If that fails (e.g. because /run/nginx.pid doesn't exist), Certbot assumes nginx is not running and tries to start it using:

nginx -c /etc/nginx/nginx.conf

  1. Is that the right path to the nginx config file on your system?

  2. Are you able to reload nginx using that first command, without involving Certbot at all?
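The reload-then-start fallback described above can be sketched in shell as follows. This is a simplified illustration of that description, not Certbot's actual implementation; `reload_or_start_nginx` and the `NGINX_BIN` override are hypothetical and exist only to make the sketch testable.

```shell
# Sketch of the reload-then-start fallback (not Certbot's real code).
# NGINX_BIN is a hypothetical override; it defaults to the real nginx binary.
reload_or_start_nginx() {
  local conf="${1:-/etc/nginx/nginx.conf}"
  local nginx_bin="${NGINX_BIN:-nginx}"
  # First try to signal the running master (this reads the pid file).
  if "$nginx_bin" -c "$conf" -s reload; then
    echo "reloaded"
    return 0
  fi
  # Reload failed (e.g. no /run/nginx.pid): assume nginx is down and start it.
  # An nginx started this way is not tracked by systemd's nginx.service.
  if "$nginx_bin" -c "$conf"; then
    echo "started"
    return 0
  fi
  echo "failed" >&2
  return 1
}
```

The second branch is why a missing pid file matters: a running nginx whose pid file is gone looks "down", so a second instance is started and competes for ports 80 and 443.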


Thank you for helping!

nginx: [alert] could not open error log file: open() "/var/log/nginx/error.log" failed (13: Permission denied)
2021/01/14 08:52:19 [warn] 27374#27374: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:1
2021/01/14 08:52:19 [emerg] 27374#27374: cannot load certificate "/etc/letsencrypt/live/office.projectcloud.site/fullchain.pem": BIO_new_file() failed (SSL: error:0200100D:system library:fopen:Permission denied:fopen('/etc/letsencrypt/live/office.projectcloud.site/fullchain.pem','r') error:2006D002:BIO routines:BIO_new_file:system lib)

Yes, it is the right path:

    user www-data;
    worker_processes auto;
    pid /run/nginx.pid;
    include /etc/nginx/modules-enabled/*.conf;

    events {
            worker_connections 768;
            # multi_accept on;
    }

    http {

            ##
            # Basic Settings
            ##

            sendfile on;
            tcp_nopush on;
            tcp_nodelay on;
            keepalive_timeout 65;
            types_hash_max_size 2048;
            # server_tokens off;

            # server_names_hash_bucket_size 64;
            # server_name_in_redirect off;

            include /etc/nginx/mime.types;
            default_type application/octet-stream;

            ##
            # SSL Settings
            ##

            ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; # Dropping SSLv3, ref: POODLE
            ssl_prefer_server_ciphers on;

            ##
            # Logging Settings
            ##

            access_log /var/log/nginx/access.log;
            error_log /var/log/nginx/error.log;

            gzip on;

            include /etc/nginx/conf.d/*.conf;
            include /etc/nginx/sites-enabled/*;
    }

I am also sharing my /etc/nginx/conf.d/ds.conf file:

include /etc/nginx/includes/http-common.conf;
server {
  server_tokens off;
  server_name office.projectcloud.site;

  include /etc/nginx/includes/ds-*.conf;

    listen [::]:443 ssl ipv6only=on; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/office.projectcloud.site/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/office.projectcloud.site/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot


    add_header Strict-Transport-Security "max-age=31536000" always; # managed by Certbot


    ssl_trusted_certificate /etc/letsencrypt/live/office.projectcloud.site/chain.pem; # managed by Certbot
    ssl_stapling on; # managed by Certbot
    ssl_stapling_verify on; # managed by Certbot

}

server {
    if ($host = office.projectcloud.site) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


  listen 0.0.0.0:80;
  listen [::]:80 default_server;
  server_tokens off;
  server_name office.projectcloud.site;

  include /etc/nginx/includes/ds-*.conf;

}

You'll need to run it as the root user, or with sudo.
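The errors above (permission denied on /var/log/nginx/error.log and on fullchain.pem, plus the ignored "user" directive) are what a non-root run produces. A hypothetical guard, as a sketch, that stops before calling nginx as a normal user:

```shell
# Hypothetical guard: fail early unless running as root.
# The optional argument overrides the detected UID (mainly for testing).
require_root() {
  local uid="${1:-$(id -u)}"
  if [ "$uid" -ne 0 ]; then
    echo "please re-run with sudo" >&2
    return 1
  fi
}

# Example:
#   require_root && nginx -c /etc/nginx/nginx.conf -s reload
```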

nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)

And nginx is definitely already running when you run that command?

That's weird. That file should exist, because nginx should create it, thanks to this line in your nginx.conf:

    pid /run/nginx.pid;

This would explain why Certbot is having trouble. Certbot relies on that command (nginx -s reload) to succeed.

However, I'm not sure why the pid file wouldn't be getting created.

If you kill everything and restart nginx, does the file appear? (You might need to run sudo apt install psmisc to get killall.)

sudo systemctl stop nginx
sudo killall -9 nginx
sudo systemctl start nginx
sudo ls -lah /run/nginx.pid
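The last command only shows that the file exists. A slightly stronger check, sketched with a hypothetical `pidfile_alive` helper, also confirms that the PID inside it belongs to a running process. `kill -0` sends no signal; it only tests whether the process exists and may be signalled, so run it as root (for example after `sudo -i`), otherwise it fails for root-owned processes.

```shell
# Hypothetical helper: check that a pid file exists and names a live process.
pidfile_alive() {
  local pidfile="${1:-/run/nginx.pid}"
  [ -s "$pidfile" ] || { echo "missing or empty: $pidfile" >&2; return 1; }
  local pid
  pid=$(cat "$pidfile")
  # kill -0 only probes for existence/permission; nothing is delivered.
  if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid from $pidfile is running"
  else
    echo "stale pid file: $pid is not running" >&2
    return 1
  fi
}
```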

Yes, now the server is active and works fine.

I also ran the command: sudo nginx -c /etc/nginx/nginx.conf -s reload

and the nginx server didn't fail.

Actually, now I'm having the same issue again: code=dumped, status=11/SEGV.

So my problem is not with the certificate but with the nginx configuration.

I did not install nginx manually; it was installed along with OnlyOffice Document Server.

I tried installing onlyoffice using these instructions: https://helpcenter.onlyoffice.com/installation/docs-community-install-ubuntu.aspx

I get the same issue with nginx -s reload. (Without using Certbot at all).

Basically, the nginx master process sometimes crashes during configuration reload. In a debugger, this is the backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00007fdbadbfd593 in Perl__invlist_intersection_maybe_complement_2nd ()
  from target:/lib/x86_64-linux-gnu/libperl.so.5.30
(gdb) bt
#0  0x00007fdbadbfd593 in Perl__invlist_intersection_maybe_complement_2nd ()
  from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#1  0x00007fdbadbfdbd5 in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#2  0x00007fdbadc0dc7f in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#3  0x00007fdbadc14268 in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#4  0x00007fdbadc18d03 in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#5  0x00007fdbadc192cf in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#6  0x00007fdbadc1e36c in Perl_re_op_compile () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#7  0x00007fdbadbb2c35 in Perl_pmruntime () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#8  0x00007fdbadbef23b in Perl_yyparse () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#9  0x00007fdbadc907a7 in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#10 0x00007fdbadc9606d in Perl_pp_require () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#11 0x00007fdbadc4b4a6 in Perl_runops_standard () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#12 0x00007fdbadbb8d24 in Perl_call_sv () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#13 0x00007fdbadbbb950 in Perl_call_list () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#14 0x00007fdbadb98e00 in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#15 0x00007fdbadbb19ff in Perl_newATTRSUB_x () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#16 0x00007fdbadbb4f92 in Perl_utilize () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#17 0x00007fdbadbef6d9 in Perl_yyparse () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#18 0x00007fdbadc907a7 in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#19 0x00007fdbadc9606d in Perl_pp_require () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#20 0x00007fdbadc4b4a6 in Perl_runops_standard () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#21 0x00007fdbadbb8d24 in Perl_call_sv () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#22 0x00007fdbadbbb950 in Perl_call_list () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#23 0x00007fdbadb98e00 in ?? () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#24 0x00007fdbadbb19ff in Perl_newATTRSUB_x () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#25 0x00007fdbadbb4f92 in Perl_utilize () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#26 0x00007fdbadbef6d9 in Perl_yyparse () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#27 0x00007fdbadbbfa72 in perl_parse () from target:/lib/x86_64-linux-gnu/libperl.so.5.30
#28 0x00007fdbadea5b4d in ?? () from target:/usr/share/nginx/modules/ngx_http_perl_module.so
#29 0x00007fdbadea63e9 in ?? () from target:/usr/share/nginx/modules/ngx_http_perl_module.so
#30 0x000055f805ef8659 in ?? ()
#31 0x000055f805ed2d75 in ngx_conf_parse ()
#32 0x000055f805ed02ec in ngx_init_cycle ()
#33 0x000055f805eea4b2 in ngx_master_process_cycle ()
#34 0x000055f805ebc4ca in main ()
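With a backtrace this long, it can help to tally which shared objects the frames come from; here nearly every frame is in libperl, called from ngx_http_perl_module.so. A sketch, assuming the backtrace was saved to a text file; `backtrace_libs` is a hypothetical helper that relies on gdb printing "from <path>" for frames inside shared libraries.

```shell
# Hypothetical helper: count shared objects in a gdb backtrace read from stdin,
# most frequent first. Paths such as "target:/lib/.../libperl.so.5.30"
# are reduced to their basename.
backtrace_libs() {
  grep -o 'from [^ ]*' \
    | sed -e 's/^from //' -e 's#.*/##' \
    | awk '{ count[$0]++ } END { for (lib in count) print count[lib], lib }' \
    | sort -rn
}

# Example:
#   backtrace_libs < nginx-backtrace.txt
```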

Perl? Looks familiar. In your earlier error log, we saw:

Out of memory!
2021/01/14 00:11:43 [alert] 850#850: perl_parse() failed: 1

Looking around the nginx configuration, I don't think that onlyoffice actually needs the nginx Perl module. As far as I can tell, it's only present because it's enabled by default as part of the nginx-extras package.
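One way to double-check that on a given server is to search the configuration for directives that only the Perl module provides (`perl`, `perl_modules`, `perl_require`, `perl_set`). A sketch with a hypothetical helper, assuming the configuration lives under /etc/nginx as in this thread; it only scans configuration files, not SSI pages.

```shell
# Hypothetical check: print any lines using ngx_http_perl_module directives.
# Returns 0 if at least one is found, non-zero otherwise.
uses_perl_directives() {
  local dir="${1:-/etc/nginx}"
  # The module's own "load_module ...perl_module.so;" line does not match,
  # because it starts with load_module rather than a perl directive.
  grep -rEn '^[[:space:]]*perl(_modules|_require|_set)?[[:space:]]' "$dir"
}

# Example:
#   uses_perl_directives || echo "no Perl directives found"
```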

You can disable the module:

sudo rm /etc/nginx/modules-enabled/50-mod-http-perl.conf

and make sure to restart nginx fully again, using the steps from earlier:

sudo systemctl stop nginx
sudo killall -9 nginx
sudo systemctl start nginx
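To confirm the module is really gone before restarting, a sketch with a hypothetical helper that looks for the module's `load_module` line in /etc/nginx/modules-enabled (where the package placed it, as the `rm` path above shows), followed by a config test:

```shell
# Hypothetical helper: succeed if any enabled module file still loads the Perl module.
perl_module_enabled() {
  local dir="${1:-/etc/nginx/modules-enabled}"
  # -s hides "no such file" noise if the glob matches nothing.
  grep -qs 'ngx_http_perl_module' "$dir"/*.conf
}

# Example, after removing 50-mod-http-perl.conf:
#   perl_module_enabled && echo "Perl module still enabled" || echo "Perl module disabled"
#   sudo nginx -t   # confirm the remaining configuration still parses
```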

After disabling the Perl module, I can't get nginx to crash anymore. I was able to create a certificate using certbot --nginx and nginx -s reload works every time too.
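To check "reload works every time" a bit more systematically, a sketch that reloads nginx repeatedly and verifies after each reload that the master process named in the pid file is still alive. Run it as root; `reload_stress` and the `NGINX_BIN`, `PIDFILE` and `RELOAD_DELAY` overrides are hypothetical and exist only for testing.

```shell
# Hypothetical stress check: reload nginx N times and confirm the master survives.
reload_stress() {
  local n="${1:-20}"
  local nginx_bin="${NGINX_BIN:-nginx}"
  local pidfile="${PIDFILE:-/run/nginx.pid}"
  local delay="${RELOAD_DELAY:-1}"
  local i pid
  for i in $(seq 1 "$n"); do
    "$nginx_bin" -s reload || { echo "reload $i failed" >&2; return 1; }
    sleep "$delay"   # give the master time to re-read its configuration
    pid=$(cat "$pidfile" 2>/dev/null || true)
    if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
      echo "master process gone after reload $i" >&2
      return 1
    fi
  done
  echo "survived $n reloads"
}

# Example:
#   sudo bash -c "$(declare -f reload_stress); reload_stress 20"
```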

I'm not sure what the root cause of the Perl crash is; I didn't look into it too deeply. It might have something to do with the fact that the onlyoffice packages weren't built for Ubuntu 20.04 (they were in fact built for Debian Squeeze), so there might be some binary incompatibility somewhere.


Yes, you are a lifesaver!

Thank you so much for your help and solution.
