Auto Renewal: NGINX pid disappears, NGINX doesn't restart

Hi,

I have been experiencing problems with the certbot auto-renew process for months.
First, the versions:

  • certbot 1.28.0
  • nginx/1.18.0 (Ubuntu)
  • python --version: Python 2.7.18
  • python3 --version: Python 3.8.10
  • Ubuntu 20.04.4 LTS

Problem Description:
Every time a renewal runs, it works for the first few domains. Then it fails for the ones after, because certbot cannot restart nginx. According to the nginx logs, the restart fails because nginx cannot bind to ports 80 and 443. For some reason nginx is still running and my services are still available, but the pid file is gone. This also shows in the certbot debug log:

2022-06-20 06:16:35,879:DEBUG:certbot._internal.display.obj:Notifying user: Reloading nginx server after certificate renewal
2022-06-20 06:16:35,950:DEBUG:certbot_nginx._internal.configurator:nginx reload failed:
nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)

This was for the domain log.kultnow.com, but I experience similar behavior for meistersingerakademie.com or any of my domains.

My problem seems to be similar to Nginx installer not properly reloading configuration · Issue #7422 · certbot/certbot · GitHub, but not exactly the same. I can reproduce it reliably using certbot renew --dry-run.
Please look at the following log:

  1. nginx is running
  2. I run the above command; it works at first, then I get the bind error
  3. systemd reports nginx as no longer running (but it actually is)
  4. The killall command shows that nginx is still running: if it were not, killall would say something like 'No processes found...'.
  5. I restart nginx: it's running again.
root@UbuntuDroplet:~# systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-06-20 18:41:03 UTC; 42min ago
       Docs: man:nginx(8)
    Process: 1159555 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 1159557 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
   Main PID: 1159558 (nginx)
      Tasks: 3 (limit: 1066)
     Memory: 60.8M
     CGroup: /system.slice/nginx.service
             ├─1159558 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
             ├─1159559 nginx: worker process
             └─1159560 nginx: cache manager process

Jun 20 18:41:02 UbuntuDroplet systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 20 18:41:03 UbuntuDroplet systemd[1]: Started A high performance web server and a reverse proxy server.
root@UbuntuDroplet:~# certbot renew --dry-run
Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/api.mindsupport.eu.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Simulating renewal of an existing certificate for api.mindsupport.eu

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/cms.meistersingerakademie.com.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Simulating renewal of an existing certificate for cms.meistersingerakademie.com

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/files.kaiser.fyi.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Simulating renewal of an existing certificate for files.kaiser.fyi

Certbot failed to authenticate some domains (authenticator: nginx). The Certificate Authority reported these problems:
  Domain: files.kaiser.fyi
  Type:   unauthorized
  Detail: 167.172.173.57: Invalid response from https://files.kaiser.fyi/.well-known/acme-challenge/Cq1JiUsjYVPzDQEI_JAYd-ZAi5KXy9ZiUe0vyPsdz-A: 404

Hint: The Certificate Authority failed to verify the temporary nginx configuration changes made by Certbot. Ensure the listed domains point to this nginx server and that it is accessible from the internet.

Encountered exception during recovery: certbot.errors.MisconfigurationError: nginx restart failed:
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] still could not bind()
Failed to renew certificate files.kaiser.fyi with error: Some challenges have failed.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/fonts.kaiser.fyi.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Simulating renewal of an existing certificate for fonts.kaiser.fyi
Encountered exception during recovery: certbot.errors.MisconfigurationError: nginx restart failed:
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] still could not bind()
Failed to renew certificate fonts.kaiser.fyi with error: nginx restart failed:
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:8888 failed (98: Address already in use)
nginx: [emerg] still could not bind()
^C^CExiting due to user request.
root@UbuntuDroplet:~# systemctl status nginx
● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: failed (Result: core-dump) since Mon 2022-06-20 19:24:02 UTC; 58s ago
       Docs: man:nginx(8)
    Process: 1159555 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 1159557 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
   Main PID: 1159558 (code=dumped, signal=SEGV)
      Tasks: 0 (limit: 1066)
     Memory: 14.9M
     CGroup: /system.slice/nginx.service

Jun 20 18:41:02 UbuntuDroplet systemd[1]: Starting A high performance web server and a reverse proxy server...
Jun 20 18:41:03 UbuntuDroplet systemd[1]: Started A high performance web server and a reverse proxy server.
Jun 20 19:24:02 UbuntuDroplet systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
Jun 20 19:24:02 UbuntuDroplet systemd[1]: nginx.service: Killing process 1163927 (nginx) with signal SIGKILL.
Jun 20 19:24:02 UbuntuDroplet systemd[1]: nginx.service: Killing process 1163928 (nginx) with signal SIGKILL.
Jun 20 19:24:02 UbuntuDroplet systemd[1]: nginx.service: Killing process 1163927 (nginx) with signal SIGKILL.
Jun 20 19:24:02 UbuntuDroplet systemd[1]: nginx.service: Killing process 1163928 (nginx) with signal SIGKILL.
Jun 20 19:24:02 UbuntuDroplet systemd[1]: nginx.service: Failed with result 'core-dump'.
root@UbuntuDroplet:~# killall nginx
root@UbuntuDroplet:~# systemctl restart nginx

You can find a dump of all my configs here: (Using nginx -T.)
full_config.conf.txt (27.8 KB)

I'd be really thankful for any help!

Let me know if you need more information!

Hi @KaiserRuben, and welcome to the LE community forum :slight_smile:

Have you tried restarting the entire server?

Have you tried using --webroot for authentication (instead of --nginx for authentication)?

Hi @rg305,

Thanks for your reply. I have of course restarted my server; sadly, this does not change anything.
I am unsure how I would change the authentication when running certbot renew [--dry-run], but I am happy to try.

You can't change the authentication during a default scheduled renewal process.
You will have to repeat the initial cert request and indicate that it should use --webroot authentication.
See: User Guide - Certbot 1.27.0 documentation (eff-certbot.readthedocs.io)
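
As a rough illustration of what repeating the request with webroot looks like, here is a small sketch that only builds the command line. The function name and the webroot path in the usage note are placeholders, not values from this thread; the path must be the directory nginx actually serves for that domain. The flags used (certonly, --webroot, -w, -d, --dry-run) are standard certbot options.

```python
from typing import List


def webroot_command(domains: List[str], webroot: str, dry_run: bool = True) -> List[str]:
    """Build a `certbot certonly --webroot` argument list for the given domains.

    `webroot` is an assumption: it must be the directory from which nginx
    serves /.well-known/acme-challenge/ for these domains.
    """
    if not domains:
        raise ValueError("at least one domain is required")
    command = ["certbot", "certonly", "--webroot", "-w", webroot]
    for domain in domains:
        command += ["-d", domain]
    if dry_run:
        # Test against the staging environment without saving a certificate.
        command.append("--dry-run")
    return command
```

For example, `webroot_command(["files.kaiser.fyi"], "/var/www/html")` yields the arguments for a dry run, which could be passed to `subprocess.run(..., check=True)` as root. As far as I know, once a real (non-dry-run) request succeeds, certbot records the authenticator in the renewal configuration, so later `certbot renew` runs should use webroot for that certificate.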

I thought Certbot supported using the renew subcommand in combination with providing new options?

Anything else is not that.

@KaiserRuben Your problem sounds familiar. One earlier case was this thread. It is long so I will summarize here.

Do you by chance have perl enabled? If so, try disabling it, restart nginx and see if that allows the renew.

Why? The nginx plug-in can end up in conflict with your running nginx. After it makes the temporary changes to your nginx conf, it reloads nginx using SIGHUP. That's fine, but if the reload fails, it starts nginx itself, not through systemd. This creates an nginx that cannot be managed by systemd, and the two nginx instances fight each other for ports, leading to the symptom you saw.

Now, various things can cause the SIGHUP to fail. A common one is nginx not running before the renew; then of course the SIGHUP will fail. You said nginx was running, so that is likely not your cause.

I mention perl only because that explained the SEGV that the nginx SIGHUP was failing with in the thread I linked to. Otherwise we would have to dig through your system logs like we did in that thread. But if you have perl enabled, disabling it would be a quick test.
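
A rough way to check for this situation is to compare the pid file with the nginx master processes that are actually running. The sketch below is only an illustration: the function names and state labels are made up, the pid path /run/nginx.pid is taken from the error message earlier in the thread, and the helper assumes `pgrep` is installed and that the master shows up as "nginx: master process" in the process list.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def classify_nginx_state(pid_file_text: Optional[str], master_pids: List[int]) -> str:
    """Classify nginx from the pid file contents (None if missing) and running master PIDs."""
    if not master_pids:
        return "stopped"
    if pid_file_text is None:
        # nginx is running but no pid file points at it, so a reload cannot find it.
        return "running-without-pid-file"
    try:
        recorded = int(pid_file_text.strip())
    except ValueError:
        return "pid-file-unreadable"
    if recorded not in master_pids:
        return "pid-file-stale"
    if len(master_pids) > 1:
        # More than one master, e.g. one from systemd and one started directly.
        return "multiple-masters"
    return "ok"


def read_state(pid_path: str = "/run/nginx.pid") -> str:
    """Gather the real inputs on a live system (assumes pgrep is available)."""
    path = Path(pid_path)
    text = path.read_text() if path.exists() else None
    result = subprocess.run(
        ["pgrep", "-f", "nginx: master process"],
        capture_output=True,
        text=True,
    )
    pids = [int(pid) for pid in result.stdout.split()]
    return classify_nginx_state(text, pids)
```

Running `read_state()` as root right after a failed renewal should give a hint: "running-without-pid-file" or "multiple-masters" would be consistent with an nginx that was started outside systemd.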

A work-around is to use webroot as that avoids the nginx plug-in altogether. Webroot uses your running nginx as it is.

@MikeMcQ
Thanks a lot!! This seems to have worked (although I fail to see why exactly...).
I will continue testing over the next months and in case I face this problem again, reopen this.
