Cert renewals not working anymore

My domain is: In this case nadybot.org

I ran this command: certbot renew --cert-name nadybot.org --no-random-sleep-on-renew --dry-run -v

It produced this output:

[root@anarchy-online conf.d]# /usr/bin/certbot renew --cert-name nadybot.org --no-random-sleep-on-renew --dry-run -v --nginx
Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/nadybot.org.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Certificate is due for renewal, auto-renewing...
Plugins selected: Authenticator nginx, Installer nginx
Simulating renewal of an existing certificate for nadybot.org and www.nadybot.org
Performing the following challenges:
http-01 challenge for nadybot.org
http-01 challenge for www.nadybot.org
Waiting for verification...
Challenge failed for domain nadybot.org
Challenge failed for domain www.nadybot.org
http-01 challenge for nadybot.org
http-01 challenge for www.nadybot.org

Certbot failed to authenticate some domains (authenticator: nginx). The Certificate Authority reported these problems:
  Domain: nadybot.org
  Type:   unauthorized
  Detail: 138.201.187.13: Invalid response from https://nadybot.org/.well-known/acme-challenge/5e1xO09-lSH22HTwe1vom9lEe5qnmVJ1DT2HM_pUQbg: 404

  Domain: www.nadybot.org
  Type:   unauthorized
  Detail: 138.201.187.13: Invalid response from https://www.nadybot.org/.well-known/acme-challenge/KZ08iSssro3lp42SOQ6qUnK_womxsEPQgdULvlLGphE: 404

Hint: The Certificate Authority failed to verify the temporary nginx configuration changes made by Certbot. Ensure the listed domains point to this nginx server and that it is accessible from the internet.

Cleaning up challenges
Failed to renew certificate nadybot.org with error: Some challenges have failed.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
All simulated renewals failed. The following certificates could not be renewed:
  /etc/letsencrypt/live/nadybot.org/fullchain.pem (failure)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 renew failure(s), 0 parse failure(s)

My web server is (include version): Nginx 1.28.0

The operating system my web server runs on is (include version): Fedora 42

My hosting provider, if applicable, is: -

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): certbot 3.3.0

Details:

None of my certs can be renewed anymore, and I have no idea when exactly it started. Let me show you the config of the domain I posted:

server {
    server_name  nadybot.org www.nadybot.org;
    location / {
    	root /usr/share/nginx/html/;
    }


    listen 443 ssl; # managed by Certbot
    http2 on;
    ssl_certificate /etc/letsencrypt/live/nadybot.org/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/nadybot.org/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

server {
    if ($host = www.nadybot.org) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    if ($host = nadybot.org) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    server_name  nadybot.org www.nadybot.org *.nadybot.org;
    listen 80;
}

The problem seems to be that Certbot's rewrite of the nginx config puts things in the wrong order:

2026-04-19 13:06:18,888:DEBUG:certbot_nginx._internal.parser:Writing nginx conf tree to /etc/nginx/conf.d/nadybot.org.conf:
server {rewrite ^(/.well-known/acme-challenge/.*) $1 break; # managed by Certbot

rewrite ^(/.well-known/acme-challenge/.*) $1 break; # managed by Certbot


    server_name  nadybot.org www.nadybot.org;
    location / {
        root /usr/share/nginx/html/;
    }


    listen 443 ssl; # managed by Certbot
    http2 on;
    ssl_certificate /etc/letsencrypt/live/nadybot.org/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/nadybot.org/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
location = /.well-known/acme-challenge/long-string{default_type text/plain;return 200 secret-string;} # managed by Certbot

location = /.well-known/acme-challenge/another-long-string{default_type text/plain;return 200 secret-string;} # managed by Certbot

}

server {rewrite ^(/.well-known/acme-challenge/.*) $1 break; # managed by Certbot

rewrite ^(/.well-known/acme-challenge/.*) $1 break; # managed by Certbot


    if ($host = www.nadybot.org) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    if ($host = nadybot.org) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    server_name  nadybot.org www.nadybot.org *.nadybot.org;
    listen 80;
location = /.well-known/acme-challenge/long-string{default_type text/plain;return 200 secret-string;} # managed by Certbot

location = /.well-known/acme-challenge/another-long-string{default_type text/plain;return 200 secret-string;} # managed by Certbot

}

I don't know why it now fails where it didn't fail before, but is there a way to fix this? I get the feeling that there must be some easy solution to this…

Welcome @Nadyita

It looks like something changed since your last good cert, which was issued Dec 25 last year and expired Mar 25. Certbot would have started renewal attempts around Feb 25, so it has been failing for some time.

Is this a large nginx install? Has it gotten larger, or have nginx reloads perhaps gotten slower, in recent months?

Because my first guess is a timing issue. What does the dry run do with a longer wait after the nginx reload, for example --nginx-sleep-seconds 5?

The installation has 34 (sub)domains, so not really that many. And since December, I haven't really touched it, maybe added 1 or 2 (sub)domains. The 5s sleep still leads to a 404.

Are you sure that the order of location / before location = /.well-known/acme-challenge… should really work? According to the nginx error log, it's looking for /usr/share/nginx/html/.well-known/acme-challenge…

Yes, Certbot has always placed the "location =" block at the bottom of the server block. An "=" (exact match) location takes priority over a prefix location such as "location /", regardless of the order in the file.

Then that isn't the active server block handling the incoming request. Or, there is some other issue when Certbot reloads your nginx.
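
One rough way to look for a competing server block is to scan the full config dump for listen directives bound to a specific IPv4 address. This is only a sketch: the helper name list_ip_listens is made up for this post, it ignores IPv6 addresses, and it assumes nginx -T can be run as root on the server.

```shell
#!/usr/bin/env bash
# Sketch: list "listen" directives that bind to an explicit IPv4 address.
# Reads an nginx configuration dump on stdin and prints matching lines
# prefixed with their line number in that dump.
# list_ip_listens is a hypothetical helper name, not part of nginx or Certbot.
list_ip_listens() {
    # "listen 80;" does not match; "listen 192.0.2.1:80;" does.
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -nE '^[[:space:]]*listen[[:space:]]+[0-9]{1,3}(\.[0-9]{1,3}){3}' || true
}

# Intended use on the server (needs root and a working nginx binary):
#   nginx -T 2>/dev/null | list_ip_listens
```

If this prints anything for port 80 or 443, that server block is worth a closer look, because a listen on a specific address can win over the plain "listen 80;" used in the block Certbot edits.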

Things to try:

Add a unique access_log and error_log in that server block. Then see what, if anything, shows up in those for the --dry-run request. If there is no access log entry, then some other server block handled it. Perhaps you use an IP address on a listen statement somewhere, which takes precedence? Or do you have a second nginx system running that the request is routed to?

Check that the nginx reload is working properly. After Certbot makes the change to your nginx config, it issues a reload (asynchronously). It waits for the nginx-sleep-seconds period and then requests the cert from Let's Encrypt. If the reload fails, your server block file will have the new lines but the active worker config will not. If the reload takes longer than the sleep period, nginx just won't be ready with the new config before LE makes its HTTP request to you (which is why I suggested a longer sleep).

Try these to check if reload is working:

sudo ps -eo pid,ppid,start,args | grep nginx | grep -v grep
sudo nginx -s reload
sleep 1
sudo ps -eo pid,ppid,start,args | grep nginx | grep -v grep

Make sure you get all new worker processes after the reload. The master process pid will stay the same. Problems with reload can be caused by a lack of handles, for example. Your system doesn't seem large enough for that, but it is possible if handles are very limited on your system.

After this command sequence, also check the nginx error_log for reload-related issues.
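
If comparing the two ps listings by eye is tedious, the same check can be sketched as a small script. The function names are made up for this post, and the pid file path /run/nginx.pid is an assumption that may differ on your system. Note that an old worker may legitimately linger for a while if it is still finishing open connections.

```shell
#!/usr/bin/env bash
# Sketch: report nginx worker PIDs that are still present after a reload.
# surviving_pids and check_reload are hypothetical helper names.

# Print every PID that appears in both whitespace-separated lists.
surviving_pids() {
    local before="$1" after="$2" pid
    for pid in $before; do
        # $(echo $after) collapses newlines so the pattern match sees spaces.
        case " $(echo $after) " in
            *" $pid "*) echo "$pid" ;;
        esac
    done
}

# Reload nginx and list child processes of the master that were not replaced.
# $1: pid file (assumed default /run/nginx.pid), $2: seconds to wait.
check_reload() {
    local master before after
    master=$(cat "${1:-/run/nginx.pid}") || return 1
    before=$(pgrep -P "$master")   # children of the master before the reload
    nginx -s reload || return 1
    sleep "${2:-1}"
    after=$(pgrep -P "$master")    # children of the master after the reload
    surviving_pids "$before" "$after"
}

# Intended use on the server, as root:
#   check_reload
```

No output means every child of the master process was replaced. Any PID printed is a process that survived the reload and is worth checking against the error_log.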

Okay, I added logging and noticed that, no matter where I put it, nothing got logged. Then I restarted (not reloaded) nginx and got some weird error messages that did not make any sense in that context. And finally, I rebooted the server, because nginx failed to start due to these errors.

After the reboot, certificate renewal worked as expected. So I guess it was some kind of "hiccup". Sorry for the hassle.

No worries. Those kinds of nginx problems can be hard to sort out.

If you want to post the error messages, maybe I'll recognize them from prior similar issues. Otherwise, something like Server Fault or another nginx specialty group might help sort that out.

Finding out the underlying cause of those odd messages will likely prevent it from happening again.

Bummer, I didn't log them, and they are not in the error log. As I said, they did not make any sense in that context. Something about wrong GLIBC version :man_shrugging: