Certbot fails to renew certificate using nginx plugin

scr4bble · October 2, 2020, 10:29am

Hi guys, my certbot behaves very strangely. It fails to renew the certificate in about 95% of cases. Sometimes it succeeds, but most of the time it fails, with no configuration change in between: I ran the command twice in a row, one run failed and the other succeeded, and I have logs of both runs.
Any idea what could be causing this? It had been working for months.
Any help is highly appreciated.

My domain is: api.bustravel.is

I ran this command: certbot renew --cert-name api.bustravel.paxflow.io --dry-run

It produced this output:

Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/api.bustravel.paxflow.io.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert is due for renewal, auto-renewing...
Plugins selected: Authenticator nginx, Installer nginx
Starting new HTTPS connection (1): acme-staging-v02.api.letsencrypt.org
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for api.bustravel.is
Waiting for verification...
Challenge failed for domain api.bustravel.is
http-01 challenge for api.bustravel.is
Cleaning up challenges
Attempting to renew cert (api.bustravel.paxflow.io) from /etc/letsencrypt/renewal/api.bustravel.paxflow.io.conf produced an unexpected error: Some challenges have failed.. Skipping.
All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/api.bustravel.paxflow.io/fullchain.pem (failure)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
** DRY RUN: simulating 'certbot renew' close to cert expiry
**          (The test certificates below have not been saved.)

All renewal attempts failed. The following certs could not be renewed:
  /etc/letsencrypt/live/api.bustravel.paxflow.io/fullchain.pem (failure)
** DRY RUN: simulating 'certbot renew' close to cert expiry
**          (The test certificates above have not been saved.)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 renew failure(s), 0 parse failure(s)

IMPORTANT NOTES:
 - The following errors were reported by the server:

   Domain: api.bustravel.is
   Type:   unauthorized
   Detail: Invalid response from
   http://api.bustravel.is/.well-known/acme-challenge/DIPIp7zfacU_xL6wwzkd17QS_bb1VCEtyj4Rn4upc-U
   [2a01:4f8:221:205a::2]: "<html>\r\n<head><title>404 Not
   Found</title></head>\r\n<body>\r\n<center><h1>404 Not
   Found</h1></center>\r\n<hr><center>nginx</center>\r\n"

   To fix these errors, please make sure that your domain name was
   entered correctly and the DNS A/AAAA record(s) for that domain
   contain(s) the right IP address.
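To rule out the DNS side that the error message points at (the failing request went to the IPv6 address), here is a minimal Python sketch that lists the IPv4 (A) and IPv6 (AAAA) addresses a name resolves to. `split_addresses` is a hypothetical helper. It only uses the resolver of the machine it runs on, which may not match what Let's Encrypt's validation servers see.

```python
import socket
from typing import Set, Tuple


def split_addresses(host: str, port: int = 80) -> Tuple[Set[str], Set[str]]:
    """Return (IPv4 addresses, IPv6 addresses) that `host` resolves to."""
    v4: Set[str] = set()
    v6: Set[str] = set()
    # One entry per (family, address); TCP only, to avoid duplicate UDP/raw rows.
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, proto=socket.IPPROTO_TCP
    ):
        if family == socket.AF_INET:
            v4.add(sockaddr[0])
        elif family == socket.AF_INET6:
            v6.add(sockaddr[0])  # IPv6 sockaddr is (addr, port, flowinfo, scope_id)
    return v4, v6
```

Calling `split_addresses("api.bustravel.is")` and checking that both sets contain only addresses of this server would confirm the A/AAAA records are not the cause.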

My web server is (include version): nginx version: nginx/1.16.1

The operating system my web server runs on is (include version):
CentOS Linux release 7.8.2003 (Core)

My hosting provider, if applicable, is:
Hetzner

I can login to a root shell on my machine (yes or no, or I don't know):
yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
no, just bare console

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):
certbot 1.7.0

$ cat /etc/letsencrypt/renewal/api.bustravel.paxflow.io.conf

# renew_before_expiry = 30 days
version = 1.0.0
archive_dir = /etc/letsencrypt/archive/api.bustravel.paxflow.io
cert = /etc/letsencrypt/live/api.bustravel.paxflow.io/cert.pem
privkey = /etc/letsencrypt/live/api.bustravel.paxflow.io/privkey.pem
chain = /etc/letsencrypt/live/api.bustravel.paxflow.io/chain.pem
fullchain = /etc/letsencrypt/live/api.bustravel.paxflow.io/fullchain.pem

# Options used in the renewal process
[renewalparams]
authenticator = nginx
installer = nginx
account = f6fb3dcb6db3eb975a0128963a92c3a4
server = https://acme-v02.api.letsencrypt.org/directory
renew_hook = nginx -t 2>&1 && systemctl reload nginx

Osiris · October 2, 2020, 10:59am

It could be due to some nginx configuration that certbot doesn't understand properly. I don't see a difference between IPv4 and IPv6, so that's probably not it.

Could you paste the output of nginx -T? You can edit out /etc/nginx/mime.types as that probably won't be relevant.
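If only some files should be shared, the `nginx -T` output can be trimmed per file. A minimal sketch, assuming each file in the dump is preceded by a header line of the form `# configuration file <path>:` (the same header visible in the configuration pasted below); `split_nginx_dump` and `redact` are hypothetical helpers.

```python
import re
from typing import Dict, Iterable, List, Optional

# Assumed header format emitted by `nginx -T` before each included file.
HEADER = re.compile(r"^# configuration file (?P<path>.+):\s*$")


def split_nginx_dump(dump: str) -> Dict[str, str]:
    """Map each configuration file path to its text."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in dump.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group("path")
            sections[current] = []
        elif current is not None:
            sections[current].append(line)  # lines before the first header are ignored
    return {path: "\n".join(lines) for path, lines in sections.items()}


def redact(dump: str, drop: Iterable[str]) -> str:
    """Rebuild the dump without the files listed in `drop`."""
    skipped = set(drop)
    parts = [
        f"# configuration file {path}:\n{text}"
        for path, text in split_nginx_dump(dump).items()
        if path not in skipped
    ]
    return "\n".join(parts)
```

For example, `redact(output, ["/etc/nginx/mime.types"])` drops the MIME types file while keeping everything else in order.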

scr4bble · October 2, 2020, 12:05pm

@Osiris thanks for your reply. Unfortunately, I can't paste the whole nginx configuration here, as it contains production virtual hosts and I don't think it's safe to share publicly. I can paste the relevant parts, though.
Just let me know if I've missed anything.

redirecting HTTP to HTTPS

# redirecting HTTP to HTTPS
server {
       listen 80 default_server;
       listen [::]:80 default_server;

       #server_name paxflow.is *.paxflow.is;

   #include snippets/cbs-location-restrictions.conf;

   #location / {
   #    return 301 https://$host$request_uri;
   #}
}

configuration file /etc/nginx/sites-enabled/api.bustravel.paxflow.io:

server {
        listen 443 ssl;
        listen [::]:443 ssl;

        server_name api.bustravel.is;

        include snippets/paxflow.io/bustravel/ssl-api.conf;
        include snippets/ssl-params.conf;

        access_log /var/log/nginx/bustravel/api.bustravel.paxflow.io/access.log;
        error_log /var/log/nginx/bustravel/api.bustravel.paxflow.io/error.log;

        root /srv/www/bustravel/api.bustravel.paxflow.io/public;

        index index.php;

        location / {
                if (-f $request_filename) {
                        break;
                }
                rewrite ^/([^/]+)/([^/]+)/$ /index.php?module=$1&action=$2&$args? last;
                rewrite ^/([^/]+)/$ /index.php?module=$1&$args? last;
        }

        location ~ \.php$ {
                include /etc/nginx/fastcgi_params;
                fastcgi_pass  unix:/var/run/php-fpm/php-fpm.bustravel.sock;
                fastcgi_index index.php;
                fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }
}

server {
        listen 443 ssl;
        listen [::]:443 ssl;

        server_name api.bustravel.paxflow.io;

        include snippets/paxflow.io/bustravel/ssl.conf;
        include snippets/ssl-params.conf;

        return 301 https://api.bustravel.is$request_uri;
}

Looking into the letsencrypt log, this is what certbot added to nginx.conf during the challenge:

server {rewrite ^(/.well-known/acme-challenge/.*) $1 break; # managed by Certbot


       listen 80 ;
       listen [::]:80 ;

       #server_name paxflow.is *.paxflow.is;

   #include snippets/cbs-location-restrictions.conf;

   #location / {
   #    return 301 https://$host$request_uri;
   #}

server_name api.bustravel.is; # managed by Certbot
location = /.well-known/acme-challenge/k4czdimvkDg8cHZVL2zFF8n_KIheuhpEgHFGaDDrB6E{default_type text/plain;return 200 k4czdimvkDg8cHZVL2zFF8n_KIheuhpEgHFGaDDrB6E.JTUTFJLaYXwZTd2OS-y1CvfDsJWzwq-yWUrunS2zUSg;} # managed by Certbot

}

The commented-out parts are normally active; I commented them out while trying to make it work. Nothing helped, though.
Also, the few successful challenges were not redirected to HTTPS, even though the "return 301..." line was present in the default server block for port 80.
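For reference, the block certbot inserted answers the challenge with an exact-match `location =`, which nginx checks before any other location in that server block. A minimal sketch that renders the same kind of block in readable form, assuming the key authorization has the RFC 8555 shape `<token>.<account key thumbprint>`; `challenge_location` is a hypothetical helper, not certbot's code.

```python
def challenge_location(token: str, key_authorization: str) -> str:
    """Render an nginx location that answers one http-01 challenge.

    Hypothetical helper mirroring the shape of the certbot-generated block;
    it is not certbot's implementation.
    """
    # RFC 8555: key authorization = token + "." + base64url(account key thumbprint)
    if not key_authorization.startswith(token + "."):
        raise ValueError("key authorization must start with '<token>.'")
    return (
        f"location = /.well-known/acme-challenge/{token} {{\n"
        f"    default_type text/plain;\n"
        f"    return 200 {key_authorization};\n"
        f"}}"
    )
```

Because the response comes from `return 200` rather than a file on disk, the challenge is answered only if the request is routed to the server block that contains this location.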

Osiris · October 2, 2020, 12:28pm

And those files are the only references to api.bustravel.is? Because I don't see any reason why the server block added by certbot wouldn't be triggered.
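That reasoning can be checked against a simplified model of how nginx chooses a server block for a request on one `listen` socket, following the order in the nginx `server_name` documentation: exact name, longest wildcard starting with `*`, longest wildcard ending with `*`, then regular expressions, and finally the `default_server` (or the first block if none is marked). `pick_server` and its dictionary layout are hypothetical; regex names and per-address `listen` differences are not modelled.

```python
from typing import List, Optional


def pick_server(host: str, servers: List[dict]) -> Optional[dict]:
    """Simplified nginx server selection for one address:port.

    Each server is a dict with a lowercase "names" set and an optional
    "default" flag (standing in for `listen ... default_server`).
    """
    host = host.lower().rstrip(".")
    # 1. Exact server_name match.
    for srv in servers:
        if host in srv["names"]:
            return srv
    # 2. Longest wildcard starting with an asterisk, e.g. "*.paxflow.is".
    leading = [(len(n), s) for s in servers for n in s["names"]
               if n.startswith("*.") and host.endswith(n[1:])]
    if leading:
        return max(leading, key=lambda pair: pair[0])[1]
    # 3. Longest wildcard ending with an asterisk, e.g. "api.*".
    trailing = [(len(n), s) for s in servers for n in s["names"]
                if n.endswith(".*") and host.startswith(n[:-1])]
    if trailing:
        return max(trailing, key=lambda pair: pair[0])[1]
    # 4. (Regex names skipped.) Fall back to default_server, else the first block.
    for srv in servers:
        if srv.get("default"):
            return srv
    return servers[0] if servers else None
```

With the certbot-generated block carrying `server_name api.bustravel.is;`, the exact match wins over the catch-all `default_server` block, so once the new configuration is actually loaded the challenge block should always be selected.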

scr4bble · October 2, 2020, 12:41pm

Yes, those are the only mentions of that domain in the whole nginx configuration.
I don't see a reason either. It feels like nginx sometimes serves the challenge file and sometimes redirects to HTTPS and ends up with a 404. Could it somehow be affected by HSTS? I just managed to run it twice with exactly the same configuration and stored the logs from both runs. I'm just not sure whether it's safe to share them here publicly (the whole letsencrypt.log files).

Osiris · October 2, 2020, 12:54pm

As far as I know, the Let's Encrypt validation server ignores HSTS headers.

The log file only contains public keys (for the ACME connection); no private keys are stored in it. You can, however, remove parts if you think it's necessary.

scr4bble · October 2, 2020, 1:13pm

The logs are below. I removed the parts listing all the domains (the full nginx config files, which are identical in both runs).

Successful run https://pastebin.com/Z1PQrnAV
Failed run https://pastebin.com/b3WEuunD

Osiris · October 2, 2020, 1:21pm

Very strange. It should not fail randomly. The nginx configuration used looks the same to me in the successful and the failed run. Are the IPs by any chance load balancers sitting in front of your actual server?

scr4bble · October 2, 2020, 1:25pm

No, we don't have any load balancer here. Just a server with static IP.

Yes, that's what frustrated me after 3 hours of debugging yesterday. I couldn't find any reason for it to stop working. I thought an nginx package update or something similar might have broken it indirectly, but then I saw it succeed and fail with no change in the config, so I decided to ask the Let's Encrypt community, as you might have more experience with problems like this.

Osiris · October 2, 2020, 1:26pm

Is there perhaps any difference in the nginx logs between a successful and a failed run? You could also temporarily increase the verbosity of nginx logging.
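To compare a successful and a failed run, one option is to pull the challenge requests out of the access log and count status codes per client address. A minimal sketch, assuming the access log uses nginx's default `combined` format; `challenge_statuses` is a hypothetical helper. Note that port-80 requests are probably logged by whichever `access_log` applies to the port-80 server blocks, not by the per-vhost log of the HTTPS block.

```python
import re
from collections import Counter
from typing import Iterable

# Default nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent ...
LINE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def challenge_statuses(lines: Iterable[str]) -> Counter:
    """Count (client address, HTTP status) pairs for ACME challenge requests."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("path").startswith("/.well-known/acme-challenge/"):
            counts[(m.group("addr"), m.group("status"))] += 1
    return counts
```

A failed run would be expected to show 404 (or 301) entries from the validation servers where a successful run shows 200.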

rg305 · October 2, 2020, 6:03pm

scr4bble:

Just let me know which they are if I miss anything.

redirecting HTTP to HTTPS

# redirecting HTTP to HTTPS
server {
       listen 80 default_server;
       listen [::]:80 default_server;

       #server_name paxflow.is *.paxflow.is;

   #include snippets/cbs-location-restrictions.conf;

   #location / {
   #    return 301 https://$host$request_uri;
   #}
}

You are missing an action and a document root location in that block!
All lines are #'ed out.

If you don't need it - delete it.

rg305 · October 2, 2020, 6:13pm

Have you tried authenticating via

--webroot -w /srv/www/bustravel/api.bustravel.paxflow.io/public
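The webroot approach replaces nginx edits with a file on disk. A minimal sketch of the idea, assuming the http-01 layout from RFC 8555: the response lives at `<webroot>/.well-known/acme-challenge/<token>` and its body is the key authorization. `write_webroot_challenge` is a hypothetical helper, not certbot's webroot plugin. In this setup the validation request would presumably reach that directory through the HTTP-to-HTTPS redirect, since the HTTPS server block for api.bustravel.is uses it as its `root`.

```python
from pathlib import Path


def write_webroot_challenge(webroot: str, token: str, key_authorization: str) -> Path:
    """Place an http-01 response where a static-file server can serve it.

    Hypothetical helper illustrating the webroot idea; not certbot code.
    """
    challenge_dir = Path(webroot) / ".well-known" / "acme-challenge"
    challenge_dir.mkdir(parents=True, exist_ok=True)
    path = challenge_dir / token
    path.write_text(key_authorization)  # body must be exactly the key authorization
    return path
```

Since nginx only needs to serve an existing file, no reload is involved, which sidesteps any timing issue with configuration changes.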

scr4bble · October 2, 2020, 9:52pm

@rg305
the lines that are commented out are normally in use. I commented them out while trying to make the certbot command work, because in the failed certbot runs the challenge request (/.well-known/acme-challenge/....) was redirected to HTTPS, which it shouldn't have been. If you check the letsencrypt.log files I posted above (pastebin links), the block actually looks like this:

 server {
       listen 80 default_server;
       listen [::]:80 default_server;

       server_name xxxx;

   location / {
       return 301 https://$host$request_uri;
   }
}

As for the webroot plugin, I guess that one would work, but I haven't tried it yet. I'm keeping it as a fallback solution in case we can't fix the behavior of the nginx plugin, but thanks for the suggestion.

rg305 · October 2, 2020, 9:56pm

Webroot avoids all the modifications to nginx altogether and does the same thing (in the end).

scr4bble · October 2, 2020, 10:00pm

@Osiris I will try to adjust the nginx logging and check whether I see anything wrong there. Any suggestions on what to look for or how to adjust the logging?

@rg305 yes, I know; it's a much simpler method and easy to fall back to, but I don't like switching away from something that isn't working until I understand why.

Osiris · October 2, 2020, 10:01pm

I'm afraid not. I think this is a very strange issue you have.

scr4bble · October 3, 2020, 12:05am

@Osiris after certbot edits the nginx configuration, do you know how it reloads nginx afterwards? Could it be that it reloads the service asynchronously, so that if reloading takes longer, nginx doesn't apply the changes fast enough? Or do you know anybody who could answer this? Would it be considered a bug?

Inspired by comment from @_az in this thread: Certbot renew with nginx module - returns error 404 for challenge response

_az · October 3, 2020, 12:18am

We added a flag for situations like that: --nginx-sleep-seconds (defaults to 1).

You can try bumping it to 30 or so and see if it helps.
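A small sketch that assembles the suggested retry as an argument list, optionally adding the `-v` verbosity flags mentioned later in the thread. `renew_command` is a hypothetical wrapper; the flags themselves (`--cert-name`, `--nginx-sleep-seconds`, `--dry-run`, `-v`) are the ones discussed here.

```python
from typing import List


def renew_command(cert_name: str, sleep_seconds: int = 30,
                  dry_run: bool = True, verbosity: int = 0) -> List[str]:
    """Build a `certbot renew` invocation with a longer wait after the nginx reload."""
    cmd = ["certbot", "renew", "--cert-name", cert_name,
           "--nginx-sleep-seconds", str(sleep_seconds)]
    if dry_run:
        cmd.append("--dry-run")  # test against the staging server first
    if verbosity:
        cmd.append("-" + "v" * verbosity)  # e.g. -vvv for the most detailed log
    return cmd
```

The list can be passed to `subprocess.run(..., check=True)` as root; using a list rather than a shell string avoids quoting problems.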

rg305 · October 3, 2020, 12:20am

Check the /var/log/letsencrypt/letsencrypt.log file.
If there is insufficient detail to answer your question, try it again with -v or -vv or -vvv

[each v would increase the amount of detail entered into the log file]

Osiris · October 3, 2020, 9:20am

It seems to run nginx -s reload:

From github.com, certbot/certbot/blob/ef8c481634b642489c20b29d2a8c30526d5b5adf/certbot-nginx/certbot_nginx/_internal/configurator.py#L1180-L1182:

          proc = subprocess.Popen([nginx_ctl, "-c", nginx_conf, "-s", "reload"],
                                  env=util.env_no_snap_for_external_calls(),
                                  stdout=out, stderr=out)

For lack of a better explanation, I think there's a good chance you're running into this asynchronous reloading issue.

I'd try the flag @_az has implemented if I were you and see if that helps!
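As an alternative to a fixed sleep, one could poll until nginx actually serves the expected key authorization before asking the CA to validate. A minimal sketch; `fetch_body` and `wait_until_served` are hypothetical helpers, and nothing in this thread says certbot works this way. A check made from the server itself may also take a different network path than the validation servers do.

```python
import time
import urllib.request
from typing import Callable, Optional


def fetch_body(url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the response body, or None on HTTP or network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", "replace")
    except OSError:  # URLError and HTTPError (e.g. a 404) are OSError subclasses
        return None


def wait_until_served(url: str, expected: str, deadline: float = 30.0,
                      interval: float = 0.5,
                      fetch: Callable[[str], Optional[str]] = fetch_body) -> bool:
    """Poll `url` until it returns `expected` or `deadline` seconds pass."""
    end = time.monotonic() + deadline
    while True:
        if fetch(url) == expected:
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval)
```

The `fetch` parameter makes the polling logic testable without a live server.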

A nice feature would be for certbot to continue with the challenge only after all previous worker processes have stopped. But after a quick Google search, I can't find whether such a simple check exists.
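Such a check could look roughly like this Linux-only sketch: snapshot the worker PIDs before `nginx -s reload`, then wait until none of them are left. It assumes nginx titles its workers `nginx: worker process` (old ones show `... is shutting down` during a graceful reload), as `ps` usually displays; `nginx_worker_pids` and `wait_for_old_workers` are hypothetical helpers, not part of certbot.

```python
import os
import time
from typing import Callable, Set


def nginx_worker_pids(proc: str = "/proc") -> Set[int]:
    """Linux-only: PIDs whose command line starts with 'nginx: worker process'."""
    pids: Set[int] = set()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited meanwhile or is not readable
        if cmdline.startswith("nginx: worker process"):  # also matches "is shutting down"
            pids.add(int(entry))
    return pids


def wait_for_old_workers(old: Set[int], timeout: float = 30.0, interval: float = 0.2,
                         list_workers: Callable[[], Set[int]] = nginx_worker_pids) -> bool:
    """Return True once none of the pre-reload worker PIDs are still running.

    PID reuse by a new worker within the window is ignored in this sketch.
    """
    end = time.monotonic() + timeout
    while old & list_workers():
        if time.monotonic() >= end:
            return False
        time.sleep(interval)
    return True
```

Usage would be `old = nginx_worker_pids()`, then the reload, then `wait_for_old_workers(old)` before starting validation.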
