Problem with ssl_stapling with certbot

Hello,

I've come across a strange behavior regarding the ssl_stapling configuration with the nginx authenticator.
It almost drove me crazy for about a day.

Here is what happened:
I installed a fresh new configuration (debian/nginx/mysql/php/letsencrypt), started to issue some certificates successfully, migrated websites, everything went smoothly.
But then I decided to harden my nginx configuration, restricting SSL protocols, SSL sessions, etc.
At that point, nginx was hardened, A+ grade for ssl verifications, everything was running great.

Then adding a new website became a real pain in the ***
I configured everything, migrated, and then came the moment to issue a certificate. The HTTP challenge failed with a 404. Strange. I tried listing the certificates and doing a dry-run renewal... boom: 404 error, none of the certificates would be renewed.
So I went back to basics, removing and reinserting parts of the nginx configuration, until I realized that the directive causing all this was... ssl_stapling.

What am I missing?
Is this normal behavior with ssl_stapling?

Many thanks !

Nginx.conf

user                 www-data;
pid                  /run/nginx.pid;
worker_processes     auto;
worker_rlimit_nofile 65535;

events {
    multi_accept       on;
    worker_connections 65535;
}

http {
    upstream php {
        server unix:/var/run/php/php7.4-fpm.sock;
        include upstreams/*.conf;
        keepalive 10; 
    }

    charset              utf-8;
    sendfile             on;
    tcp_nopush           on;
    tcp_nodelay          on;
    server_tokens        off;
    types_hash_max_size  2048;

    # MIME
    include              mime.types;
    # default_type         application/octet-stream;

    # Logging
    access_log           /var/log/nginx/access.log;
    error_log            /var/log/nginx/error.log;

    # SSL
    ssl_session_timeout  1d;
    ssl_session_cache    shared:MozSSL:10m;
    ssl_session_tickets  off;

    # Diffie-Hellman parameter for DHE ciphersuites
    ssl_dhparam          /etc/nginx/dhparam.pem;

    # Mozilla Intermediate configuration
    ssl_protocols        TLSv1.2 TLSv1.3;
    ssl_ciphers          ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;


    # OCSP Stapling
    ssl_stapling         off;
    ssl_stapling_verify  on;
    resolver             1.1.1.1 1.0.0.1 8.8.8.8 8.8.4.4 208.67.222.222 208.67.220.220 valid=60s;
    resolver_timeout     2s;

    # Buffer policy
    client_body_buffer_size 1k;
    client_header_buffer_size 1k;
    client_max_body_size 2k;
    large_client_header_buffers 2 1k;

    server {
        listen 80 default_server;
        listen [::]:80 default_server;
        server_name _;

        root        /usr/share/nginx/html;

        location / {
            return 301 https://$host$request_uri;
        }
    }	

    # Load configs
    include              /etc/nginx/conf.d/*.conf;
    include              /etc/nginx/sites-enabled/*;
}

Command run

certbot --dry-run --nginx --cert-name=my-domain.com renew

Response when ssl_stapling: on


Processing /etc/letsencrypt/renewal/my-domain.com.conf


Cert not due for renewal, but simulating renewal for dry run
Plugins selected: Authenticator nginx, Installer nginx
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for my-domain.com
Waiting for verification...
Challenge failed for domain my-domain.com
http-01 challenge for my-domain.com
Cleaning up challenges
Attempting to renew cert (my-domain.com) from /etc/letsencrypt/renewal/my-domain.com.conf produced an unexpected error: Some challenges have failed.. Skipping.
All renewal attempts failed. The following certs could not be renewed:
/etc/letsencrypt/live/my-domain.com/fullchain.pem (failure)


** DRY RUN: simulating 'certbot renew' close to cert expiry
** (The test certificates below have not been saved.)

All renewal attempts failed. The following certs could not be renewed:
/etc/letsencrypt/live/my-domain.com/fullchain.pem (failure)
** DRY RUN: simulating 'certbot renew' close to cert expiry
** (The test certificates above have not been saved.)


1 renew failure(s), 0 parse failure(s)

IMPORTANT NOTES:

Response when ssl_stapling: off

Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/my-domain.com.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert not due for renewal, but simulating renewal for dry run
Plugins selected: Authenticator nginx, Installer nginx
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for my-domain.com
Waiting for verification...
Cleaning up challenges

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed with reload of nginx server; fullchain is
/etc/letsencrypt/live/my-domain.com/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
** DRY RUN: simulating 'certbot renew' close to cert expiry
**          (The test certificates below have not been saved.)

Congratulations, all renewals succeeded. The following certs have been renewed:
  /etc/letsencrypt/live/my-domain.com/fullchain.pem (success)
** DRY RUN: simulating 'certbot renew' close to cert expiry
**          (The test certificates above have not been saved.)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Welcome to the Let's Encrypt Community, Raphaël!

Try appending --staple-ocsp and --must-staple to your certbot command.

If you’re using OCSP stapling with Nginx >= 1.3.7, chain.pem should be provided as the ssl_trusted_certificate to validate OCSP responses.


Stapling shouldn’t make any difference to the authenticator - it’s purely a certificate installation concern. In particular, you do not want to enable Must Staple blindly.

Some follow up questions:

  • Certbot version?
  • How many nginx virtual hosts on this server?

Hi Griffin, Hi _az,

Thanks for your quick reply and help.

I have ~20 domains and subdomains on this server at the moment.
I have certbot 1.9.0 via snap (I followed the recommendations here).

As far as I understand, I don't see why SSL stapling would matter here; as you said, _az, it shouldn't be a concern.
By the way, I also tried issuing a brand-new certificate at the time and got the same error, so it is probably not related to the existing certificate that the challenge server sees.

Any other clues?


Thanks! Could you please try this and see whether it helps (without disabling stapling):

certbot renew --cert-name my-domain.com --dry-run --nginx-sleep-seconds 15

It'll take ~30 seconds longer to run, but if my theory is right, it should help.


Hi _az,

Your theory is right. For the record, I got it down to this without errors:
certbot --dry-run --nginx --cert-name=my-domain.com --nginx-sleep-seconds 5 renew

So the question arises: why?
Reloading nginx's configuration seems to be done in no time, so I don't get why it is necessary to wait longer for the reload.
From the source code here:

--nginx-sleep-seconds NGINX_SLEEP_SECONDS
Number of seconds to wait for nginx configuration changes to apply when reloading. (default: 1)

What am I missing?
Thanks anyway!


While nginx -s reload does appear to be instantaneous, it isn't. All it does is kick off the reload process, which runs asynchronously.

It turns out that the process to parse the configuration and re-fork all of the workers doesn't scale all that well. If you have more than a handful of virtual hosts and SSL certificates, the total time begins to exceed the 1 second that Certbot waits by default. That's why I asked you how many virtual hosts you have.

So we added this flag as a sort of last resort to give a workaround to users who hit this problem.

(This bit is speculative.) With OCSP stapling enabled, when nginx parses the configuration it then has to look at each certificate, find the AIA OCSP URL, and set up OCSP resolvers for each one. Perhaps this extra work is what causes your nginx server to miss that 1 second deadline. Or it could have nothing to do with stapling and it's just random luck based on how fast your server is.
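
To illustrate the difference between a fixed sleep and actually waiting for the reload, here is a minimal Python sketch. It is not how Certbot works (per the help text above, Certbot simply sleeps for the configured number of seconds), and the helper names `new_workers_started` and `wait_until` are made up for this example. It assumes the caller can obtain the set of nginx worker PIDs before and after the reload, for instance by listing the children of the master process.

```python
import time
from typing import Callable, Set


def new_workers_started(before: Set[int], after: Set[int]) -> bool:
    """Return True once at least one worker PID exists that was absent before the reload.

    Old workers may linger while they drain connections, so we only look
    for new PIDs rather than waiting for the old ones to disappear.
    """
    return bool(after - before)


def wait_until(check: Callable[[], bool], timeout: float, interval: float = 0.1) -> float:
    """Poll check() until it returns True and return the elapsed seconds.

    Raises TimeoutError if the condition is still false after `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        if check():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

With a fixed 1 second sleep, a reload that takes 1.5 seconds is missed and the challenge request still hits the old configuration, which would explain the 404. A polling loop like this one waits only as long as needed, up to the timeout.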


I get it, big thanks!
I also modified the cron task to add this important parameter.
I'll also keep in mind that as the number of domains/subdomains on the machine grows, I may have to increase that time.

It's true that the async aspect of the process had completely slipped my mind, since the reload command seems instantaneous.

Maybe I'll take some time to benchmark nginx startup times with a few vhosts and then quite a few more, to determine whether OCSP stapling really takes that much time. I also assume it can be influenced by the response time of the authority's endpoint, so your speculation seems legit.
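
A rough way to start such a benchmark could look like the Python sketch below. The function names `time_command` and `summarize` are mine, and the idea of timing `nginx -t` is an assumption: a config test only parses the configuration and loads the certificates without re-forking workers, so it is at best a lower bound for the real reload time.

```python
import statistics
import subprocess
import time
from typing import Dict, List, Sequence


def time_command(argv: Sequence[str], runs: int = 5) -> List[float]:
    """Run argv `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: Sequence[float]) -> Dict[str, float]:
    """Return the min, median and max of the measured durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Running `summarize(time_command(["nginx", "-t"]))` as root, once with ssl_stapling on and once with it off, and again as vhosts are added, should show whether stapling is what pushes the parse time past the default wait.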

Anyway, thanks !
Have a great day everybody.
