SSL Stapling Sometimes Fails on Nginx

I’ve added SSL stapling to one of the virtual hosts served by my Nginx server by adding the following (domain blanked):

server {
  ...
  ssl_certificate_key /etc/letsencrypt/live/.../privkey.pem;
  ssl_certificate /etc/letsencrypt/live/.../fullchain.pem;
  ssl_trusted_certificate /etc/letsencrypt/live/.../chain.pem;
  ssl_stapling on;
  ssl_stapling_verify on;
  ...
}

Reload, check on SSL Labs, all green, nice!

After adding the same to the remaining 13 virtual hosts, however, Nginx started to complain:

nginx: [warn] “ssl_stapling” ignored, host not found in OCSP responder “ocsp.int-x3.letsencrypt.org” in the certificate “/etc/letsencrypt/live/…/fullchain.pem”

This warning is repeated four to five times, for different domains and in varying order.

I’ve tried to nail this down by setting local DNS resolvers in Nginx:

http {
  ...
  resolver 213.133.98.98 213.133.99.99 213.133.100.100 valid=30s;
  ...
}

No change there, the warnings still appear.

:warning: When I restart Nginx twice, e.g. with /etc/init.d/nginx restart; /etc/init.d/nginx restart, the second restart usually issues the warning for each and every virtual host!

I’ve checked whether the local DNS blocked too many resolves by hitting dig ocsp.int-x3.letsencrypt.org 100 times in a row. The first queries are answered immediately; then a delay of maybe one second is added between responses. Nothing that should cause the Nginx resolver to trip, since its default timeout is 30s.

Any idea what else I could try? Thanks a bunch!

It probably isn’t relevant to this, but you shouldn’t set the valid parameter. Let it use the TTL.

What does dig show when it’s delayed? A lengthy “Query time”? Or not?

By default, dig's “try the next nameserver” interval is 1 second. It sounds like your resolvers may sometimes be failing to respond in a reasonable time period or at all.

Maybe you should contact the resolver operator and ask if they have rate limits, problems in general, or problems querying zones operated by Akamai or Cloudflare in particular (or Afilias, PCH or the roots).

You might be able to reduce or eliminate the problem by installing a local DNS cache, and maybe switching to a more reliable resolver.
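For example, a minimal local caching forwarder with Unbound might look roughly like this (an illustrative sketch, not a drop-in config; the forward addresses are just the resolvers from your earlier http block):

# /etc/unbound/unbound.conf (illustrative minimal caching forwarder)
server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow

forward-zone:
    name: "."
    forward-addr: 213.133.98.98
    forward-addr: 213.133.99.99
    forward-addr: 213.133.100.100

You’d then point Nginx at it with resolver 127.0.0.1; so every lookup after the first is answered from the local cache.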


I would do it in the following way.

Replace domain.com with your domain. This block redirects all HTTP traffic to HTTPS.

server {
   listen 80;
   server_name www.domain.com domain.com;
   return 301 https://domain.com$request_uri;
}

This then redirects www.domain.com to domain.com over HTTPS:

server {
    listen 443 ssl;
    server_name www.domain.com;

    ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem;
    return 301 https://domain.com$request_uri;

    ssl_session_timeout 1d;
    ssl_session_cache shared:SSL:30m;
    ssl_session_tickets off;

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:ECDHE-RSA-AES128-GCM-SHA256:AES256+EECDH:DHE-RSA-AES128-GCM-SHA256:AES256+EDH:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4";

    ssl_prefer_server_ciphers on;
    ssl_stapling on;
    ssl_stapling_verify on;
    resolver 8.8.8.8 8.8.4.4 valid=300s;
    resolver_timeout 10s;

}

Then this for the actual server:

server {
    listen 443 ssl http2;
    server_name domain.com;
    ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem;
    ssl_session_timeout 1d; 
    ssl_session_cache shared:SSL:30m;
    ssl_session_tickets off;

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:ECDHE-RSA-AES128-GCM-SHA256:AES256+EECDH:DHE-RSA-AES128-GCM-SHA256:AES256+EDH:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4";
    ssl_prefer_server_ciphers on;
    ssl_stapling on;
    ssl_stapling_verify on;
    resolver 8.8.8.8 8.8.4.4 valid=300s;
    resolver_timeout 10s;
}

Just to pick out one line…

Don’t set valid. Let Nginx use the TTL that the CDN put in their DNS records.
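In other words, keep the resolver line but drop the override (addresses as in your earlier http block):

resolver 213.133.98.98 213.133.99.99 213.133.100.100;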

And it’s not recommended to use a remote DNS server, because Nginx’s stub resolver isn’t securely designed. That said, it’s unlikely anyone would try to poison your OCSP DNS records, and it’s probably not very harmful if they did, since ssl_stapling_verify is on.

Figured it out: a series of dig queries always stalled after exactly 48 queries, no matter whether I pinned it to one specific nameserver or let it use any of the three local DNS servers. It turns out the firewall applies its UDP per-second limit to both incoming and outgoing traffic. Increasing the burst rate a little fixes the issue.
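For anyone hitting the same thing, the kind of rule in question looks roughly like this (iptables; the rate and burst numbers are illustrative, not my exact values):

# Rate-limit outgoing UDP/53, but with a burst large enough that a run
# of OCSP/DNS lookups at startup or reload is not dropped (illustrative)
iptables -A OUTPUT -p udp --dport 53 -m limit --limit 20/second --limit-burst 100 -j ACCEPT
iptables -A OUTPUT -p udp --dport 53 -j DROP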

However, this is more of a workaround than an actual fix.

A local DNS cache would certainly help, but it seems like overkill, and the last time I had such a cache installed on a server, it failed quite often. Besides, why add another cache? Isn’t Nginx already caching the lookups?

By default, nginx caches answers using the TTL value of a response. An optional valid parameter allows overriding it:

Even with the custom valid=30s, shouldn’t the nameservers be hit exactly once instead of, apparently, close to 60 times?