SSL renewal stops working after adding thousands of sites to an Nginx server

I have 3 Nginx servers acting as reverse proxies for thousands of domains (only some of the domains have SSL). The 3 servers are identical in specs and configuration.

Certbot installed and renewed SSL certificates without issue when there were fewer than 4,000 sites. After I added 8,000 more sites to one server, Certbot could no longer install new certificates or renew existing ones. Every day when certbot renew runs, it runs for 6 to 8 minutes at 100% CPU and then fails with a 404 error.

I’ve reduced the total number of sites from 12,000 to 8,000. Renewal still fails, although occasionally a single site's renewal succeeds.

I am wondering whether anyone else has experienced the same issue and can share a solution. Thank you for any input!

Update: Only 18 domains on this server have SSL.

My domain is: exampledomain.com

I ran this command: certbot --nginx certonly -d exampledomain.com

It produced this output:

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator nginx, Installer nginx
Starting new HTTPS connection (1): acme-v02.api.letsencrypt.org
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for exampledomain.com
Waiting for verification…
Cleaning up challenges
Failed authorization procedure. exampledomain.com (http-01): urn:ietf:params:acme:error:unauthorized :: The client lacks sufficient authorization :: Invalid response from https://exampledomain.com/.well-known/acme-challenge/RRZ13ty2JMw7iDzi22d-V47D0Nyl_vhqA7jFB1ryi7M [64.xx.xx.12]: "\r\n404 Not Found\r\n\r\n404 Not Found\r\nnginx\r\n"

IMPORTANT NOTES:

My web server is (include version): nginx version: nginx/1.15.9

The operating system my web server runs on is (include version): CentOS Linux release 7.6.1810 (Core)

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): no control panel, this is a reverse proxy server.

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): certbot 0.30.2 (just upgraded to certbot 0.34.2, same issue)

I’m speculating, but:

In order to validate, the Certbot Nginx authenticator has to parse the Nginx configuration, edit it, reload Nginx, wait a second, and tell Let’s Encrypt to attempt the validation…

The bigger the configuration, the longer it takes for Certbot to parse it.

Worse, it also takes longer for Nginx to parse it. It’s possible that Nginx is not reloading quickly enough, and the challenge only gets set up a few moments after Let’s Encrypt tries to validate it.

Can you compare /var/log/letsencrypt/letsencrypt.log and Nginx’s error.log to see if that seems to be happening?
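One rough way to do that comparison is to merge both logs into a single timeline ordered by timestamp. This is only a sketch: the timestamp formats are assumed from typical nginx error.log and Certbot log lines, and parse_line and merged_timeline are hypothetical helper names, not part of Certbot or nginx.

```python
import re
from datetime import datetime

# Assumed formats: nginx "2019/07/08 17:22:19 ...",
# Certbot "2019-07-08 17:22:20,767:DEBUG:...".
NGINX_TS = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")
CERTBOT_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}):")


def parse_line(line):
    """Return (timestamp, source) for a log line, or None if it has no timestamp."""
    match = NGINX_TS.match(line)
    if match:
        return datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S"), "nginx"
    match = CERTBOT_TS.match(line)
    if match:
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        return stamp.replace(microsecond=int(match.group(2)) * 1000), "certbot"
    return None  # e.g. traceback continuation lines


def merged_timeline(nginx_lines, certbot_lines):
    """Merge timestamped lines from both logs, oldest first."""
    events = []
    for line in list(nginx_lines) + list(certbot_lines):
        parsed = parse_line(line)
        if parsed is not None:
            stamp, source = parsed
            events.append((stamp, source, line.rstrip("\n")))
    events.sort(key=lambda event: event[0])
    return events
```

If the challenge request shows up in the nginx log before Certbot reports that it is waiting for verification, the reload is probably not the problem. If it shows up while nginx is still reloading, it probably is.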

An Nginx reload usually takes around 10 seconds or less. But when renewing a certificate, it doesn't look like Certbot needs to update the domain's conf file.

(The error says "No such file or directory" because this is a reverse proxy server and the .well-known directory does not exist.)

Here's the Nginx error log:

2019/07/08 17:22:19 [error] 27681#27681: *46229852 open() "/srv/www/exampledomain.com/.well-known/acme-challenge/dFzHdeyD7bb6BQgHZb3-Q4aUh1eeaMymlMcog1cYNm0" failed (2: No such file or directory), client: 66.xx.xx.36, server: exampledomain.com, request: "GET /.well-known/acme-challenge/dFzHdeyD7bb6BQgHZb3-Q4aUh1eeaMymlMcog1cYNm0 HTTP/1.1", host: "exampledomain.com", referrer: "http://exampledomain.com/.well-known/acme-challenge/dFzHdeyD7bb6BQgHZb3-Q4aUh1eeaMymlMcog1cYNm0"
2019/07/08 17:22:19 [error] 27680#27680: *46229853 open() "/srv/www/exampledomain.com/.well-known/acme-challenge/6mFWzqO6o4UQjDJIwpJzOHp-5SI2TBJcUh3pgZ-pOBU" failed (2: No such file or directory), client: 66.xx.xx.36, server: exampledomain.com, request: "GET /.well-known/acme-challenge/6mFWzqO6o4UQjDJIwpJzOHp-5SI2TBJcUh3pgZ-pOBU HTTP/1.1", host: "www.exampledomain.com", referrer: "http://www.exampledomain.com/.well-known/acme-challenge/6mFWzqO6o4UQjDJIwpJzOHp-5SI2TBJcUh3pgZ-pOBU"

Here's /var/log/letsencrypt/letsencrypt.log:

2019-07-08 17:22:20,767:DEBUG:certbot.error_handler:Encountered exception:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/certbot/auth_handler.py", line 90, in handle_authorizations
self._poll_authorizations(authzrs, max_retries, best_effort)
File "/usr/lib/python2.7/site-packages/certbot/auth_handler.py", line 154, in _poll_authorizations
raise errors.AuthorizationError('Some challenges have failed.')
AuthorizationError: Some challenges have failed.

2019-07-08 17:22:20,824:DEBUG:certbot.error_handler:Calling registered functions
2019-07-08 17:22:20,824:INFO:certbot.auth_handler:Cleaning up challenges
2019-07-08 17:25:00,468:WARNING:certbot.renewal:Attempting to renew cert (exampledomain.com) from /etc/letsencrypt/renewal/exampledomain.com.conf produced an unexpected error: Some challenges have failed.. Skipping.
2019-07-08 17:25:00,469:DEBUG:certbot.renewal:Traceback was:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/certbot/renewal.py", line 449, in handle_renewal_request
main.renew_cert(lineage_config, plugins, renewal_candidate)
File "/usr/lib/python2.7/site-packages/certbot/main.py", line 1205, in renew_cert
renewed_lineage = _get_and_save_cert(le_client, config, lineage=lineage)
File "/usr/lib/python2.7/site-packages/certbot/main.py", line 115, in _get_and_save_cert
renewal.renew_cert(config, domains, le_client, lineage)
File "/usr/lib/python2.7/site-packages/certbot/renewal.py", line 307, in renew_cert
new_cert, new_chain, new_key, _ = le_client.obtain_certificate(domains, new_key)
File "/usr/lib/python2.7/site-packages/certbot/client.py", line 349, in obtain_certificate
orderr = self._get_order_and_authorizations(csr.data, self.config.allow_subset_of_names)
File "/usr/lib/python2.7/site-packages/certbot/client.py", line 385, in _get_order_and_authorizations
authzr = self.auth_handler.handle_authorizations(orderr, best_effort)
File "/usr/lib/python2.7/site-packages/certbot/auth_handler.py", line 90, in handle_authorizations
self._poll_authorizations(authzrs, max_retries, best_effort)
File "/usr/lib/python2.7/site-packages/certbot/auth_handler.py", line 154, in _poll_authorizations
raise errors.AuthorizationError('Some challenges have failed.')
AuthorizationError: Some challenges have failed.

2019-07-08 17:25:00,469:ERROR:certbot.renewal:All renewal attempts failed. The following certs could not be renewed:
2019-07-08 17:25:00,469:ERROR:certbot.renewal: /etc/letsencrypt/live/exampledomain.com/fullchain.pem (failure)
/etc/letsencrypt/live/exampledomain.us/fullchain.pem (failure)
/etc/letsencrypt/live/exampledomain.com/fullchain.pem (failure)
2019-07-08 17:25:00,473:DEBUG:certbot.log:Exiting abnormally:
Traceback (most recent call last):
File "/bin/certbot", line 9, in <module>
load_entry_point('certbot==0.34.2', 'console_scripts', 'certbot')()
File "/usr/lib/python2.7/site-packages/certbot/main.py", line 1379, in main
return config.func(config, plugins)
File "/usr/lib/python2.7/site-packages/certbot/main.py", line 1284, in renew
renewal.handle_renewal_request(config)
File "/usr/lib/python2.7/site-packages/certbot/renewal.py", line 474, in handle_renewal_request
len(renew_failures), len(parse_failures)))
Error: 3 renew failure(s), 0 parse failure(s)

I’ve seen this complaint a few times so I’ll put in a high effort response:

I’m going to outline an approach you can use with servers that have thousands of virtual hosts. In this case, I’ve tested it with 50,000 nginx virtual hosts (generated with this program).

As you can probably guess, the pyparsing-based parser in Certbot’s nginx plugin is extremely CPU-bound and takes a very, very long time on a configuration this large.

What this approach comes down to is being completely stateless. That means Certbot does absolutely no parsing or editing of the nginx configuration (nginx is only reloaded at the end) and no writing to the filesystem to do authentication.

Here is what such a virtual host looks like:


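server {
	listen 80;
	listen 443 ssl http2;

	server_name 1.example.com;

	# Variables in these paths make nginx load the certificate at handshake
	# time, based on the SNI name, instead of at configuration load.
	ssl_certificate /etc/letsencrypt/live/$ssl_server_name/fullchain.pem;
	ssl_certificate_key /etc/letsencrypt/live/$ssl_server_name/privkey.pem;

	location / {
		root /var/www/1.example.com;
	}

	# Answer HTTP-01 challenges inline: the response body is the token from
	# the URL, a dot, and the ACME account key thumbprint.
	location ~ ^/\.well-known/acme-challenge/([-_a-zA-Z0-9]+)$ {
		default_type text/plain;
		return 200 "$1.EyX9GyDfX9VnzQ008iBReYPdHmwVz51rMnPAToYNYL8";
	}
}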

Two main things:

  • SSL certificates and keys are loaded via nginx variables. This allows them to be loaded optimistically if they are present; otherwise, the site is simply not available over HTTPS. This requires nginx >= 1.15.9.
  • We are responding inline to ACME challenge requests. This is possible due to the design of HTTP-01. You can determine your thumbprint from your private_key.json in /etc/letsencrypt/accounts/. It never changes for a single ACME registration. (I used this program to derive the thumbprint).

The combination of those two factors means we can issue new certificates without touching nginx at all.
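For anyone who wants to derive the thumbprint themselves, here is a minimal sketch of the JWK thumbprint calculation (RFC 7638) for an RSA account key. It assumes private_key.json is a JSON Web Key with the usual "kty", "n" and "e" members; jwk_thumbprint and key_authorization are hypothetical helper names, not Certbot functions.

```python
import base64
import hashlib
import json


def jwk_thumbprint(jwk):
    """Base64url-encoded SHA-256 thumbprint of an RSA JWK (RFC 7638)."""
    if jwk.get("kty") != "RSA":
        raise ValueError("this sketch only handles RSA account keys")
    # Only the required public members are hashed, in lexicographic order
    # and without whitespace. Private members such as "d" are ignored.
    required = {"e": jwk["e"], "kty": jwk["kty"], "n": jwk["n"]}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def key_authorization(token, thumbprint):
    """Body that an HTTP-01 challenge response must contain."""
    return token + "." + thumbprint
```

Usage would be something like loading the file with json.load and passing the resulting dict to jwk_thumbprint. The output is what goes after "$1." in the return directive above, which is why the nginx location can answer any token without Certbot writing anything.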

And we have 50,000 of those, one for each virtual host. Other parts of the config can vary as required for per-site customization.
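Generating that many server blocks by hand is not realistic, so here is a rough sketch of how one could render them from a template. This is not the generator program mentioned above: render_vhost, the @DOMAIN@ and @THUMBPRINT@ placeholders, and the hostname check are all assumptions for illustration.

```python
import re

# Raw string so the backslash in the nginx regex is kept as-is.
# @DOMAIN@ and @THUMBPRINT@ are placeholders made up for this sketch;
# plain str.replace is used so nginx's own "$" and "{}" need no escaping.
VHOST_TEMPLATE = r"""server {
	listen 80;
	listen 443 ssl http2;

	server_name @DOMAIN@;

	ssl_certificate /etc/letsencrypt/live/$ssl_server_name/fullchain.pem;
	ssl_certificate_key /etc/letsencrypt/live/$ssl_server_name/privkey.pem;

	location / {
		root /var/www/@DOMAIN@;
	}

	location ~ ^/\.well-known/acme-challenge/([-_a-zA-Z0-9]+)$ {
		default_type text/plain;
		return 200 "$1.@THUMBPRINT@";
	}
}
"""

# Conservative hostname check so a bad name cannot inject nginx directives.
HOSTNAME = re.compile(
    r"[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+"
)


def render_vhost(domain, thumbprint):
    """Return one server block for `domain` using the account thumbprint."""
    if not HOSTNAME.fullmatch(domain):
        raise ValueError("unexpected domain name: %r" % (domain,))
    return (VHOST_TEMPLATE
            .replace("@DOMAIN@", domain)
            .replace("@THUMBPRINT@", thumbprint))
```

Writing each rendered block to its own file keeps per-site customization possible, since only the parts shown here need to stay the same.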

We then change our Certbot invocation to this:

certbot certonly -d 1.example.com --manual --manual-public-ip-logging-ok \
--manual-auth-hook "/bin/true" --manual-cleanup-hook "/bin/true" \
--post-hook "service nginx reload"

To explain:

  • We use certonly because we don’t want Certbot to interact with nginx to install anything
  • We have no-op manual authentication hooks because our authentication is stateless already
  • We reload nginx at the end because we want our ssl_certificate variable to pick up the new certificates we have created

When we run this, rather than stalling for an eternity, it performs the issuance process almost instantaneously, and nginx’s reload is pretty efficient considering the number of virtual hosts.

It also means you can perform a big dry-run (within rate limits) without blowing up your CPU, which is nice. I really have to emphasize how much better and faster this is than trying to use Certbot’s nginx plugin. It is really, really fast and reliable.
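To run this over many domains, or to do the dry run, the invocation can be wrapped in a small script. This is a sketch only: it uses the same flags as the command above plus Certbot's --dry-run, and build_certbot_command and issue_all are hypothetical names.

```python
import subprocess


def build_certbot_command(domain, dry_run=False):
    """Argument list for the stateless certonly invocation shown above."""
    cmd = [
        "certbot", "certonly", "-d", domain,
        "--manual", "--manual-public-ip-logging-ok",
        "--manual-auth-hook", "/bin/true",
        "--manual-cleanup-hook", "/bin/true",
        "--post-hook", "service nginx reload",
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def issue_all(domains, dry_run=False, runner=subprocess.call):
    """Run Certbot once per domain and return the domains that failed.

    `runner` takes an argument list and returns an exit status; it is
    injectable so the loop can be exercised without calling Certbot.
    """
    failed = []
    for domain in domains:
        if runner(build_certbot_command(domain, dry_run)) != 0:
            failed.append(domain)
    return failed
```

As written, the post hook reloads nginx after every domain. For a big batch it may be nicer to drop it and reload once at the end, but that is a judgment call and not something I have measured. Keep the Let's Encrypt rate limits in mind either way.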

An exercise left to the reader: dealing with multiple domains in a single vhost … but as long as you order the primary domain as the first one, consistently, it should work fine (but you would need to replace $ssl_server_name with just $server_name).

The authenticator temporarily edits the Nginx configuration so that the challenge in /.well-known/acme-challenge/ exists, then reverts the change afterwards. There aren’t any permanent changes, but there are critically important temporary changes.

Validation can fail if Certbot doesn’t understand the configuration well enough to edit it correctly, or if reloading Nginx takes too long.

@_az

I was going to suggest using the webroot authenticator and continuing to use the Nginx installer. :smile: Though I’d like to know what’s wrong first: if it’s just due to an Nginx configuration problem like duplicate virtual hosts, and the Nginx authenticator isn’t too slow, you might as well keep using it.

AFAIK the nginx installer still ends up spinning 100% CPU for ages, the same as the authenticator.

Ah. :slightly_frowning_face: My hope’s that the speed of creating new certificates isn’t very important, and renewing them is fast.

Thanks a lot for all your help. I will try the method _az is using, which looks like it will solve my problem. :smiley:

BTW: I should have asked this question earlier, before moving 4K sites to another server.

Hi @garconcn

You use certonly, so your certificate isn't installed and no installer is used. I would try --webroot once.

There was another thread (Apache, I think) with a different configuration but also a large number of vHosts and the same problem. Switching to --webroot helped there.

Thanks. Using --webroot does create the certificate instantly.

Thanks. Good to know :+1:

I replaced --nginx with --webroot in my script to obtain the certificate only; this is much faster than before.

Is there a way to renew the certificate only, without installing it? I've been reading the user guide but haven't found one yet.

I use "certbot renew" to renew the certificate.

You can edit the renewal config files (the ones under /etc/letsencrypt/renewal/).

Check the config file of the domain you have already switched to webroot.

It should contain something like:

authenticator = webroot
webroot-path = yourPath

Great! I will give it a try. Thanks a lot!

Don’t forget that Nginx needs to be reloaded after you renew a certificate.

(Unless you’re doing something customized.)

The Certbot Nginx installer automatically reloads it, but if you’re using certbot certonly, you need to add a deploy hook to run service nginx reload or something.

Thanks. I run a cron job every 15 minutes that reloads nginx only if the configuration has changed, to keep nginx from reloading too frequently.
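In case it helps anyone, the "only if something changed" check can be sketched roughly like this. It is not my actual script: newest_mtime and reload_needed are hypothetical names, and the stamp file is an assumed marker that the cron job would touch after each successful reload (keep it outside the watched directory).

```python
import os


def newest_mtime(root):
    """Latest modification time of any file under `root` (0.0 if none)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            newest = max(newest, os.path.getmtime(path))
    return newest


def reload_needed(config_root, stamp_file):
    """True if anything under `config_root` changed since the last reload.

    `stamp_file` records the last reload time through its own mtime.
    """
    if not os.path.exists(stamp_file):
        return True  # never reloaded by this job yet
    return newest_mtime(config_root) > os.path.getmtime(stamp_file)
```

The cron job would call reload_needed for the nginx config directory (and the certificate directory, if certificates are renewed without a hook), reload nginx when it returns True, and then update the stamp file.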

Sounds good. :smiley: I do something similar.

The issue has been resolved. I switched the plugin from “nginx” to “webroot” to obtain and renew SSL certificates; this was the easier and quicker fix. Thank you for all your help and advice.