Slow Apache reload with thousands of certificates

We host a large number of certificates for separate client sites on a single server. When a new customer joins and we reload the Apache config to activate their certificate, the server is unresponsive for around 30 seconds.

Has anyone else experienced this? The problem has definitely gotten worse as we've installed more certificates.

My domain is: n/a

I ran this command: systemctl reload httpd

It produced this output: n/a

My web server is (include version):
Apache/2.4.6 (CentOS)

The operating system my web server runs on is (include version): CentOS 7

My hosting provider, if applicable, is:

I can login to a root shell on my machine (yes or no, or I don't know): Yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): certbot 1.11.0


I can't help with Apache but I'm sure others are running thousands of certificates. What hardware or VM size are you running on?

CentOS 7 appears to have been released about 8 years ago, which suggests your environment may not be optimal: it predates the explosive growth in SSL/TLS that came when Let's Encrypt started up and Google Chrome made HTTPS more or less mandatory.

You should also consider whether Apache is still the best fit for your requirements. Over the past few years it has been losing ground to alternatives like nginx and the newer Caddy server.


It's a fairly powerful VM: 16 CPUs and 32 GB of RAM.

You're absolutely right: we're stuck on an old CentOS version and locked to an older version of Apache, which could be the problem.

We're looking at moving to HAProxy, which apparently supports zero-downtime config reloads. Our hosting provider advised this, but I want to explore all options in case there's a simpler solution that would save us that infrastructure change.

If I hear of others running thousands of certificates with very fast reloads that have zero impact on page load, that would give me more confidence to pursue OS and Apache/nginx upgrades instead.


That does sound good, assuming storage performance isn't a bottleneck (you could run a storage benchmark). Another option is to run a profiler to see where the process is spending its time (CPU, disk, etc.). You may want to try that on a clone of the problematic VM.
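As a rough first check before a full storage benchmark, one option is to time how long it takes simply to read every certificate and key file once, since a reload has to read all of them. This is a Python sketch, not a proper benchmark; the `*.pem` pattern and the certbot default directory are assumptions, so point it at wherever your `SSLCertificateFile` / `SSLCertificateKeyFile` directives actually refer.

```python
import time
from pathlib import Path


def time_pem_reads(root):
    """Read every *.pem file under `root` once.

    Returns (file_count, total_bytes, seconds). Assumption: certificates and
    keys are stored as .pem files somewhere below `root`.
    """
    count = 0
    total_bytes = 0
    start = time.perf_counter()
    for path in Path(root).rglob("*.pem"):
        if path.is_file():  # follows certbot's symlinks into archive/
            total_bytes += len(path.read_bytes())
            count += 1
    elapsed = time.perf_counter() - start
    return count, total_bytes, elapsed
```

For example, `print(time_pem_reads("/etc/letsencrypt/live"))` run as root (private keys are usually root-only). A repeat run will mostly be served from the OS page cache, so the first run after a reboot is the more telling number.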


Another thing to keep in mind: I assume this is a "graceful" restart, so you are also waiting for all current user HTTP requests to complete (and that depends on how fast their internet connection is, not yours). If some sites attract bots or serve large downloads, that could be an issue, so the question is: how fast is a non-graceful restart? You could perhaps configure Apache with a shorter timeout, as described in the Server Fault question "Why httpd graceful restart takes such a long time?"


Hey, yeah, that's definitely a good question. The weird thing is that the graceful restart itself (which is what "systemctl reload httpd" does) completes quickly.

But if you then load the sites in a browser, that's what takes a long time. It's as though the Apache service is busy reloading configs before it can respond.

I'll look into the timeout however and report back.
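To put a number on the browser-side delay instead of eyeballing it, one approach is to poll a site at a fixed interval while triggering the reload and record each request's latency. A minimal standard-library Python sketch; the URL, sample count and interval are placeholders you would choose yourself.

```python
import http.client
import time
import urllib.request


def probe(url, samples=30, interval=1.0, timeout=10.0):
    """Fetch `url` repeatedly and record how long each request takes.

    Returns a list of (seconds_since_start, latency_seconds_or_None);
    None means the request failed (including HTTP error statuses) or timed out.
    """
    results = []
    start = time.monotonic()
    for i in range(samples):
        sent = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()  # time the full body, not just the headers
            latency = time.monotonic() - sent
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError and socket timeouts are all OSError subclasses
            latency = None
        results.append((sent - start, latency))
        if i < samples - 1:
            time.sleep(interval)
    return results
```

Start `probe("https://<a client site>/")` in one terminal and run `systemctl reload httpd` in another; a run of long or `None` latencies shows how long the stall lasts. Repeating it with a full `systemctl restart httpd` would answer the graceful vs non-graceful question.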


I've just been playing around with the timeouts.

I've set:
GracefulShutdownTimeout 1

This actually seems to improve things quite a lot. It's not 100% snappy; there's maybe 5 seconds of lag when loading a site right afterwards.

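My main Apache config is now:

```apache
Timeout 60
KeepAlive On
KeepAliveTimeout 5
# 0 = unlimited requests per keep-alive connection
MaxKeepAliveRequests 0
# seconds to keep serving existing connections after a graceful-stop
GracefulShutdownTimeout 1
```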

I might play with things a bit further; maybe it's the KeepAliveTimeout.


It also depends on what type of websites you're serving. A static HTML site should load quickly, whereas a dynamic content management system (WordPress, etc.) needs to load its application framework, database modules, and so on.

There could still be some other sort of first-time load issue, but you'd have to measure where the bottleneck is (e.g. if CPU, storage, memory and network all stay low, the problem is possibly a timeout while waiting for something else).


If @kingyrockets is running mod_php, I think this is a likely explanation. In that mode, the PHP interpreter is embedded in each Apache worker, and once a worker exits during a graceful reload, all the speedup you get from cached PHP code is lost. So it's basically a worst-case restart for the PHP applications.

Putting haproxy/nginx in front, or moving to PHP-FPM, would likely take care of this.


Thousands of domains/certs/hosts on a single Apache instance sounds like it may be an anti-pattern to me. Maybe Apache has changed in the past 15 years, but that used to be something to avoid with all servers. Nginx should have been able to handle this since 2011 or so, but I don't think Apache built that out.

IIRC, a status quo that emerged around 2008 for large configurations like yours was to partition the domains across multiple Apache/Lighttpd/Nginx configs. One webserver sat on ports 80/443 to terminate the connection, and then proxied traffic upstream to the relevant webservers (running on higher ports). Sometimes they were partitioned by name, other times by client. Running a stripped-down nginx was very popular for this, because it had an extremely low memory footprint.
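The "partitioned by name" variant needs a stable rule for which backend owns each domain. A minimal Python sketch, assuming a fixed list of backend addresses (the `127.0.0.1:808x` ports below are hypothetical): hash each normalized domain name and pick a backend from the result. The front proxy's upstream map and each backend's vhost list could then be generated from the groups.

```python
import hashlib


def backend_for(domain, backends):
    """Pick a backend for `domain` from a hash of its normalized name.

    hashlib is used instead of the built-in hash(), because str hashes are
    randomized per Python process and the mapping must survive restarts.
    """
    name = domain.strip().lower().rstrip(".")
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def partition(domains, backends):
    """Group domains by the backend they are assigned to."""
    groups = {b: [] for b in backends}
    for d in domains:
        groups[backend_for(d, backends)].append(d)
    return groups


# Hypothetical backend Apache instances listening on higher ports.
BACKENDS = ["127.0.0.1:8081", "127.0.0.1:8082", "127.0.0.1:8083"]
```

Plain modulo hashing reassigns most domains when a backend is added or removed. Partitioning by client would instead use an explicit lookup table, and consistent hashing is an option if the backend list changes often.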

IMHO, I would do both. Putting nginx in front should be a relatively quick trial. In my experience, mod_php can be problematic in a white-label/client situation for a variety of reasons, and moving it to a separate process is best.

Another option is to load certificates dynamically on demand. You can do that with OpenResty, an nginx-based platform. I open-sourced our implementation:

There may be a way to do this in Apache too, but I am unsure. I think you're still likely to deal with Apache issues from the sheer number of hosts and the mod_php behavior that @_az mentioned.
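The general idea behind on-demand loading can be illustrated outside OpenResty with Python's `ssl` module, which lets a TLS server choose a certificate per connection from the SNI hostname through `SSLContext.sni_callback`. This is not the implementation mentioned above and not an Apache feature; it assumes certbot's default `/etc/letsencrypt/live/<name>/` layout, and `LazyCertStore` / `load_context` are hypothetical names.

```python
import functools
import ssl
from pathlib import Path

# Assumption: certbot's default layout, live/<name>/{fullchain,privkey}.pem
LIVE_DIR = Path("/etc/letsencrypt/live")


def load_context(server_name):
    """Build a server-side SSLContext for one hostname from its certbot files."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    cert_dir = LIVE_DIR / server_name
    ctx.load_cert_chain(str(cert_dir / "fullchain.pem"), str(cert_dir / "privkey.pem"))
    return ctx


class LazyCertStore:
    """Load each hostname's certificate on first use and cache it (LRU-bounded)."""

    def __init__(self, loader=load_context, max_size=1024):
        # Failed loads raise, so lru_cache does not cache them.
        self._load = functools.lru_cache(maxsize=max_size)(loader)

    def sni_callback(self, ssl_sock, server_name, initial_context):
        if server_name is None:
            return None  # client sent no SNI: keep the default context
        name = server_name.lower()
        if "/" in name or name.startswith("."):
            # refuse names that could escape LIVE_DIR
            return ssl.ALERT_DESCRIPTION_UNRECOGNIZED_NAME
        try:
            ssl_sock.context = self._load(name)
        except OSError:  # missing/unreadable files; ssl.SSLError is an OSError too
            return ssl.ALERT_DESCRIPTION_UNRECOGNIZED_NAME
        return None
```

To use it, set `default_ctx.sni_callback = LazyCertStore().sni_callback` on the context you pass to `wrap_socket(..., server_side=True)`. A new customer's certificate is then read on their first connection, with no reload; picking up renewed certificates would still need the cache cleared (e.g. `store._load.cache_clear()`).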


Thanks everyone for your suggestions and tips.

After some more testing, setting GracefulShutdownTimeout 1 hasn't really made any improvement to the graceful reload.

I'm going to work on some proofs of concept using NGINX and PHP-FPM and see whether we can easily create a migration path to this setup.
