Best practice for use with clusters, load balancers, passive failover

There are various types of cluster: for example, active-active with multiple DNS entries or load balancers, and active-passive with a floating IP or similar failover arrangement.

There are various discussions online about how to use Let’s Encrypt to generate a certificate and distribute it within a cluster.

It is generally best practice to generate the private key on a server and keep it there with very tightly controlled permissions. The private key is never sent to a CA; only a CSR is sent. However, in many of the examples for clustering, people suggest obtaining a single certificate and then using scripts to replicate both the certificate and the private key to all the other cluster nodes. Many of the replication strategies I’ve seen mentioned aren’t ideal for private keys, and some of them actually transport the key unencrypted over the network.

Does Let’s Encrypt encourage this approach, creating one key pair and replicating it, or is it intended that each node in a cluster should generate its own key pair and CSR?

In the case of replicating the private key and certificate, what methods are recommended?

In the case of creating different key pairs for each node, does Let’s Encrypt issue multiple certificates that are valid concurrently? How would the validation process work in such cases? For example, with dns-01 validation, would the client (e.g. certbot) need to implement some mechanism to serialize the requests to ensure that two or more nodes don’t try to create the TXT record at the same time?

Generally speaking, I would consider the replication approach best practice. In a typical HA cluster, the nodes are more or less identical, so the risk of key compromise is about the same for all members of the cluster, and there's not much of a compartmentalization argument to be made here. As long as your method of replicating the private key is sufficiently secure, I don't see a reason not to do this. The other approach - one key/certificate per node - would both cause higher load for Let's Encrypt and carry the risk of running into one of the rate limits if your node count is big enough.

I'd be comfortable with SSH/scp - odds are, you're relying on it for server administration anyway. If you're using some sort of configuration management software like Ansible, odds are it has some specific functionality for sharing secrets (like private keys) securely as well.
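Whatever transport you pick, the receiving node still has to write the key to disk without briefly exposing it. Here is a minimal Python sketch of that last step, assuming the key bytes have already arrived over a secure channel. The function name `install_private_key` and the destination path are hypothetical and not part of any particular tool.

```python
import contextlib
import os
import tempfile


def install_private_key(pem_bytes: bytes, dest_path: str) -> None:
    """Write a private key so it is never readable by other users.

    Hypothetical helper: the key is written to a temporary file in the
    destination directory and then renamed into place, so readers see
    either the old key or the new one, never a partial file.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # mkstemp creates the file readable and writable only by the owner.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".key-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(pem_bytes)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic rename within the same directory (same filesystem).
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

This only covers the file-system side. It does nothing to secure the transfer itself, which is still up to SSH or your configuration management tool.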

Rate limits aside, yes, you can issue multiple certificates for the same domain name that are valid concurrently.

For dns-01, this is the algorithm for validating the TXT record:

  1. Compute the SHA-256 digest of the key authorization
  2. Query for TXT records under the validation domain name
  3. Verify that the contents of one of the TXT records matches the digest value

In other words, you can have multiple clients create a TXT record simultaneously, and the server will iterate through all of them to check if the one it's looking for exists.
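To make that concrete, here is a rough Python sketch of the check described above. It is not Let's Encrypt's actual server code. The key authorization strings are made-up placeholders, and the function names are mine. The one detail I've added beyond the three steps is that the digest is compared in unpadded base64url form, which is how the ACME specification (RFC 8555) has clients publish it in the TXT record.

```python
import base64
import hashlib
from typing import List


def dns01_txt_value(key_authorization: str) -> str:
    """Return the TXT record value for a key authorization.

    This is the SHA-256 digest of the key authorization,
    base64url-encoded with the trailing '=' padding removed.
    """
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def txt_records_satisfy(key_authorization: str, txt_records: List[str]) -> bool:
    """True if any TXT record under the validation name matches the digest."""
    expected = dns01_txt_value(key_authorization)
    return any(record == expected for record in txt_records)


# Two cluster nodes, each with its own (placeholder) key authorization,
# publish their TXT values side by side under the same validation name.
node_a = "placeholder-token-a.placeholder-thumbprint-a"
node_b = "placeholder-token-b.placeholder-thumbprint-b"
published = [dns01_txt_value(node_a), dns01_txt_value(node_b)]

print(txt_records_satisfy(node_a, published))
print(txt_records_satisfy(node_b, published))
```

Because the check is "does any record match", each node's validation succeeds regardless of what the other nodes have published, as long as your DNS provider's API adds records rather than overwriting the existing one.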

For http-01, the Integration Guide suggests the following approach for HA clusters:

If you want to use the http-01 challenge anyhow, you may want to take advantage of HTTP redirects. You can set up each of your frontends to redirect /.well-known/acme-challenge/XYZ to validation-server.example.com/XYZ for all XYZ. This delegates responsibility for issuance to validation-server, so you should protect that server well.
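As a rough illustration of that redirect rule, here is a Python sketch of the mapping each frontend would apply. It uses `/.well-known/acme-challenge/`, the http-01 path defined in RFC 8555. The `http://` scheme, the function name and the token check are my own assumptions. In practice you would express this as a rewrite rule in your web server or load balancer rather than in application code.

```python
import re
from typing import Optional

# http-01 challenge path prefix as defined by RFC 8555.
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"
# Hostname taken from the guide's example; the scheme is an assumption.
VALIDATION_SERVER = "http://validation-server.example.com"
# ACME tokens use the base64url alphabet only.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def redirect_target(path: str) -> Optional[str]:
    """Return the redirect URL for a challenge request, or None otherwise."""
    if not path.startswith(CHALLENGE_PREFIX):
        return None
    token = path[len(CHALLENGE_PREFIX):]
    # Reject empty tokens and anything that could escape the path.
    if not TOKEN_PATTERN.fullmatch(token):
        return None
    return f"{VALIDATION_SERVER}/{token}"


print(redirect_target("/.well-known/acme-challenge/XYZ"))
print(redirect_target("/index.html"))
```

Every node answers challenge requests the same way, so it doesn't matter which frontend the validation request lands on.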


I wasn't suggesting that there is a higher risk simply because more copies of the key exist. What I was getting at is that when you put in place a mechanism to transport private keys, that is more risky than an environment where the key never leaves the host where it was generated. I'm not suggesting that there isn't any satisfactory method to move the private keys around, but there will inevitably be some people who don't get it right.

Thanks for pointing that out. It would appear dns-01 is a better choice for this scenario though, as multiple TXT records can exist concurrently, and the nodes don't have to worry about whether the validation requests hit the same server where the key was generated, or whether that server is even active or in standby mode at the time of certificate renewal.
