Multiple servers: what are all the options and requirements to get it working?

I’ve read a lot of threads here and on Stack Overflow to figure out how to do a multiple-server Let’s Encrypt configuration. I’d like to know all the possible ways of doing this and, more importantly, what the requirements to get it working are. Note that I’d like to avoid any automagic from Let’s Encrypt as I’d like to have it working for multiple configurations. I’ve decided not to go with the DNS challenge for security reasons and instead only do the HTTP-01 challenge.

The goal is to have multiple servers that can obtain a LE cert for non-wildcard domains, e.g. site.com and api.site.com.

Questions

  1. Does the DNS need to point to all backend servers? E.g. an A record with all the IPs of all the backend servers? I’m guessing no.
  2. Does the requesting server need to handle the challenge completely? I see a lot of solutions like “backend server requests renewal but proxies the HTTP-01 challenge to a special cert server”. But how does the cert server know how to answer the challenge? The docs say that the ACME client tells LE that it’s ready, but if you have separate clients on the backend and cert servers, then the cert server is not ready?

Possible solutions

Correct me where I’m wrong and add other options where possible.

  1. Backend servers proxy to a central server.
    You have a special server, e.g. cert.site.com, that handles all the certificate requests, discussed here. Does the backend server here request the certificate renewal or does it only proxy the requests to the central server? Is the final certificate saved on the backend server or the central server?
  2. Have a central server manage everything. This means that one server does all the back-and-forth with Let’s Encrypt, gets all the certs, distributes the certificate to all the backend servers and triggers web server restarts on all the backend servers.
  3. Have only backend servers and they manage the certificates for themselves. Each server requests a cert, so might be problematic for LE’s rate limits.

The HTTP-01 challenge

The challenge docs say that the client puts a file at http://<YOUR_DOMAIN>/.well-known/acme-challenge/<TOKEN>. Obviously, the client doesn’t put it “on HTTP”, but on disk. But does that mean that the challenge happens using a domain and not the IP of the requesting server? How does this backend server make sure that all the servers on this domain can answer the challenge? I’m assuming there’s a private key of the requesting server at play, but other servers will have different private keys, so they can’t answer the challenge?

Hi @rokcarl

Your setup is unknown. Please specify your configuration.

What’s a backend server? Normally, that’s a database server without a webserver.

Your domain has one or more IPv4 / IPv6 addresses.

Every frontend webserver must be able to answer /.well-known/acme-challenge/<token> for that domain name.

But you can use HTTP / HTTPS redirects to another domain, so you need only one place to answer the challenge.
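For example, in nginx each frontend could redirect the challenge path to a single host that answers it. This is only a sketch; the validation hostname is made up:

```nginx
# On every frontend: send ACME challenge requests to one central host.
# Let's Encrypt follows redirects for HTTP-01 validation.
location /.well-known/acme-challenge/ {
    return 302 http://validate.site.com$request_uri;
}
```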

What’s a requesting server? The server where your ACME client is running? That may be a server that also acts as a webserver, but it can just as well be a completely different machine.

The webserver must be able to send the required content, but it’s not required that there is a file on disk.

Connecting to a domain means: the program resolves the domain name to find an IP address, connects to that IP address, and sends the domain name in the HTTP Host header.

Looks like your “backend servers” are frontend webservers. If you have only one webserver per domain, that’s not a problem. If you have multiple webservers per domain, the details of your configuration matter.

Why do you want to run different ACME clients? Run only one ACME client, then you have only one key pair.


It might help to include a network diagram, as it’s not entirely clear where you are terminating SSL and whether you are spreading the SSL termination for any single domain across multiple servers.

i.e. Are you going to use DNS round-robin or are you intending to use a reverse proxy to handle SSL in one place?

In the latter case you would not bother distributing Let’s Encrypt certificates to your backends - that leg of the connection is invisible to the visitor, after all. If you still want encryption between the reverse proxy and your backends, you have other options (private PKI, VPN/Wireguard) that don’t require complicated ACME coordination.


I didn’t specify my configuration precisely because it’s not set in stone and we might change it depending on our requirements or on how LE works. Additionally, we have a few different systems that we need to implement LE on. One might have a reverse proxy that would probably terminate SSL. Another would only proxy requests to backend servers and they would terminate SSL themselves. Then we have another system where we use a reverse proxy that terminates SSL, but we also have a backup (stand-by) reverse proxy in case the first one dies, and it needs to be ready at all times, meaning that it needs to have the certs ready.

That’s why I need to know how LE technically works. I was talking about “backend” servers because this is how it was described in many threads. It’s the actual server that terminates SSL. It might be a reverse proxy, it might be an individual API server, it might be fronted by a reverse proxy that does not terminate SSL, etc. It’s the server that needs to have an SSL certificate installed.

I might be able to create a bunch of network diagrams for all the possible combinations, but it might make more sense to first discuss some details and then I can narrow down on possible solutions and discuss those.

My questions were probably not clear enough. So let’s start with how a certificate is requested and received using the HTTP-01 challenge. Reading the How It Works page I can see how the agent requests a certificate for a simple setup. For more complex configurations, things get trickier.

This solution proposes passing the challenge HTTP requests from server-A to a central validation server. But how does the solution to the challenge get to the validation server so that it ends up in /.well-known/acme-challenge/*?

The response to the HTTP-01 challenge /.well-known/acme-challenge/<token> is just <token>.<ACME account thumbprint>.

This is handy because it allows you to respond to the challenge statelessly: the server requesting the certificate doesn’t need to tell the validation server how to respond.

The account thumbprint is static and known ahead-of-time, and the token is just in the request URL.
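Concretely, the expected response body is just the token and the thumbprint joined with a dot. The values below are made-up examples, not real credentials:

```shell
# Hypothetical values: the token arrives in the challenge request URL,
# the thumbprint is derived from your ACME account key (RFC 7638)
TOKEN="evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA"
THUMBPRINT="9jg46WB3rR_AHD-EBXdN7cBkH1WOu0tA3M9fm21mqTI"

# The key authorization the webserver must return for this challenge
printf '%s.%s\n' "$TOKEN" "$THUMBPRINT"
```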

Have a look at https://github.com/acmesh-official/acme.sh/wiki/Stateless-Mode to see how this can be operationalized with nginx. (And of course, if you’re not using nginx and lack the ability to create this kind of response inline, you can use a random PHP/CGI script to generate the response in the same way.)
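For reference, the nginx trick from that wiki page boils down to a location block that echoes the token back with the thumbprint appended. This is a sketch; the thumbprint value is a placeholder you’d replace with your own:

```nginx
server {
    listen 80;
    server_name api.site.com;

    # Answer HTTP-01 statelessly: reply "<token>.<account thumbprint>"
    # without any file ever being written to disk
    location ~ "^/\.well-known/acme-challenge/([-_a-zA-Z0-9]+)$" {
        default_type text/plain;
        return 200 "$1.YOUR_ACCOUNT_THUMBPRINT";
    }
}
```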

Wow, the stateless mode looks really easy to set up.

So you probably set up the stateless mode on the validation server and then all the other servers that need certificates proxy all the HTTP requests for /.well-known/acme-challenge/* to the validation server.

Let me see if I understand this fully. Let’s say we want api.site.com on server-a and server-b. We also have a validation server. We do the following:

  • set api.site.com to point to server-a and server-b,
  • configure server-a and server-b to proxy HTTP requests for /.well-known/acme-challenge/* to the validation server,
  • register an account and make sure that the validation server’s web server has the thumbprint.
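The second bullet could look like this in nginx on server-a and server-b; the upstream hostname is illustrative:

```nginx
# Forward ACME challenge requests to the validation server, which
# answers them statelessly from the account thumbprint
location /.well-known/acme-challenge/ {
    proxy_pass http://validation.internal:80;
}
```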

I’m unsure about the last point. I’m thinking that when you register, you get a thumbprint and a corresponding private key that the ACME client needs to use each time a new cert is requested. So if I register an account on server-a and server-b, I’ll get two separate thumbprints and the validation server wouldn’t know which one to use when. So there must be a way to register once. Should I register on server-a and then distribute (copy/paste) the private key and other files to server-b? It would be ideal if private keys did not travel over the network.

You create only one account with an ACME client, which runs on the validation server. So you have only one (unique) thumbprint.

Then you have to distribute the keys from the validation server to server-a and server-b for their ACME client?

You don’t run two (or more) ACME clients on server-a, server-b etc.

You run only one ACME client on your validation server. That client (or a program running after the certificate is created) distributes the certificate.

You can run the ACME clients wherever you want, just pre-distribute the ACME private key to each location as required.

Or you can get the validation server to map domain name => thumbprint. Whatever.

Yes, this is the “validation server does everything, worker servers just proxy requests to it” approach, the no. 2 from the Possible solutions section in my question. Now we’re getting to the crux of the thread: “Multiple servers: what are all the options and requirements to get it working?”

Would it make sense to describe the approaches based on where the certification request is originating and what holds the answers?

Origin validation, answer validation

In this case, the validation server is the one that initiates the certificate issuance and the one holding the private keys and the thumbprint. If I understand correctly, all the worker servers (server-a and server-b, the ones that need the certificate) would proxy the challenge to the validation server. The domain name (api.site.com) we’re requesting could point to one or all of the worker servers. We would register an LE account on the validation server and set up cron there. Then we would need to distribute the site certificates to the worker servers. Is this correct? Can we avoid sending private keys over the network?

Origin worker, answer validation

Is this even possible? The worker server (e.g. server-a) would have an ACME client installed and would be registered with LE (does every worker server need to register separately?), but we would save the thumbprint to the validation server and proxy all the HTTP-01 challenges to it. In this case we would either need to configure the validation server with all the worker thumbprints (not a scalable solution) or have only one thumbprint, which means distributing the registration keys to all the worker servers (probably not good either, as it sends the private key over the network).

Origin worker, answer worker

In this case, the workers would request a certificate and also answer the challenges. What do we need to make this one work? I think it’s not even possible to have each worker server with a separate registration, as the LE challenge could hit any one of the worker servers? We could have only one thumbprint, but we would need to distribute the private key and the thumbprint to all the worker servers, correct?

For what it’s worth, it might be possible to set up DNS validation in a way that satisfies your security requirements. It might be complicated, but if you’re thinking of doing something complicated anyway…

I’ve seen the possibilities for this, e.g. having a validation domain, CNAME from wanted domain to the validation domain, storing the API keys for this domain only. Our current setup makes this something that I’d like to avoid. But if other approaches turn out to have a lot of downsides as well, I’ll think about this one.

From my personal viewpoint, I would say all three approaches are possible.

It’s indeed much easier to use this method to get a certificate, since you can effectively avoid rate-limit related issues. You also have the option to reuse the private key, so you don’t necessarily need to distribute the key to every server once your certificate is renewed. (However, reusing the key is not recommended.)

This is also possible, but you’ll need to distribute the challenge token the origin worker created to your validation server, and depending on the number of servers you have, you might hit rate limits rather quickly (the registered domain limit and the duplicate certificate limit).

This seems like the hardest one. When Let’s Encrypt connects to your domain for validation, you’ll need to either make sure the (possibly multiple) validation connections reach the exact worker that requested the certificate, or distribute the file to every worker that serves the website. It’s also hard to avoid the rate limits in this case.

My (personal) position: (2) and (3) are bad.

If you have multiple ACME clients creating certificates with the same set of domain names, it’s too complicated. You may hit the rate limits if there is an error.

Two options:

  • Create an additional public / private key pair per server (server-a, server-b etc.). The validation server knows the public key and uses it to encrypt the private key of the certificate. server-a knows the matching private key and can decrypt that message.
  • There are clients with a key-reuse option. Then you can use the same certificate key for a year, then create a new one and distribute it.
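The first option can be sketched with plain openssl S/MIME encryption, assuming each worker generates its own keypair plus a self-signed certificate and hands the certificate to the validation server. The file names here are made up, and the “certificate key” is a stand-in file for demonstration:

```shell
# Demo stand-in for the real certificate private key
echo "stand-in for the real certificate key" > cert-private.pem

# On server-a (done once): create a keypair and a self-signed cert;
# give server-a.crt to the validation server
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=server-a" -keyout server-a.key -out server-a.crt

# On the validation server: encrypt the certificate key for server-a
# using server-a's public certificate
openssl smime -encrypt -binary -aes256 -outform DER \
  -in cert-private.pem -out cert-private.enc server-a.crt

# On server-a: decrypt with its own private key
openssl smime -decrypt -inform DER \
  -in cert-private.enc -inkey server-a.key -out cert-private.dec
```

Only the encrypted blob travels over the network; each worker’s decryption key never leaves that worker.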

I think it’s clear now that having multiple sets of keys for validation would be bad, primarily because of rate-limiting.

Create an additional public / private key pair per server (server-a, server-b etc.). The validation server knows the public key and uses it to encrypt the private key of the certificate. server-a knows the matching private key and can decrypt that message.

This requires custom software to facilitate the certificate creation and sending the certificates over to worker servers, right? Or is there something that already does that?

Two identical servers: master and standby

What are the options in this scenario? Say we have two identical servers that are reverse proxies for backend services and they terminate SSL. But at any one point, only one is the master and the DNS A record points to the master. The standby should be always ready, so when you do a failover, it should already have the certificate for the api.site.com domain.

Since the standby server is not in the DNS, it can’t answer challenges. But it could initiate a certificate renewal request. Can this be done without introducing another server to the infrastructure? The only possible solution for this that I see is if both servers have an ACME client installed (like acme.sh), but only one account is registered with LE and the private key and the thumbprint are distributed to both servers? Any better solution if you want only those two servers? If not, I guess I’ll have to go with a separate validation server.

If it’s connected to the internet (and able to reach Let’s Encrypt’s API endpoints), it will be able to initiate the certificate request.

Absolutely. One account key can be used in different servers.

P.S. You also need to configure the server so it’ll serve the correct file / content when Let’s Encrypt hits.

If you only want two certificates, there should be no issue with the rate limits.

Based on what you said, can I do the following to get a master and a standby ready?

  • Install an ACME client (e.g. acme.sh) on both the master and the standby,
  • register an account on the master,
  • copy the keys and the thumbprint from the master to the standby,
  • set both servers so they can answer the HTTP-01 challenge with the thumbprint,
  • set up the cron job on both to renew the certs, using the copied account keys,
  • when we need to switch from master to standby, I can change the DNS records and the standby is ready to serve traffic to https://api.site.com.
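The steps above can be sketched with acme.sh, assuming its stateless mode and default storage paths; check your client’s docs for the exact flags and file locations before relying on this:

```shell
# On the master: register once; acme.sh prints the account thumbprint,
# which goes into the webserver config on BOTH machines
acme.sh --register-account -m admin@site.com

# Issue via HTTP-01 stateless mode (the webserver answers challenges
# from the thumbprint, no challenge file needs to be written)
acme.sh --issue -d api.site.com --stateless

# Copy the account data and certs to the standby so its own cron job
# can renew with the same account (path is acme.sh's default; verify)
rsync -a ~/.acme.sh/ standby:~/.acme.sh/
```

After a failover, the standby already holds a current certificate and the same account key, so its renewals keep working once the DNS points at it.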

CertMagic, a Go library I wrote (stating for disclosure purposes), supports coordinated certificate management across a cluster: https://github.com/mholt/certmagic/#behind-a-load-balancer-or-in-a-cluster

Caddy, a web server I wrote (stating for disclosure purposes), uses CertMagic, so you can set up a fleet of Caddy instances behind a load balancer and they will automatically coordinate cert management as long as they’re configured with the same storage backend (e.g. same folder, or same database, or whatever): https://caddyserver.com/docs/automatic-https#storage

Or you can use it as a load balancer / reverse proxy in the front. Either way. When acting as a cluster, only one Caddy instance will renew a cert, and all instances will share the certificate. It’s really slick, and “just works” by setting the same storage configuration.

Nice, I’ve used Caddy on another project. And this solution really looks like magic, awesome! I would need to find a way to share storage though and replace Nginx with Caddy. I might do the first, but currently not the second, unfortunately. Hmmm, maybe :smile: