I help run an IRC network, and since we have multiple servers, we use round-robin DNS to assign people to servers. We use Let’s Encrypt for SSL, and have the following issues:
When creating a certificate, it is extremely hard to get one issued, because seemingly 19 times out of 20, Let’s Encrypt hits the wrong server in the round robin. I don’t know if this has to do with caching or what, but we only have about 6 servers, and I know there could be clients with many more, for whom this would be an even bigger issue.
The same happens when renewing: since it has to re-authenticate, the renewal fails more often than not. This makes it hard to automate, for example with a cron job.
Ever since the rate limits for failed renewals/creations were put in place, it has become even harder. Now it often takes me a good 3 or 4 hours’ worth of tries to get it done.
I would propose a solution as follows:
If the certificate contains a name that has only one IP address, use only that IP address for checking, unless --allow-subset-of-names is set. That way, when creating or renewing a certificate for irc.example.com and servera.example.com, validation would always go to the right server, assuming servera.example.com isn’t also round robin.
If DNS has multiple IPs for all names, and/or --allow-subset-of-names is set and a DNS entry has only one IP, it may be helpful to try all the IPs in DNS. I know this would add extra workload, but it’s the only way I can think of to handle round robin properly.
Just my thoughts; perhaps we can have a little discussion to flesh it out a bit more. I just know that, with round robin (a fully supported feature of DNS) and Let’s Encrypt using web verification by default, this really needs to be addressed.
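The selection rule proposed above can be sketched as a small function. This only illustrates the proposal; it is not something Let's Encrypt or Certbot implements. `plan_validation_targets` is a hypothetical name, DNS answers are passed in as a dict instead of being resolved live, and it assumes the intent is that every name in the certificate gets checked against the single-IP name's address (if several names have one IP each, it simply pins to the first, which the proposal doesn't specify).

```python
from typing import Dict, List


def plan_validation_targets(
    resolved: Dict[str, List[str]],
    allow_subset_of_names: bool = False,
) -> Dict[str, List[str]]:
    """Return the IPs a validator would try for each name under the proposal."""
    # Names that resolve to exactly one address identify a specific server.
    single_ip_names = [name for name, ips in resolved.items() if len(ips) == 1]
    if single_ip_names and not allow_subset_of_names:
        # Pin every name to that server's address (first match if there are several).
        pinned = resolved[single_ip_names[0]][0]
        return {name: [pinned] for name in resolved}
    # Otherwise try every address published for each name.
    return {name: list(ips) for name, ips in resolved.items()}
```

For irc.example.com (three addresses) plus servera.example.com (one address), both names would be checked only against servera's address; with `allow_subset_of_names=True`, every published address would be tried instead.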
I take it from your description that you are currently validating with either http-01 or tls-sni-01 challenges, which of course connect to any of the servers in the round-robin group, so to carry this out successfully you’d need to arrange for the validation to pass regardless of which server answers. Further, I gather that since you want the certificates for IRC, it doesn’t matter to you exactly how (or whether) any web servers are set up with these names.
I commend two approaches to people in your situation:
DNS validation. Arrange to prove control over the names via DNS itself, using dns-01 instead of http-01 or tls-sni-01 challenges.
Use of a proxy or 30x redirect to collect answers from elsewhere when the “wrong” server is asked for validation. With http-01, if you set all the other machines in the network to either proxy the answer from the one with the true answer, or send the Let’s Encrypt validator an HTTP 30x redirect to the true answer, validation will pass.
I suspect (though I don’t speak for them) that Let’s Encrypt won’t be content with your approach, because it weakens the already rather flimsy proof of control these challenges provide. Hence I commend the alternatives above, which should work for your situation.
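For the dns-01 option, what ends up in DNS is a TXT record at `_acme-challenge.<name>` whose value is the unpadded base64url SHA-256 digest of the key authorization (the challenge token, a dot, and the ACME account key's JWK thumbprint), as specified in RFC 8555. The sketch below only computes that record; the function names are illustrative, the token and thumbprint would normally be supplied by an ACME client such as Certbot, and wildcard names aren't handled.

```python
import base64
import hashlib
from typing import Tuple


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as ACME uses throughout."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_record(domain: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return (record name, TXT value) for a dns-01 challenge on `domain`."""
    # Key authorization = token "." base64url(JWK thumbprint of the account key).
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)
```

Because the value depends only on the token and the account key, it is the same no matter which IRC server a client would happen to reach, which is why dns-01 sidesteps the round-robin problem entirely.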
The DNS challenge could work if all the servers were run by the same person/organization, but all except a few are run by different people, and, while I haven’t asked, I doubt the owner of the domain would want every server having access to their DNS, especially because, the way it’s set up, the DNS could be changed for any of their domains, not just the IRC one.
As for redirecting to the proper server, any suggestions on how this could be done? Given that the web server in question wouldn’t know which server needed the answer to be proxied, and the server requesting the certificate wouldn’t know which server would receive the challenge to send the file to, I’m not sure how this would work.
Pick your favorite server out of the cluster, in this case one of the ones you control. Call it server A.
Set up certbot on it.
On all other servers, configure the web server software to proxy any requests for the ACME challenge directory (/.well-known/acme-challenge/) to server A (proxy_pass in nginx).
Use Certbot on server A to issue certificates.
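The per-server proxy in these steps is normally a few lines of web server configuration, but its logic can be sketched with Python's standard library. This is a minimal illustration, assuming server A is reachable over plain HTTP as the hypothetical `servera.example.com`; `upstream_url`, `ChallengeProxy` and `run` are made-up names, and the nginx `proxy_pass` route is the more likely choice in practice.

```python
import http.server
import re
import urllib.request
from typing import Optional

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"
SERVER_A = "http://servera.example.com"  # hypothetical address of server A
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")  # ACME tokens use base64url characters


def upstream_url(path: str, server_a: str = SERVER_A) -> Optional[str]:
    """Map a request path to the same path on server A, or None if not a challenge."""
    if not path.startswith(CHALLENGE_PREFIX):
        return None
    token = path[len(CHALLENGE_PREFIX):]
    # Refuse anything that isn't a bare token (e.g. "../", extra slashes, queries).
    if not TOKEN_RE.fullmatch(token):
        return None
    return server_a + CHALLENGE_PREFIX + token


class ChallengeProxy(http.server.BaseHTTPRequestHandler):
    """Answers http-01 requests by fetching the response from server A."""

    def do_GET(self) -> None:
        url = upstream_url(self.path)
        if url is None:
            self.send_error(404)
            return
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                body = resp.read()
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            self.send_error(502)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 80) -> None:
    """Serve forever; binding port 80 usually requires root privileges."""
    http.server.HTTPServer(("", port), ChallengeProxy).serve_forever()
```

On each non-A server, calling `run()` would answer challenge requests by fetching them from server A (http-01 always connects on port 80). Everything else gets a 404, so this sketch shouldn't share port 80 with an existing website.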
Now, the only issue here is that only server A can run Certbot. I think the best option is to have server A copy its certificate to the other servers as well. This would, of course, require more setup and trust from the other operators, but it’s the best way I can see.
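Copying the certificate out from server A could be done from a Certbot deploy hook, which Certbot runs after a successful issuance or renewal with the `RENEWED_LINEAGE` environment variable pointing at that certificate's `live/` directory. A minimal sketch, assuming key-based SSH access from server A to each peer; the peer list, `REMOTE_DIR`, and the function names are hypothetical, and each operator would still need to reload their ircd afterwards.

```python
import os
import subprocess
import sys
from typing import List

# Hypothetical peers that should receive the certificate.
PEERS = ["serverb.example.com", "serverc.example.com"]
# Hypothetical destination directory on each peer.
REMOTE_DIR = "/etc/ircd/tls/"


def build_copy_commands(lineage: str, peers: List[str], remote_dir: str = REMOTE_DIR) -> List[List[str]]:
    """One scp invocation per peer, copying the full chain and the private key."""
    files = [os.path.join(lineage, "fullchain.pem"), os.path.join(lineage, "privkey.pem")]
    # The live/ entries are symlinks; scp reads through them and sends regular files.
    return [["scp", *files, f"{peer}:{remote_dir}"] for peer in peers]


def deploy(lineage: str, peers: List[str] = PEERS) -> int:
    """Run the copies and return how many peers failed."""
    failures = 0
    for cmd in build_copy_commands(lineage, peers):
        if subprocess.run(cmd).returncode != 0:
            failures += 1
            print("copy failed: " + " ".join(cmd), file=sys.stderr)
    return failures


# Certbot sets RENEWED_LINEAGE when it runs a deploy hook; do nothing otherwise.
if __name__ == "__main__" and "RENEWED_LINEAGE" in os.environ:
    sys.exit(1 if deploy(os.environ["RENEWED_LINEAGE"]) else 0)
```

It could be registered with `certbot renew --deploy-hook /path/to/push_cert.py` (path hypothetical). This hands the private key to every peer, which is exactly the extra trust mentioned above.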
Well, you can use a different DNS provider with more granular access control.
You can move the whole domain, or simply CNAME or delegate the _acme-challenge record(s).
It doesn't even have to be a very advanced or reliable service. You could pick one server, install BIND, generate and distribute TSIG keys to each IRC server, and use nsupdate.
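The nsupdate route can be scripted by rendering the update commands and feeding them to `nsupdate -k <keyfile>`, which signs the update with the TSIG key in that file. A minimal sketch, assuming a BIND server at the hypothetical `ns1.example.com` that accepts TSIG-signed updates for the zone; the function names and key path are illustrative.

```python
import subprocess


def nsupdate_script(server: str, zone: str, record: str, value: str,
                    ttl: int = 60, add: bool = True) -> str:
    """Render nsupdate commands that add (or remove) one TXT value."""
    fqdn = record.rstrip(".") + "."
    if add:
        change = f'update add {fqdn} {ttl} TXT "{value}"'
    else:
        # Delete only this exact value, so other servers' pending challenges survive.
        change = f'update delete {fqdn} TXT "{value}"'
    return "\n".join([f"server {server}", f"zone {zone}", change, "send", ""])


def run_nsupdate(script: str, tsig_key_file: str) -> None:
    """Feed the script to nsupdate on stdin, signing with the given TSIG key file."""
    subprocess.run(["nsupdate", "-k", tsig_key_file], input=script, text=True, check=True)
```

Each IRC server adds and later removes only its own TXT value rather than replacing the whole record set, so simultaneous renewals from different servers don't wipe out each other's challenges. Certbot's dns-rfc2136 plugin performs this kind of TSIG-signed update itself, so a hand-rolled script is mainly useful if that plugin doesn't fit. On the BIND side, an `update-policy` rule can restrict each key to the `_acme-challenge` TXT record, which addresses the worry about giving every server access to the whole domain.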
acme-dns is a DNS/REST server designed for similar situations. (I don't think it's ideal for a one-to-many situation like this. It's great for 1 hostname with 1 client, or a million hostnames with a million clients, but for 1 hostname and many clients? I think you'd have to copy and paste 1 API key to all the clients, and rotate it if you ever had to disable someone.)
It's a bit gross, but you can have each server redirect to the next server in sequence: Server A redirects to Server B, Server B redirects to Server C, and Server C redirects to Server A. But I don't know how many redirects Let's Encrypt is willing to follow, and you don't want to get Googlebot stuck in a loop.
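The ring of redirects can be sketched as pure functions over a fixed server list, which also makes the cost visible: if only one server holds the challenge file, a validator that lands on the server just after it follows N-1 redirects in an N-server ring. The server names and function names here are hypothetical; each server redirects using the next server's own unique hostname, since redirecting to the round-robin name would just roll the dice again.

```python
from typing import List


def next_in_ring(servers: List[str], current: str) -> str:
    """Return the server that `current` should redirect to."""
    i = servers.index(current)
    return servers[(i + 1) % len(servers)]


def redirect_target(servers: List[str], current: str, path: str) -> str:
    """Build the Location URL, using the next server's own (non-round-robin) name."""
    return f"http://{next_in_ring(servers, current)}{path}"


def hops_to_holder(servers: List[str], entry: str, holder: str) -> int:
    """Count redirects from the server the validator hit to the one holding the file."""
    return (servers.index(holder) - servers.index(entry)) % len(servers)
```

With the roughly 6 servers mentioned earlier, that is up to 5 redirects per validation. Redirecting only requests under `/.well-known/acme-challenge/` keeps crawlers out of the loop for ordinary pages, though a token that no server holds would still cycle until the client gives up.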