I'm going to build a 4-node cluster of Raspberry Pis with HA. Several services that require a certificate will run on it, such as Dovecot, Postfix and Apache. However, these services will float between the machines, and Apache might not be running on the same node as Postfix/Dovecot.
So in order to host the webroot on every node, there has to be a web server for the ACME challenge. I'm thinking of a few ways to solve this.
The HA software I'll be running is Pacemaker. I could probably set up a rule that if Postfix/Dovecot runs on a machine, but not Nextcloud or other services that use Apache, a "one shot" Apache server is spawned with just the webroot. If for whatever reason Nextcloud or any other service is moved to that machine, the "one shot" Apache is killed and the regular one is started, which by default includes the webroot. But this might be tricky, I don't know yet. Pacemaker works best if kept simple.
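Roughly, I imagine something like this with pcs (resource names and the config file are just placeholders, untested):

```shell
# Minimal Apache instance that only serves the ACME webroot, using a
# stripped-down config (hypothetical path).
pcs resource create acme-apache ocf:heartbeat:apache \
    configfile=/etc/apache2/acme-only.conf

# Run acme-apache wherever the mail stack runs...
pcs constraint colocation add acme-apache with postfix INFINITY
# ...but never on a node that already runs the regular apache resource.
pcs constraint colocation add acme-apache with apache -INFINITY
```

The -INFINITY colocation is what kills the "one shot" instance as soon as the full Apache lands on that node.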
Another method is to have a network share available on all nodes with the same set of certificates and just one certbot service that updates it. But that is tricky too: if the share has issues, the certificates are not available. You also have to make sure the secondary nodes mount the share read-only and that only the primary node runs certbot. DRBD in dual-primary (multi-mount) mode comes to mind. GlusterFS is used for all other services, but (as far as I know) it doesn't have the multi-mount features I'd want here.
I'll probably find a good solution. But I'm just dropping this here so the community may provide me with some advice and experiences about what worked and what didn't.
HA == High Availability or HA == Home Assistant?
Also, I'm preeeetty sure you have more knowledge in this matter than is generally available on the Community, but perhaps I'm underestimating my fellow volunteers, and perhaps we can chip in our "2 cents" anyway.
Personally, I like the idea of DRBD. Have a single, primary Pi set up Certbot with all the certificate stuff and replicate the certificate "store" to the secondary Pis using DRBD. As the certificates have a lifetime of 90 days and Let's Encrypt recommends renewing after 60 days, you'd have 30 days left to fix any issue with the primary Pi. The secondary Pis would be using their replicated certificate store in the meantime. I'm sure 30 days is enough to fix any major issue regarding certificate issuance? If you have a good backup strategy for your Pi nodes, you'd even have enough time to buy a new Pi and restore a backup, if necessary.
Also, a simple, scripted and secure rsync from the primary Pi to the secondary Pis would suffice instead of a fancy system like DRBD.
Hi, thanks for your reply. HA is indeed High Availability in this context. Rsync is also a possibility. The Apache group in Pacemaker would then have certbot as a dependency and use a GlusterFS volume to store the data. A cron job would then rsync the data from the floating certbot host to the local disk of all nodes. If the service moves to another host, GlusterFS mounts the certificates on top of the local disk directory. The cron can run on all nodes, with a simple check whether that node is currently the certbot primary; if so, it skips the rsync. The other nodes rsync via the DNS name that points to the floating IP of the certbot node.
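A first sketch of that nightly pull, assuming a DNS name like certbot.example.lan for the floating IP (all names and paths are placeholders):

```shell
# Helper: does this node currently hold the given (floating) IP?
holds_ip() {
    ip -o addr show | grep -qwF "$1"
}

# Nightly pull, run from cron on every node; the node that currently
# holds the floating certbot IP skips itself. Names are placeholders.
pull_certs() {
    float_ip=$(getent hosts certbot.example.lan | awk '{print $1}')
    if [ -n "$float_ip" ] && holds_ip "$float_ip"; then
        return 0    # we are the certbot primary, nothing to pull
    fi
    # -L dereferences certbot's live/ symlinks so real files arrive locally
    rsync -aL certbot.example.lan:/etc/letsencrypt/live/ /etc/letsencrypt/live/
}

# crontab entry (placeholder schedule and path):
# 15 3 * * * /usr/local/sbin/pull-certs.sh
```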
I think that's it. But it will take some time before I get there. A lot to configure still. But this seems like a simple and robust solution. Thanks for mentioning rsync!
Edit: I can also use a certbot hook to trigger the rsync and restart the Pacemaker services from there, so that they all have the latest certificate in use, in combination with a daily rsync of course. At least things are orchestrated and kept in sync with that hook. So in summary: the daily crons pull the certificates from the floating certbot host, and an rsync push is triggered when the floating certbot host renews certificates, which then also restarts the services that use these certs via Pacemaker.
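The push side could look something like this, dropped into certbot's deploy-hook directory so it only fires after a successful renewal (node and resource names are placeholders, untested):

```shell
#!/bin/sh
# /etc/letsencrypt/renewal-hooks/deploy/distribute-certs.sh
# Push renewed certs to the other nodes, then restart the Pacemaker
# resources that use them. Node and resource names are placeholders.
set -e

for node in pi2 pi3 pi4; do
    # -L dereferences the live/ symlinks so plain files arrive on the peers
    rsync -aL /etc/letsencrypt/live/ "root@$node:/etc/letsencrypt/live/"
done

# Restart the consumers cluster-wide so they load the new certificate
for res in postfix dovecot apache; do
    pcs resource restart "$res"
done
```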
Sounds like a solid plan!
Of course you'd need some kind of signalling built in that triggers if for some reason a renewal, an rsync pull or a push didn't succeed, et cetera.
True, I might script it and send an email if something is wrong. Or I might configure Zabbix to trigger something. It will keep me busy this year I think.
And also develop some kind of monitor that watches the lifetime of the certificates presented on your site. If it drops below 30 days before the expiry date, it should have been renewed. If it wasn't and your certs are e.g. 25 days from expiry, renewal has apparently been failing for 5 days already. If you didn't get an email from your other scripts by that time, something is wrong with those scripts too.
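Such a check can be done with plain openssl; a sketch (paths are placeholders, GNU date assumed, which is fine on Raspberry Pi OS):

```shell
# Print the number of whole days until the certificate in $1 expires.
days_left() {
    end=$(openssl x509 -enddate -noout -in "$1" | cut -d= -f2)
    end_s=$(date -d "$end" +%s)
    now_s=$(date +%s)
    echo $(( (end_s - now_s) / 86400 ))
}

# Example alarm (path is a placeholder): warn below 30 days remaining
# if [ "$(days_left /etc/letsencrypt/live/example.com/cert.pem)" -lt 30 ]; then
#     echo "certificate renewal appears to be failing" | mail -s cert-alarm root
# fi
```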
Or, use DNS validation for the acme challenges, that way it doesn't matter which server is the web server as long as your renewals copy the updated cert to the right place.
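For example with certbot's RFC 2136 plugin, which publishes the validation TXT record via dynamic DNS updates, so no node has to serve HTTP at all (domains and the credentials path are just examples):

```shell
# dns-01 with the certbot-dns-rfc2136 plugin; any node can run this,
# no webroot required. Credentials path and domains are examples.
certbot certonly \
    --dns-rfc2136 \
    --dns-rfc2136-credentials /etc/letsencrypt/rfc2136.ini \
    -d mail.example.com -d cloud.example.com
```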
Hm, yes, I missed the part about handling the challenge. I was focusing on handling the distribution of the issued certs among the services using them.
The dns-01 challenge is a possibility, but it shouldn't be too hard to have only the primary node handle the http-01 challenge, right? There is probably some kind of load balancer in play here, and perhaps it can redirect requests for files under the path /.well-known/acme-challenge/ to a single node? IMO every load balancer should be able to do such a thing.
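In HAProxy terms that could be a few lines like this (backend names and addresses are placeholders):

```
# haproxy.cfg fragment: steer ACME challenges to one fixed node
frontend http-in
    bind *:80
    acl is_acme path_beg /.well-known/acme-challenge/
    use_backend acme_node if is_acme
    default_backend web_nodes

backend acme_node
    server primary 10.0.0.11:80
```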
Ugh, the documentation of Pacemaker is terrible though...
... or the load balancer can handle the challenges itself, I think: stuff like HAProxy or Traefik can do that.