Still having issues, but you were right the symbolic links weren't being copied over and I had to TAR and then un-TAR the files to get the symbolic links to copy over too.
Sorry I missed this thread. This is one of the setups that I am very familiar with.
I just want to go over a few things:
1- Do not use tar to copy/sync the folders. tar, with symlink support, is the correct tool for offline backups (encrypt it and toss it in the cloud, etc.). You are overcomplicating things by using it for this situation.
2- The best thing to use is rsync, as suggested by @aarongable. That's going to allow you to copy over only the changes as needed, and everything is automated via ssh. Here is one of the simpler guides: How To Copy Files With Rsync Over SSH | DigitalOcean
3- I would actually not use a cronjob for this. Instead I would use a --deploy-hook on Server1 to invoke the rsync. --deploy-hook only runs on success, so you're only running it when the certificate changes.
4- The web server on Server2 needs to restart to reflect the changes. That can be done either via the same --deploy-hook on Server1 that invokes the rsync command, or via a daily cronjob on Server2.
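To make points 3 and 4 concrete, here is a minimal sketch of a deploy hook written in Python. The hostname, the restart command, and the choice to sync all of /etc/letsencrypt/ are my assumptions, not something from your setup; adjust them to match your servers. It relies on rsync's -a flag, which copies symlinks as symlinks, and on the RENEWED_LINEAGE variable that certbot sets when it runs a deploy hook.

```python
#!/usr/bin/env python3
"""Deploy-hook sketch: push renewed certificates from Server1 to Server2."""
import os
import subprocess

# Hypothetical values; replace with your own host and web server command.
REMOTE = "root@server2.example.com"
CERT_DIR = "/etc/letsencrypt/"  # trailing slash: copy the directory's contents
RESTART_CMD = "systemctl reload apache2"


def rsync_command(src=CERT_DIR, remote=REMOTE):
    # -a (archive) implies -l, so symlinks under live/ stay symlinks.
    # --delete removes files on Server2 that no longer exist on Server1.
    return ["rsync", "-a", "--delete", "-e", "ssh", src, f"{remote}:{src}"]


def restart_command(remote=REMOTE, restart=RESTART_CMD):
    # Runs the restart on Server2 over the same ssh access rsync uses.
    return ["ssh", remote, restart]


def main():
    # check=True makes the hook exit with an error if either step fails.
    for cmd in (rsync_command(), restart_command()):
        subprocess.run(cmd, check=True)


# Only act when certbot invoked this script as a deploy hook.
if __name__ == "__main__" and os.environ.get("RENEWED_LINEAGE"):
    main()
```

You would then register it with something like certbot renew --deploy-hook /usr/local/bin/push_certs.py (the script path is hypothetical). This assumes key-based ssh from Server1 to Server2 is already working without a password prompt.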
Personally, I really like using the Python library Fabric to write these automation scripts. I find these scripts very fast to write with Fabric, and the Python is exceptionally clear to read and understand when it comes to maintenance. It's just a few lines to write a cron script in Fabric that will use SSH to rsync and open a shell on the other server to restart it.
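For a sense of what "a few lines" looks like, here is a rough Fabric sketch. It assumes Fabric 2 or later is installed (pip install fabric), and the hostname, user, path, and restart command are placeholders I made up for illustration.

```python
def rsync_shell_command(target, path="/etc/letsencrypt/"):
    # -a keeps symlinks as symlinks; --delete mirrors removals to the target.
    return f"rsync -a --delete -e ssh {path} {target}:{path}"


def sync_and_restart(host="server2.example.com", user="root"):
    # Imported here so the helper above works even without Fabric installed.
    from fabric import Connection

    conn = Connection(host=host, user=user)
    # local() runs rsync on this machine (Server1), pushing to Server2.
    conn.local(rsync_shell_command(f"{user}@{host}"))
    # run() opens a shell on Server2 over ssh and restarts the web server.
    conn.run("systemctl reload apache2")
```

Calling sync_and_restart() from the deploy hook (or from cron) does both steps over the same ssh credentials.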
I don't foresee any availability issues from running this on --deploy-hook. The certs should have at least one month left on their lifespan when a failure occurs, so the last cert should still be valid even if a failure happens mid-renewal. Even when the short-lived certs are offered, there will be a matter of days left over. The way certificate revocation works, there is around 7-10 days before most browsers will get that info, unless the browsers consider you a high-priority site and use their private channels to push the revocation info. Realistically, there should be no issue on failover, and you should have a minimum of 2 days to triage (10 day certs) and 30 days to triage with current 90 day certs.
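If you want to see how much triage time the copy on Server2 actually has left, a small check like the following can help. This is only a sketch: it assumes the openssl command-line tool is available, and the function names are my own.

```python
import ssl
import subprocess
import time

SECONDS_PER_DAY = 86400


def read_not_after(cert_path):
    # openssl prints a line such as "notAfter=Jan  5 09:34:43 2018 GMT".
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def days_remaining(not_after_line, now=None):
    # Keep only the timestamp after "notAfter=" and convert it to epoch seconds.
    expiry_text = not_after_line.split("=", 1)[1]
    expiry = ssl.cert_time_to_seconds(expiry_text)
    if now is None:
        now = time.time()
    # A negative result means the certificate has already expired.
    return (expiry - now) / SECONDS_PER_DAY
```

For example, days_remaining(read_not_after("/etc/letsencrypt/live/example.com/cert.pem")) gives the number of days before the synced certificate expires (the path is a placeholder).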
Something you can also do in these situations is to use a daily cron on Server2 to check uptime on Server1 and toggle a semaphore on disk. This is popular on nginx due to how it implements filesystem checks, but I'm not sure if Apache users do it much. Basically you just touch/delete a filepath, and the server checks to see if the filepath exists and changes behavior. nginx implements this in a way that uses kernel memory, so there is almost no overhead, resulting in it being very popular to have a semaphore check on every request to toggle maintenance mode or inject a service degradation message into html pages.
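A cron script for the semaphore could look roughly like this. I am assuming here that the flag file should exist while Server1 is unreachable and be removed once it is back, and that "up" means a TCP connection to port 443 succeeds; both are illustrative choices, as are the function names.

```python
import socket
import sys
from pathlib import Path


def server1_is_up(host, port=443, timeout=5.0):
    # Treat Server1 as up if a TCP connection can be opened in time.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def update_semaphore(flag_path, primary_up):
    flag = Path(flag_path)
    if primary_up:
        flag.unlink(missing_ok=True)  # requires Python 3.8+
    else:
        flag.touch()
    # Report whether the flag file is present after the update.
    return flag.exists()


# Usage: check_primary.py <server1-host> <flag-file>
if __name__ == "__main__" and len(sys.argv) == 3:
    host, flag_path = sys.argv[1], sys.argv[2]
    update_semaphore(flag_path, server1_is_up(host))
```

On the nginx side, the existence test would be something like an if (-f /path/to/flag) block that switches behavior while the file is present.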