AWS - Distributing Current Certbot Certificates Amongst Servers Which Are Spun Up with AutoScale

I have a web server in AWS behind an autoscaling group. The certs are baked into the image for the autoscaling group currently. Is this what I should be doing or should I have the certs off in a shared location that all the webservers can hit?

The issue I’m wondering about is this. If the certs get updated by certbot, and a new server comes up based on the image with old certs, what happens? Should I be doing a force cert update on each domain in the launch config of the autoscaling group? Should I just let my renew cron job handle this?

Just not sure the best way to handle new servers coming up with the older certs.

Hi @jwalsh,

Are you envisioning that this will happen while the old certs are still valid, or after they’re expired?

Certificates can have overlapping validity and there’s typically no problem with having different servers or instances using different valid certificates for the same domain name at the same time. (There are browser extensions like Cert Patrol that can generate a user warning under this circumstance, but they’ve never been widely popular.)

It’s important not to have many separate instances all independently trying to renew, not because of any certificate validity issue, but only because multiple independent renewals will be likely to hit Let’s Encrypt rate limits:

https://letsencrypt.org/docs/rate-limits/

However, the way of avoiding this is totally up to you.

Thanks for the reply schoen!

I suppose it may happen during the overlap period, so it shouldn’t be an issue there. It does make me wonder how it works if the server comes online with a bad cert. Does certbot check the dates on the local certs when it does its checking, or does it have a record of expiration dates up on its end?

I’m not quite sure what you meant by “up on its end”… it is looking at the files in /etc/letsencrypt and parsing them to see when their expiration dates are. The expiration date of a certificate is built into the certificate as an X.509 data field, so you can see it by reading the file.

If, when certbot renew is run, the current certificate for a particular certificate lineage within /etc/letsencrypt is going to expire in less than 30 days from now, Certbot notices this and starts the process of trying to renew it. However, it doesn’t know anything at all about any other certificates for the same domain names that might have been issued, for example, on other machines, only about what’s in its local /etc/letsencrypt directory.
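
As a rough illustration of the check described above, here is a small Python sketch that parses an expiry date in the format printed by `openssl x509 -enddate -noout` and applies the same 30-day threshold. This is not Certbot's own code; the function names `parse_not_after` and `due_for_renewal` are made up for the example, and the sample date is arbitrary.

```python
import ssl
from datetime import datetime, timedelta, timezone

# The 30-day threshold mentioned above.
RENEW_WINDOW = timedelta(days=30)


def parse_not_after(openssl_line: str) -> datetime:
    """Parse a line such as 'notAfter=Jan  5 09:34:43 2018 GMT'."""
    value = openssl_line.strip().split("=", 1)[-1]
    # ssl.cert_time_to_seconds understands the certificate time format.
    seconds = ssl.cert_time_to_seconds(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def due_for_renewal(not_after: datetime, now: datetime) -> bool:
    """True if the certificate expires in less than 30 days from `now`."""
    return not_after - now < RENEW_WINDOW


if __name__ == "__main__":
    expiry = parse_not_after("notAfter=Jan  5 09:34:43 2018 GMT")
    print(due_for_renewal(expiry, datetime.now(timezone.utc)))
```

The point is that everything needed for the decision is in the local file; nothing here consults the CA or any other machine.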

When you have a small, fixed number of instances (say, 2-3), it can be reasonable to have each instance obtain certificates independently (though you will have to use the DNS challenge, since offering an HTTP challenge response on one instance wouldn’t necessarily make it available on all instances).

However, for larger groups of servers, or for autoscaling groups, I would recommend having a single machine that is responsible for:

  • periodic issuance
  • pushing the new certificate to all instances
  • telling the web servers to reload
  • updating the image, OR updating shared storage from which new instances pull the certificate and key

This can be a separate instance from the rest, or if you want to save on costs, you can choose a single instance on which it will run (since it only needs to run Certbot a couple of times per day). Bear in mind that you will probably need to use the DNS challenge, and so whichever instance runs the renewal will need credentials to update your DNS; that can be a risk if the instance gets hacked.
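
One way the "updating shared storage" step could look, as a sketch only: a script run as a Certbot deploy hook on the single renewal machine, copying the renewed files to a shared directory. Certbot sets the `RENEWED_LINEAGE` environment variable for deploy hooks; the mount point `/mnt/shared-certs` and the function name `publish_lineage` are assumptions made up for this example, and pushing to instances or reloading web servers is not covered.

```python
import os
import shutil
from pathlib import Path

# Hypothetical mount point for shared storage; adjust to your setup.
SHARED_ROOT = Path("/mnt/shared-certs")


def publish_lineage(lineage_dir: Path, shared_root: Path) -> Path:
    """Copy the certificate chain and key of one lineage to shared storage."""
    target = shared_root / lineage_dir.name
    target.mkdir(parents=True, exist_ok=True)
    for name in ("fullchain.pem", "privkey.pem"):
        # Files under /etc/letsencrypt/live/ are symlinks; shutil.copy
        # follows them and copies the file contents.
        shutil.copy(lineage_dir / name, target / name)
    os.chmod(target / "privkey.pem", 0o600)  # keep the key private
    return target


# Only act when Certbot has invoked this script as a deploy hook.
if __name__ == "__main__" and "RENEWED_LINEAGE" in os.environ:
    print(publish_lineage(Path(os.environ["RENEWED_LINEAGE"]), SHARED_ROOT))
```

It could be wired up with something like `certbot renew --deploy-hook /path/to/publish.py` (the script path is hypothetical), so it only runs when a certificate was actually renewed.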

Some people in this situation have also had good luck with the HTTP-01 challenge if all of the instances speak HTTP on port 80, all of the instances are publicly reachable, and there is some way for the outside world to distinguish them by IP address. You can have a validation server with a distinctive name and tell all of the other instances to redirect http://www.example.com/.well-known/acme-challenge/ to http://validation-server.example.com/.well-known/acme-challenge/. Then the validation server can use the HTTP-01 method to obtain certificates for www.example.com, even though that name can refer to many different servers.
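
To make the redirect idea concrete, here is a minimal Python sketch of what each ordinary instance would do for challenge requests. In practice this would normally be a redirect rule in your Apache or nginx configuration rather than a separate program; the Python version is only for illustration. The host name comes from the example above, and `challenge_redirect`, `RedirectHandler`, and `serve` are names invented for the sketch.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"
# Name taken from the example above; replace with your own validation host.
VALIDATION_HOST = "validation-server.example.com"


def challenge_redirect(path: str) -> Optional[str]:
    """Return the redirect target for ACME challenge paths, else None."""
    if path.startswith(CHALLENGE_PREFIX):
        return f"http://{VALIDATION_HOST}{path}"
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        target = challenge_redirect(self.path)
        if target is None:
            # This sketch only handles challenge paths.
            self.send_error(404)
            return
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def serve(port: int = 80) -> None:
    """Run the redirector; binding to port 80 normally requires root."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

The token part of the path is preserved, so the validation server sees the same challenge file name that was requested under www.example.com.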

Here’s a thought…

What if certbot could be provided a way to discern a specific “cluster” member?
Say “www.site.com” has:
server1 with IP 1.1.1.1
server2 with IP 1.1.1.2
server3 with IP 1.1.1.3
And
www1.site.com = IP 1.1.1.1
www2.site.com = IP 1.1.1.2
www3.site.com = IP 1.1.1.3

renewing a cert for “www.site.com” has 66% chance of failure.
renewing a cert for “www3.site.com” has 0% chance of failure (but won’t match www requests)
renewing a cert for (“www3.site.com”+“www.site.com”) should be a valid request (for server3) with 0% failure.
If certbot could detect that IP3 is within www.site.com list of IPs (1,2,3), it could just go there first for subsequent auths.

enters the concept of:
--preferred-IP 1.1.1.3

which could reduce the cert request to just the “common” name.
But I still like the unique+common cert method; yes, it’s more certs (not shared), but that can be useful, for example for easily determining which server is handling connections.

@rg305, this would require server-side changes to the CA validation methods (and maybe the ACME protocol); I think it’s unlikely because my impression is that the server-side folks want to increase the unpredictability of validations, not decrease it.

OK, I appreciate the time, @schoen! I may have been unclear about my whole situation, but what you explained more than works for me!

Thanks again for the replies and taking the time!

Hi @jwalsh

This is how I would deal with this:

A) Have a dedicated (small instance) server for TLS Related Functions (getting certs, storing certs, etc)
B) When you scale up, include a startup script which logs on to the server (A), grabs the latest certificates, and adds them to the relevant directories
C) Then restart Apache/Nginx
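
Steps B and C could be sketched in Python roughly like this. It assumes the certificates from server (A) are already reachable as a local directory on the new instance (for example a mounted share, or files fetched beforehand with scp or rsync); that fetching step is not shown. The function names and the default reload command are assumptions for the example, and a reload is used here in place of a full restart.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def install_certs(source_dir: Path, target_dir: Path) -> List[Path]:
    """Copy the certificate chain and key into the web server's directory."""
    target_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for name in ("fullchain.pem", "privkey.pem"):
        dest = target_dir / name
        shutil.copy(source_dir / name, dest)
        installed.append(dest)
    (target_dir / "privkey.pem").chmod(0o600)  # key readable by owner only
    return installed


def reload_web_server(
    command: Sequence[str] = ("systemctl", "reload", "nginx"),
) -> None:
    """Ask the web server to pick up the new files; raises on failure."""
    subprocess.run(list(command), check=True)
```

A startup script would call `install_certs(...)` followed by `reload_web_server()`, with paths chosen to match your web server configuration.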

Other tools like Chef/Ansible/Puppet can also be used to configure this.

Essentially your challenge is Configuration Management

Andrei
