I’m reading the integration guide, and there is lots of concern over when to renew certificates to prevent traffic spikes. To make this more efficient, I suggest that there should be a Let’s Encrypt API that will return the suggested renewal date/time for a certificate.
For each renewal and certificate issuance (and possibly a dedicated endpoint), Let’s Encrypt would return a Unix timestamp for when the client should schedule the next renewal request. LE would evaluate current/predicted traffic patterns and would pick a good future time to renew (approx. 60 days).
To prevent Let’s Encrypt from doing unnecessary computation, they could reuse the existing timestamps for a certain number of certificates (e.g. every 1k certs, recompute the suggested renewal date).
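To make the "reuse the timestamp for a batch of certificates" idea concrete, here is a rough sketch. Nothing like this exists in Let's Encrypt as far as I know; `CachedRenewalAdvisor`, `compute`, and `batch_size` are names I made up, and the real traffic analysis is left as a pluggable function.

```python
import time
from typing import Callable, Optional


class CachedRenewalAdvisor:
    """Hypothetical server-side helper: hands out a suggested renewal
    timestamp and only recomputes it once every `batch_size` requests."""

    def __init__(self, compute: Callable[[], int], batch_size: int = 1000) -> None:
        self._compute = compute          # the expensive traffic analysis (assumed)
        self._batch_size = batch_size    # e.g. 1k certs per recomputation
        self._served = 0
        self._cached: Optional[int] = None

    def suggest(self) -> int:
        # Recompute on first use, or once the current batch is used up.
        if self._cached is None or self._served >= self._batch_size:
            self._cached = self._compute()
            self._served = 0
        self._served += 1
        return self._cached


def sixty_days_from_now() -> int:
    """Placeholder for the real computation: a Unix timestamp ~60 days out."""
    return int(time.time()) + 60 * 24 * 60 * 60
```

Usage would be something like `advisor = CachedRenewalAdvisor(sixty_days_from_now)` and then `advisor.suggest()` on each issuance.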
As an example, client programs could store this suggested renewal time locally. Then cron jobs could hit the client program periodically (every minute or so) and it would conditionally renew the certificate, possibly delaying until the exact time occurred. More advanced clients (hosting providers with lots of certificates) would store this time in a database.
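A minimal sketch of the client side of this, assuming the suggested time arrives as a Unix timestamp. The state file layout and the function names are my own invention, and falling back to "renew now" when no time is stored is just one possible choice.

```python
import json
import time
from pathlib import Path
from typing import Optional


def save_renewal_time(state_file: Path, renew_at: int) -> None:
    """Store the suggested renewal time (Unix timestamp) locally."""
    state_file.write_text(json.dumps({"renew_at": renew_at}))


def renewal_due(state_file: Path, now: Optional[float] = None) -> bool:
    """Return True once the stored renewal time has been reached.

    Meant to be called from a frequent cron job (every minute or so).
    """
    if now is None:
        now = time.time()
    try:
        renew_at = json.loads(state_file.read_text())["renew_at"]
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        # Assumption: with no usable stored time, attempt renewal anyway.
        return True
    return now >= renew_at
```

A hosting provider with lots of certificates would swap the file for a database row per certificate, but the check stays the same.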
I don’t think the question was about rate limits, but rather in relation to the notion of setting renewal crons at random times to avoid everyone hammering LE at noon and midnight, for example.
Ah, I’m not sure enough people would actually use such an endpoint to make it worth it.
At the moment, our only guidance is to ensure that your cron job does not fire exactly on an hour or minute boundary. Add a randomized delay, or make sure the second and minute fields in your cronjob are something other than 0, and you should be good.
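For anyone generating crontab entries from a script, something like this sketch would follow that guidance. The function name is made up, and it assumes the standard five-field crontab format, which has no seconds field, so only the minute (and hour) are randomized here.

```python
import random
from typing import Optional


def random_cron_line(command: str, rng: Optional[random.Random] = None) -> str:
    """Build a daily crontab line at a random time that avoids minute 0."""
    if rng is None:
        rng = random.Random()
    minute = rng.randint(1, 59)   # never exactly on the hour
    hour = rng.randint(0, 23)
    return f"{minute} {hour} * * * {command}"


if __name__ == "__main__":
    print(random_cron_line("certbot renew"))
```

The line is generated once at install time and then left alone, so each machine keeps its own fixed, off-the-hour slot.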
Correct. As a community, the idea is to reduce the amount of traffic spikes Let’s Encrypt receives.
Currently, I just have anacron set to renew every month. If everybody does that (which is likely), LE is probably getting a huge amount of traffic on the first of the month at midnight.
For those that DO try to be more random (and thus, have more advanced setups), LE could tell them a specific time to renew that is away from those traffic spikes caused by less-helpful users.
If everyone determined their renewal times randomly, this wouldn’t be an issue. But I imagine LE does get traffic spikes. My idea is to have those that have advanced setups use a time that Let’s Encrypt wants them to, thus not increasing the intensity of the traffic spike.
First, your setup is definitely not best practice - the recommended setup is to run certbot renew twice daily, since it will only renew if the certificate is within 30 days of expiration. This gives you a comfortable 60 attempts that would have to fail, whereas yours gives you one, maybe. Unless, of course, you set up with --force-renewal or --renew-by-default, which is also not recommended.
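To spell out the arithmetic, here is a small sketch of the "only renew inside the last 30 days" check. This is not certbot's actual code; the names are made up and the 30-day window is taken from the description above.

```python
from datetime import datetime, timedelta, timezone

RENEW_WINDOW = timedelta(days=30)  # assumed window, per the description above


def should_renew(not_after: datetime, now: datetime) -> bool:
    """True if the certificate expires within the renewal window."""
    return not_after - now < RENEW_WINDOW


def attempts_in_window(runs_per_day: int, window_days: int = 30) -> int:
    """How many scheduled runs fall inside the renewal window."""
    return runs_per_day * window_days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_renew(now + timedelta(days=45), now))  # outside the window
    print(attempts_in_window(2))                        # twice-daily runs
```

With two runs a day that is 2 x 30 = 60 chances to renew before expiry, versus roughly one for a monthly job.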
Second, I don’t think this is of big enough concern to Let’s Encrypt to invest the time and resources it would take to create. As long as enough people spread out their renewal timing, it won’t be too bad. Certbot randomizes this when it installs cron jobs anyway.
It might make more sense to compute the recommended cronjob installation based on the account key. This could then be a simple algorithm for clients to implement.
For example, hash the account key for even distribution (or is it a hash already? I forget), then map certain slices of the hash to hh:mm:ss.
If multiple account keys are available, just use the first.
Hashes tend to be more evenly distributed than the output of a carelessly used random number generator. More importantly, people often implement random incorrectly in scripting languages (not correctly seeded, etc.), which leads to random not really being random.
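Roughly what I have in mind, as a sketch only: SHA-256 and "take the first 8 bytes modulo the seconds in a day" are my own choices, not anything a client implements today, and `account_key` is assumed to be the raw key bytes.

```python
import hashlib


def schedule_from_account_key(account_key: bytes) -> str:
    """Map an account key to a stable hh:mm:ss time of day."""
    digest = hashlib.sha256(account_key).digest()
    # Use a slice of the hash as a number, then fold it into one day.
    seconds_of_day = int.from_bytes(digest[:8], "big") % 86400
    hours, rest = divmod(seconds_of_day, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    print(schedule_from_account_key(b"example account key bytes"))
```

The same key always gives the same time, so there is nothing to store and no RNG to seed.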
While I agree with your observation and think this could be a valid way of choosing when to attempt renewals, the core goal on Let’s Encrypt’s part here is to avoid spikes that are caused by large numbers of users systematically choosing the exact same moment to renew. In this case poor or improperly seeded RNGs can’t be expected to contribute much to this problem, because it mainly exists due to people hardcoding policies like “renew at midnight”. Any attempt to move away from the default or most common times would help alleviate pressure and spikes, even if the variation is chosen on a basis that isn’t strongly random and that exhibits a skew for some reason.
For example, even if people simply choose a renewal time off the top of their heads, that will help even though the resulting distribution will be biased and probably include a lot of times closer to the beginning of the hour, because it will still spread the overall distribution away from one or two single times. Similarly, even if a particular client on a particular OS always for some reason chose the same minute past the hour, it would be unfortunate, but it would already be far better than choosing the exact start of the hour.
Everyone’s efforts to better-randomize renewal times are welcome, but as far as I know nobody’s efforts so far have been poor enough to be noticed and complained about on the CA side.
A simpler option would be for certbot to check the validity time of the current certificate, and not even try to renew if it has more than 32 days until expiration, and then for --renew to just sleep() for some random number of minutes on execution if the shell isn’t interactive (which it almost never is for cron, unless someone adjusts it for some weird reason). The most common interactive test that’s reasonably cross-shell compatible:
case "$-" in
    *i*)
        interactive=1
        ;;
    *)
        not_interactive=1
        ;;
esac
I’m not entirely sure how you do it in Python, but I’d imagine there’s a similar way to test for presence of STDIN/STDOUT.
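For what it's worth, the usual Python equivalent is to ask whether stdin/stdout are attached to a terminal with `isatty()`. A sketch of the "sleep a random amount when not interactive" idea follows; the function names and the 60-minute cap are made up, and this is not how certbot itself does it.

```python
import random
import sys
import time
from typing import Callable, Optional


def is_interactive() -> bool:
    """True if both stdin and stdout are attached to a terminal."""
    return (
        sys.stdin is not None and sys.stdin.isatty()
        and sys.stdout is not None and sys.stdout.isatty()
    )


def maybe_delay(
    max_minutes: int = 60,
    interactive: Optional[bool] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sleep a random number of seconds unless running interactively.

    Returns the delay that was applied (0 when interactive).
    """
    if interactive is None:
        interactive = is_interactive()
    if interactive:
        return 0
    delay = random.randint(0, max_minutes * 60)
    sleep(delay)
    return delay
```

Under cron the streams are normally not terminals, so `maybe_delay()` would sleep there and return immediately when a person runs the command by hand.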
Certbot already does this - it won't even attempt renewal if the certificate has more than 30 days of validity left. It also already handles randomizing renewal times when it adds itself to cron.
I think this may be a lot of trying to fix what isn't broken here.
In that case, the only thing left to worry about is manual installs or badly-behaved third-party tools. I guess it all depends: are they numerous enough to matter to LE.org, or a drop in the bucket?
Neither of those two cases seems likely to ignore the easy solutions and use an API or some other method (that’s also significantly more difficult than guessing a random time) to determine when to renew.
This is my idea. If there is a sufficient number of these “manual installs or badly-behaved third-party tools”, well-behaved tools could ensure they don’t pick times that are around those of the bad tools.
If there are 10 parking spots, what law-abiding citizens should be doing is picking a spot completely at random. But it is likely there is a certain subset of people who don’t care and always go for the first spot. (Therefore the first spot has a traffic spike.) The solution would be for law-abiding citizens to call in ahead of time to figure out where they should park. The parking agency could then evaluate what the bad drivers are doing, and advise the good drivers appropriately.
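In code, the parking agency could look something like this sketch. It is purely illustrative: `advise_slot` and the weighting scheme are my own, and `observed_load` stands in for whatever traffic data the CA actually has. It picks a slot at random, but weighted away from the busy ones, so the advice itself doesn't create a new spike.

```python
import random
from typing import Optional, Sequence


def advise_slot(observed_load: Sequence[int],
                rng: Optional[random.Random] = None) -> int:
    """Pick a slot index, favouring slots with less observed load.

    `observed_load` must be non-empty; entry i is the load seen in slot i.
    """
    if rng is None:
        rng = random.Random()
    peak = max(observed_load)
    # The busiest slot gets weight 1; quieter slots get proportionally more.
    weights = [peak - load + 1 for load in observed_load]
    return rng.choices(range(len(observed_load)), weights=weights, k=1)[0]


if __name__ == "__main__":
    # Ten parking spots, with the bad drivers piling into the first one.
    load = [1000, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(advise_slot(load))
```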