Background
During the deployment of our infrastructure using Spacelift, we encountered a recurring issue where our stack was getting stuck during the apply phase. This was traced back to the certificate renewal process managed by the ACME provider.
Problem Description
We traced the root cause to a conflict between two settings: min_days_remaining and use_renewal_info. min_days_remaining defaults to 30 days, which conflicted with how the renewal information and its timing were processed. With both settings active, the provider started a renewal once the min_days_remaining threshold was reached, but then slept until a randomly chosen time within the "renewal window" arrived. That sleep effectively blocked our deployment pipeline.
Temporary Resolution
We temporarily resolved this issue by turning off use_renewal_info, which allowed certificates to renew and unblocked our stack. For a more permanent fix, we lowered min_days_remaining to 10 days, so that the fixed threshold no longer triggers a renewal ahead of the renewal window.
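To make the interaction concrete, here is a rough Python model of the behavior described above. It is only a sketch under stated assumptions, not the provider's actual code: the function name blocking_sleep, the pick parameter, and the example dates and window are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def blocking_sleep(now, not_after, window_start, window_end,
                   min_days_remaining, pick=0.5):
    """Hypothetical model of the reported behavior.

    Returns how long an apply would sleep if the fixed threshold starts a
    renewal before the chosen time in the renewal window has arrived.
    """
    # A real client picks a random time inside the window; `pick` (0.0-1.0)
    # makes that choice deterministic for the example.
    chosen = window_start + (window_end - window_start) * pick

    # The fixed threshold fires once this few days (or fewer) remain.
    threshold_hit = (not_after - now) <= timedelta(days=min_days_remaining)

    if threshold_hit and chosen > now:
        return chosen - now  # renewal started, then blocked until `chosen`
    return timedelta(0)      # nothing triggered yet, so nothing blocks


# Assumed example: certificate expires 2025-04-01, and the renewal window
# (invented numbers) runs from 29 to 27 days before expiry.
not_after = datetime(2025, 4, 1, tzinfo=timezone.utc)
window_start = not_after - timedelta(days=29)
window_end = not_after - timedelta(days=27)
now = not_after - timedelta(days=30)  # exactly 30 days remaining

sleep_30 = blocking_sleep(now, not_after, window_start, window_end, 30)
sleep_10 = blocking_sleep(now, not_after, window_start, window_end, 10)
print("min_days_remaining=30:", sleep_30)
print("min_days_remaining=10:", sleep_10)
```

With these invented numbers, the 30-day threshold fires before the chosen time and the run sleeps for two days, while the 10-day threshold does not fire at all at that point.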
Request for Consideration
We believe this behavior may not align with expected operational logic and could affect others using similar setups. We request that Let’s Encrypt consider reviewing this interaction between min_days_remaining and use_renewal_info settings to prevent such conflicts in future deployments.
Yes, but what you describe is a feature of the ACME Client. Let's Encrypt provides the ACME Server. You should post your request to the support channel for the ACME Client you use. You didn't say which client that is; otherwise I'd have given a link.
Ideally your ACME Client supports ARI to determine when to renew. If it doesn't today, it should, as ARI is now an RFC within the ACME protocol. You should know that renewals may not occur on a fixed schedule (for example, because of CA revocation events). You should ensure your tooling isn't reliant on a specific schedule. These timelines will also change as the industry moves to shorter certificate lifetimes.
See: Improving Resiliency and Reliability for Let’s Encrypt with ARI - Let's Encrypt
Also: Decreasing Certificate Lifetimes to 45 Days - Let's Encrypt
Hi @stamatis,
Thanks for the detailed report. From the names of those config flags, it sounds like you're using the Terraform ACME client. As indicated by the use_renewal_info flag, this client does already support ARI, and you're doing exactly the right thing by using it. It's unfortunate that you ran into this interaction between these two config values, but this case is documented here. If you think the interaction between these two config values should be changed, or if you want to make the documentation of this interaction more prominent, you should file an issue here.
You could also ask them to implement a lifetime percentage (elapsed/remaining) based preference for renewals becoming due. If you get a cert that only lasts 7 days (or 47 days etc), having a fixed preference value for days remaining doesn't really work.
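As a rough illustration of that idea, a percentage-based check could look something like the Python sketch below. The function names and the two-thirds default are assumptions for the example, not an existing client option.

```python
from datetime import datetime, timedelta, timezone


def due_by_days_remaining(not_after, now, min_days_remaining=30):
    """Fixed preference: due once this few days (or fewer) remain."""
    return (not_after - now) <= timedelta(days=min_days_remaining)


def due_by_lifetime_fraction(not_before, not_after, now, fraction=2 / 3):
    """Percentage preference: due once `fraction` of the lifetime has elapsed.

    The 2/3 default is an assumption for this sketch.
    """
    lifetime = not_after - not_before
    elapsed = now - not_before
    return elapsed >= lifetime * fraction


# A hypothetical 7-day certificate.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=7)

day1 = issued + timedelta(days=1)
day5 = issued + timedelta(days=5)

# A fixed 30-day threshold says "renew" from the moment of issuance.
print(due_by_days_remaining(expires, day1))
# The fraction-based check waits until two thirds of the 7 days have passed.
print(due_by_lifetime_fraction(issued, expires, day1))
print(due_by_lifetime_fraction(issued, expires, day5))
```

The same fraction scales to 47-day or 90-day certificates without changing the setting.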
Thank you for the prompt reply and the linked documentation. Indeed, this lies on the ACME client side.
Hi @aarongable, and thank you very much for your answer. Indeed, the ACME client in use is the Terraform one. As a workaround for now, I set min_days_remaining to 10.
Thanks also for sharing the correct place to file this issue.
Thanks a lot for the suggestion @webprofusion. Much appreciated.