What will happen to Must-Staple

While it'd be a challenge, I think LE has always wanted to have certificates that are only a few days long, as in this post from 2022:
https://community.letsencrypt.org/t/shorter-certificate-lifetimes/174142/6

Yep, we still consider this regularly. We have the capacity to issue 3x (or even more) as many certificates as we do today without running into fundamental limits of our infrastructure, and we're working on big infra changes to let us shoot past a billion simultaneous active certs. And issuing more shorter-lived certs isn't even as hard as just issuing more certs, since we can move data about expired certificates out of active databases and just leave it in records archives.

But. The shorter certificate lifetimes are, the more critical every second of an outage is. Today, if we have to stop issuance for 24 hours, that's still 99% uptime over the lifetime of a cert. If we're issuing 30-day certs, that's down to 96.5% uptime. If we're issuing 7-day certs, a whole day of downtime represents missing fully a third of our certificate re-issuance, given that folks would likely be attempting renewal halfway through the life of their certificate.

So in a way it really is a personnel issue -- we believe in making sure our engineers are happy, healthy, and not stressed out, even when we're on call. Shortening our certificate lifetimes would increase pressure on us to resolve outages quickly rather than correctly, and that's not something we're interested in doing right now.
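The uptime figures in the quoted post can be checked with a few lines of arithmetic. This is only a sketch: it assumes that "today" means 90-day certificates, that an outage lasts exactly one day, and that clients start renewing halfway through a certificate's life. The function names are made up for illustration.

```python
def uptime_over_lifetime(lifetime_days: float, outage_days: float = 1.0) -> float:
    """Fraction of one certificate lifetime during which issuance was available."""
    return 1.0 - outage_days / lifetime_days


def renewal_window_missed(lifetime_days: float, outage_days: float = 1.0) -> float:
    """Fraction of the renewal window lost to the outage.

    Assumption: the renewal window is the second half of the lifetime.
    """
    renewal_window_days = lifetime_days / 2
    return outage_days / renewal_window_days


for lifetime in (90, 30, 7):
    print(
        f"{lifetime:>2}-day certs: "
        f"{uptime_over_lifetime(lifetime):.1%} uptime, "
        f"{renewal_window_missed(lifetime):.1%} of the renewal window missed"
    )
```

Under these assumptions the 90-day case comes out at roughly 98.9% (the "99%" in the post), the 30-day case at roughly 96.7% (the post rounds this to 96.5%), and for 7-day certs a one-day outage covers about 28.6% of the renewal window, which is the "third" the post refers to.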

5 Likes

That post exemplifies some of the real problems with achieving shorter certificate lifetimes. The same stress on users, regardless of the idealistic virtues of "set and forget", will also apply. This is a situation where the ideal doesn't mesh with practical reality or human nature, much like certain social philosophies.

6 Likes

I think there are two issues here:

  • What should the ACME Server do?
  • What should the ACME Client do?

I do not think the ACME Server should ignore a "must-staple" request; it should fail/reject instead, because this configuration option is an explicit security decision with an expected intentional behavior. If one configures must-staple and there is no OCSP response, the expected behavior is a validation check failure to the client. Ignoring the extension would break this security decision, and that is a more important "promise" to keep than backwards compatibility. While LE might no longer offer OCSP stapling, it is still a widely accepted and implemented industry standard that other CAs would be offering at the time. The only scenario in which I could imagine a "just ignore this extension" approach being acceptable is if the CA/B Forum officially deprecated the extension and required it to no longer be supported.

In terms of ACME Clients, this shifts the burden of "what to do?" onto them. While I understand LE wants to ensure the best experience for all clients, every client has a different UX/UI and userbase. Their maintainers are the persons best positioned to decide how to migrate their own userbase.

My suggestion is to support a "should-staple" extension to the ACME Order (or elsewhere) and try to roll that out to major clients ASAP. The purpose is to allow clients to mark the order as "support OCSP while you still offer it" or "hard fail if you don't". Clients would still need to populate the must-staple field on the CSR if they want it, this just handles the failure and lets people transition out in a planned manner (currently a 9-18 month timeline).
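To make the idea concrete, here is a rough sketch of how a server could combine the CSR's must-staple extension with such an order-level flag. Everything in it is hypothetical: no "should-staple" field exists in ACME today, and the `Order`, `Outcome`, and `decide` names are invented for this example. It is one possible reading of the proposal, not a description of how any CA behaves.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ISSUE_WITH_MUST_STAPLE = "issue with must-staple"
    ISSUE_WITHOUT_MUST_STAPLE = "issue without must-staple"
    REJECT = "reject order"


@dataclass(frozen=True)
class Order:
    csr_has_must_staple: bool    # the CSR carries the must-staple extension
    should_staple: bool = False  # hypothetical order field: "OCSP while you still offer it"


def decide(order: Order, ca_offers_ocsp: bool) -> Outcome:
    """Decide what to do with an order under the proposed semantics."""
    if not order.csr_has_must_staple:
        # Nothing was requested, so OCSP availability does not matter.
        return Outcome.ISSUE_WITHOUT_MUST_STAPLE
    if ca_offers_ocsp:
        return Outcome.ISSUE_WITH_MUST_STAPLE
    if order.should_staple:
        # The client explicitly accepted losing must-staple once OCSP is gone.
        return Outcome.ISSUE_WITHOUT_MUST_STAPLE
    # Default: an explicit security decision is never silently ignored.
    return Outcome.REJECT


print(decide(Order(csr_has_must_staple=True), ca_offers_ocsp=False).value)
print(decide(Order(csr_has_must_staple=True, should_staple=True), ca_offers_ocsp=False).value)
```

The point of the design is that the soft path is opt-in: a legacy client that never sets the flag gets a hard failure rather than a certificate it did not ask for.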

This strategy would allow clients (e.g. Certbot) to determine the best migration strategy for their own users if no action is taken. It would also let updated clients notify their users to take action and update the renewal configuration. Legacy clients would not be able to notify their users, but a hard fail to force a conscious user decision is the appropriate action here.

3 Likes

Another of those problems is that many clients, including certbot and acme.sh, schedule renewal based on a set number of days. Caddy is a notable exception, but certbot renews a cert when it has 30 days' validity remaining, and acme.sh 60 days after issuance. Both of these are configurable, of course, but that configuration would have to be done.
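A small sketch shows why fixed-day thresholds break down with short lifetimes. The three functions below are simplified models of the scheduling rules described above, not the actual code of any client, and the 2/3 fraction in the lifetime-relative rule is an assumption chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def renew_at_fixed_remaining(not_before: datetime, not_after: datetime,
                             days_remaining: int = 30) -> datetime:
    """Renew once fewer than `days_remaining` days of validity are left."""
    return not_after - timedelta(days=days_remaining)


def renew_at_fixed_age(not_before: datetime, not_after: datetime,
                       days_after_issuance: int = 60) -> datetime:
    """Renew a fixed number of days after issuance."""
    return not_before + timedelta(days=days_after_issuance)


def renew_at_fraction(not_before: datetime, not_after: datetime,
                      fraction: float = 2 / 3) -> datetime:
    """Renew after a fixed fraction of the lifetime has elapsed (assumed 2/3)."""
    return not_before + (not_after - not_before) * fraction


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=7)  # a hypothetical 7-day certificate

for rule in (renew_at_fixed_remaining, renew_at_fixed_age, renew_at_fraction):
    when = rule(issued, expires)
    in_lifetime = issued <= when <= expires
    print(f"{rule.__name__}: {when:%Y-%m-%d %H:%M} (inside lifetime: {in_lifetime})")
```

With a 7-day certificate, the "30 days remaining" rule yields a renewal time before the certificate was even issued (so every run would renew), and the "60 days after issuance" rule yields a time long after expiry. Only the lifetime-relative rule lands inside the validity period without reconfiguration.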

7 Likes

While I like the gist of your direction, pushing this off onto the clients is a one-to-multitudes burden that could have far-reaching consequences. While I know change is inevitable, as an ACME client developer myself I start to get rather worried when I read about "new requirements" that could make my users' experiences more complex, especially with a sense of urgency attached. Having implemented the use of Let's Encrypt certificates in critical infrastructure in a professional setting (utilizing cert-manager), I can safely say, as someone who obviously has intimate familiarity with ACME and this community, that such tasks are still not without their challenges. That makes me really feel for those in challenging environments without such perspective. I feel that it's easy for us to suggest improvements here from our positions of knowledge, but perhaps miss the full impact upon those not present.

5 Likes

I understand that concern, but I don't think there exists a "global" way to handle this deprecation properly other than a failure to issue. IMHO, the best way to inconvenience users the least is for Clients to offer a transition strategy.

Consider a use-case with the Certbot Client and a certificate queued for renewal with the --must-staple flag. The Subscriber has explicitly opted in to the "must-staple" extension, and expects their certificate to be procured with that extension – an expectation that might stem from simple preference, an internal security policy, or a third-party certification/auditing requirement.

If the ACME Client and ACME Server suddenly decide to ignore the must-staple extension to prioritize seamless renewal over a Subscriber's explicit security configuration, the Subscriber would be receiving a Certificate that might be incompatible with their internal business requirements or contractual obligations. This is a "big f*ing deal". Keep in mind: the extension itself is not deprecated, nor is this a global change wherein no CA supports it any longer – the extension is simply deprecated from a single CA's Issuance API. The only acceptable action is for the ACME Order to fail, and allow the Subscriber to determine if they must continue supporting it through another CA, or if they can rely on the forward-thinking behavior of LetsEncrypt and drop OCSP support. The lack of OCSP support and an OCSP responder will also mean that Subscribers will need to reconfigure their webservers (or cloud dashboards) to drop the OCSP configuration. Simply issuing the Certificate without an OCSP responder is likely to break servers.

So our situation is roughly the following:

1- LetsEncrypt will drop OCSP support in the future, with a timeline TBD
2- Serving a Certificate expected to be must-staple may break a Subscriber's internal policies and third-party auditing/certification requirements
3- Restarting/Reloading a Service with a must-staple Certificate may break the service

API changes and deprecations are a massive pain, but I don't see how LetsEncrypt could just ignore a must-staple request given the context and character of this extension. This really needs to be a breaking change.

If this is a breaking change, then it falls on Client developers to help Subscribers migrate from this configuration – or to another CA – BEFORE the change goes live into production.

In the simplest scenario, given this forthcoming change, Clients should start to WARN users or even FAIL requests and no longer support it.

The annoying bit about this, though, is that OCSP still has massive utility and the replacement technology/ecosystem isn't quite there yet. Perhaps this is being too subjective, but (within the context of ISRG's motivation and rationale): while the Cons of OCSP will soon outweigh the Pros... the Pros are currently outweighing the Cons. Given this situation, it makes sense to try and offer a seamless transition path so Subscribers can enjoy OCSP until it is deprecated. A seamless transition path would also address the scenario where ISRG decides to delay the (TBD) deprecation for an indeterminate amount of time.

Expanding on my suggestion above, I think LetsEncrypt could use the account emails to periodically notify Subscribers who utilize must-staple that the extension is being deprecated, and provide them with a recommended transition strategy.

To be clear: I am incredibly unhappy with this decision, and think it is going to cause a major headache for Client Maintainers and Subscribers alike, because I adamantly believe a request for must-staple MUST be rejected and not silently ignored. I think Subscribers running legacy clients are going to be massively impacted, and I fear automated CI systems are going to be hit hard by this too. While I understand (and actually agree with) ISRG's concerns and motivations for this deprecation, this sort of change is something that should be done in a 3+ year plan, not a 9-18 month one.

11 Likes

I agree with and echo pretty much all of this.

(Though I still think the privacy argument against OCSP is weak as it's a client concern, not a CA concern.)

5 Likes

Which clients are aware of CA features and limitations and can advise users accordingly? Certify The Web does have per CA feature flags that it can use to make certain decisions but I think that's fairly uncommon (?).

The biggest problem on the client side is that users don't update their client regularly, and ACME doesn't dynamically advertise a CA feature list, so reacting to CA changes proactively/dynamically on the client is hard. Obviously we can blame the users for not updating their software, and for not updating their settings, but blame doesn't solve the problem.
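For readers wondering what per-CA feature flags might look like in a client, here is a minimal sketch. It is not how Certify The Web or any other client is implemented: the directory URLs, the `CAFeatures` type, and the `check_must_staple` helper are all hypothetical, and the table is hard-coded precisely because ACME does not advertise such features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CAFeatures:
    supports_must_staple: bool


# Hypothetical static table shipped with the client; placeholder URLs and values.
CA_FEATURES = {
    "https://ca-a.example/directory": CAFeatures(supports_must_staple=True),
    "https://ca-b.example/directory": CAFeatures(supports_must_staple=False),
}


def check_must_staple(directory_url: str, wants_must_staple: bool) -> list[str]:
    """Return human-readable problems to show the user before placing an order."""
    if not wants_must_staple:
        return []
    features = CA_FEATURES.get(directory_url)
    if features is None:
        return [f"Unknown CA {directory_url}: cannot tell whether must-staple is supported."]
    if not features.supports_must_staple:
        return [f"{directory_url} does not support must-staple; update the renewal settings or pick another CA."]
    return []


for url in ("https://ca-a.example/directory", "https://ca-b.example/directory"):
    print(url, check_must_staple(url, wants_must_staple=True))
```

The limitation is the one described above: the table is only as current as the installed client version, so users who never update will not see the warning.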

8 Likes

The ecosystem is already experiencing ripple effects because of this. Looks like Go will stop maintaining OCSP code and will not implement an OCSP verifier:

2 Likes

Not sure if you can say that from just a single reply? Hopefully Go isn't managed by the whim of a single developer.

1 Like

Well, that reply is from the (I believe) golang security team lead (since FiloSottile left), so it does carry a lot of weight. The Go security people are the ones who are ultimately responsible for the safety of Go's code and have to maintain every single line of code there*. So even if someone else puts in the work of doing a PR, if the proposal doesn't get accepted (the security team doesn't want to maintain it), it won't get merged.

That said, the whole reason the Go project has these proposals is so that some folks can convince other folks (in this case, the maintainers) of something. You are allowed to argue against them and maybe change their minds with good arguments.

*there, as in "all cryptography code".

5 Likes

Wouldn't one of the solutions here be not responding to OCSP requests for certificates that do not require Must-Staple? Essentially forbidding it for the clients alone? This would allow "trialing" shorter/faster revocation before lifetimes actually get reduced to that extent. The last few lifetime reduction CA/B Forum ballots have barely passed; isn't this jumping the gun a bit?

I also really don't know of any common clients actually using OCSP themselves. I've blocked all OCSP in my local network and there have been a handful of requests in total. If this is an issue in practice, wouldn't it also be possible to deliver OCSP responses over HTTPS?

1 Like

This is the “continue to run OCSP for certificates which have must-staple” plan, an option we are considering.

6 Likes
