(STAGING) Doctored Durian Root CA X3 is expired (breaks test environment)

The Staging environment is just that -- an environment in which we can stage upcoming changes, and get some amount of real-ish traffic to confirm whether or not those changes break anything.

It has no SLAs, and no performance, uptime, or data integrity guarantees. It exists for the explicit purpose of breaking things, so that we don't accidentally break things in prod. Anyone whose infrastructure relies on staging operating exactly like prod is working under false pretenses.

Obviously, there's a balancing act here. If we make staging so unreliable that no one can accomplish anything useful with it, then we'll cease getting any traffic and staging will no longer serve its purpose. So we do three things: we get paged when staging is down and try to fix it as quickly as we can, we post announcements weeks or months in advance of upcoming changes to staging, and we post announcements again when those changes are actually made.

In this particular case, deploying the new certificates on staging was part and parcel of a much larger (but otherwise invisible) change to how staging operates as a whole. We didn't know how long that change would take, and when it was finally ready we made the call to proceed with it ASAP in order to unblock other changes (like beginning ECDSA issuance in staging!).

In engineering, there are always tradeoffs to be made. Given the unreliable-by-design nature of staging, in this case we decided to move quickly at the expense of having staging announcements up days or weeks in advance. It's perfectly reasonable to believe that this wasn't the right tradeoff to make. We'll keep an eye on how much community support has to happen (and try to weigh that against how much community support would have had to happen even with announcements), and see if we want to change our calculus in the future.
