How can one make staging brownouts more useful?

orangepizza · April 11, 2023, 4:31am

Looking at recent API events (asynchronous order finalization, the ACMEv1 deprecation), even permanently enabling a feature in Staging does not draw any notice, but as soon as it's enabled in Production we get a hail of threads here saying '$Feature broke our client, help me get a certificate'. While a production brownout catches this before the final release, it's still a disruption.
As a starting idea: if we reduce the staging cert lifetime to 10 days, a standard client that renews at 2/3 of the lifetime and is pointed at staging will renew its cert against every Boulder release, and will detect any breakage during a staging brownout that lasts a week or longer. Force-renewing by cron every week works without changing the ACME server, but authz reuse may keep the client from exercising the full certificate request path.
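The arithmetic behind the 10-day proposal can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes a client that renews exactly when 2/3 of the lifetime has elapsed and keeps one certificate continuously renewed.

```python
from datetime import timedelta

def renewal_interval(lifetime: timedelta, num: int = 2, den: int = 3) -> timedelta:
    """Time between renewals for a client that renews once num/den of the
    certificate lifetime has elapsed (2/3 is assumed as the default)."""
    return lifetime * num / den

def brownout_always_hits_renewal(lifetime: timedelta, brownout: timedelta) -> bool:
    """True if a brownout of this length is guaranteed to overlap at least
    one renewal attempt, i.e. it is at least as long as the renewal interval."""
    return brownout >= renewal_interval(lifetime)

week = timedelta(days=7)
print(renewal_interval(timedelta(days=10)))                    # interval for a 10-day cert
print(brownout_always_hits_renewal(timedelta(days=10), week))  # 10-day staging cert
print(brownout_always_hits_renewal(timedelta(days=90), week))  # 90-day cert
```

With a 10-day lifetime the renewal interval is shorter than a week, so a week-long brownout always overlaps a renewal; with a 90-day lifetime the interval is 60 days and most clients would never notice.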

bruncsak · April 11, 2023, 5:25am

It is a correct requirement. These days, as a workaround, I create test certificates with unique names (a timestamp in the domain name).
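A minimal sketch of that workaround, assuming a base domain you control where arbitrary subdomains can be validated (`staging-test.example.com` and the helper name are placeholders, not anything from a real client). Because the name has never been requested before, there is no earlier authorization for it to reuse, so the full validation path gets exercised.

```python
import time
from typing import Optional

def unique_test_name(base_domain: str, now: Optional[float] = None) -> str:
    """Build a hostname whose first label embeds the current Unix timestamp,
    so each test run requests a name that has not been authorized before."""
    ts = int(time.time() if now is None else now)
    return f"t{ts}.{base_domain}"

# Hypothetical base domain; pass the result to your ACME client as the
# identifier for a staging test certificate.
print(unique_test_name("staging-test.example.com"))
```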

webprofusion · April 11, 2023, 7:28am

A good test of any client is to simply use ZeroSSL as they take much longer to complete authz and to finalize and they also don't seem to cache authz anymore (I think they used to).

I didn't really notice the threads where clients broke, but it would be interesting to see whether they've been fixed since then and to try again in 6 months. The reality, however, is that if someone's ACME v2 client worked 2 years ago then it has probably not been updated since, and only a production breakage will change that. Some people update their software, some really don't. I know there are people using versions of my app from 5 years ago.

futureweb · April 11, 2023, 12:24pm

Hi,

I wanted to provide a brief assessment as to why, in my opinion, there were so few issues in the tests on Staging (see Enabling Asynchronous Order Finalization - #5 by aarongable - "This has been (quietly) enabled continuously in Staging for a week now. We have not seen any increase in errors, nor have we received any complaints or bug reports from clients issuing against Staging."), but there were so many errors during the live rollout.

The reason is probably that such breaking changes mainly cause problems for old clients, and old clients are usually running on older systems and quietly doing their job against the LE Live system. There are probably very few older clients running against Staging (why would they?); Staging is mainly used when implementing new LE clients on new systems, etc.

Just a little food for thought!

Best regards
Andy

aarongable · April 11, 2023, 4:18pm

Yep, that's largely what we observe. There's a strong correlation between "clients that test against Staging" and "clients that flexibly support all permissible ACME Server behaviors and update quickly when they don't".

jvanasco · April 11, 2023, 6:02pm

I've said it before, and I'll say it again. ~~Life moves pretty fast. If you don't stop and look around once in a while, you could miss it.~~ The most beneficial thing, IMHO, would be for ISRG, or another group leading ACME work, to develop a framework or checklist of official test-case scenarios that client authors should integrate against. Clients could then be promoted based on their compliance with these tests. That checklist could even be part of the Pebble release, as testing against that project is a great first step before hitting staging.

webprofusion · April 12, 2023, 1:43am

Perhaps the checklist could be versioned by date, e.g. Pebble ACME Compliance Suite 2023.09, so a client could declare itself compliant with Pebble ACME Compliance Suite 2023.04 and we'd know where it got up to. It's possible you'd need a Core and a Full checklist, because clients can be fully working for certs without implementing things like account key rollover. There's also the topic of clients which support ACME extensions (ARI, Authority Tokens, etc.).

As an aside I'd be happy to expand acmeclients.com to be a list of any and all acme clients and their stated compliance levels. It would be up to the clients to submit updates - we could split this into a file per client. The point being that unless we can track which clients are at what compliance level it's difficult to know which clients are affected by a change.
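To make the tracking idea concrete, here is a rough sketch of what a per-client record and a "who is behind?" query might look like. Everything in it is hypothetical: no such compliance suite or file format exists yet, and the field names and the `YYYY.MM` version scheme are simply taken from the examples above.

```python
from dataclasses import dataclass, field

@dataclass
class ClientCompliance:
    """One record per client, e.g. loaded from a file per client."""
    name: str
    suite_version: str                       # date-based version, e.g. "2023.04"
    level: str                               # "core" or "full"
    extensions: list = field(default_factory=list)  # e.g. ["ARI"]

def parse_version(version: str) -> tuple:
    """Turn 'YYYY.MM' into (year, month) so versions compare correctly."""
    year, month = version.split(".")
    return int(year), int(month)

def clients_behind(clients, required_version: str) -> list:
    """Names of clients whose declared suite version is older than required."""
    required = parse_version(required_version)
    return [c.name for c in clients if parse_version(c.suite_version) < required]

clients = [
    ClientCompliance("client-a", "2023.04", "core"),
    ClientCompliance("client-b", "2023.09", "full", ["ARI"]),
]
print(clients_behind(clients, "2023.09"))  # clients likely affected by a 2023.09 change
```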

jvanasco · April 12, 2023, 3:01pm

Wonderful idea. Everything in your response, wonderful ideas.

system · May 12, 2023, 3:02pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
