New Issuance Chains on Staging Failing

imfranklin · May 23, 2024, 7:19pm

In testing against the new intermediate certificates on staging, we are no longer able to issue or renew certificates via our lua-resty-auto-ssl based system.

We are seeing the same error as when the now abandoned enforcing of async orders was in place:

err: ERROR: Problem connecting to server (post for ; curl returned with 3)

The same code, domains, etc. all function and receive certificates as expected when using Let's Encrypt production environment.

We are unable to simply update to the dehydrated ACME client that supports async orders at this time — is staging currently enforcing async orders?

Bruce5051 · May 23, 2024, 7:21pm

Hi @imfranklin,

Since this is recent please read these:

imfranklin · May 23, 2024, 7:22pm

We are having no issues with production.

Bruce5051 · May 23, 2024, 7:25pm

Kindly wait for more knowledgeable Let's Encrypt community volunteers to assist.

mcpherrinm · May 23, 2024, 7:34pm

Yes, staging still has async finalization turned on.

imfranklin · May 23, 2024, 7:51pm

Do you mean "on" as in "available" or as in "forcefully required"?

Our understanding is it is available but not mandatory in production. If it is mandatory in staging, then we have no means at all to test certificate renewals against the upcoming June 6 changes.

orangepizza · May 23, 2024, 7:53pm

It's about what the server does: if the server finalizes asynchronously and returns 'processing', the client needs to understand that this isn't a failure and check back later for the actual result.
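As a rough illustration of that behaviour, here is a minimal Python sketch of a client waiting on an order after finalization. The status values ("processing", "valid", "invalid") and the "certificate" field come from RFC 8555; the function name, the `fetch_order` hook, and the retry limits are assumptions made for this example, not part of dehydrated or lua-resty-auto-ssl.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical transport hook: given an order URL, return the order JSON and
# the Retry-After value in seconds (or None). A real client would implement
# this as a signed POST-as-GET request.
FetchOrder = Callable[[str], Tuple[dict, Optional[float]]]


def wait_for_certificate_url(
    order_url: str,
    finalize_response: dict,
    fetch_order: FetchOrder,
    max_polls: int = 10,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the certificate URL once the order reaches 'valid'."""
    order = finalize_response
    delay = default_delay
    polls = 0
    while True:
        status = order.get("status")
        if status == "valid":
            # Issuance finished; the order now carries a certificate URL.
            return order["certificate"]
        if status == "invalid":
            raise RuntimeError(f"order failed: {order.get('error')}")
        if status != "processing":
            raise RuntimeError(f"unexpected status after finalize: {status!r}")
        if polls >= max_polls:
            raise TimeoutError("order still processing after max_polls polls")
        # 'processing' is not a failure: wait, then ask the server again.
        sleep(delay)
        order, retry_after = fetch_order(order_url)
        polls += 1
        delay = retry_after if retry_after is not None else default_delay
```

A synchronous server simply returns "valid" in the finalize response, so the same function covers both cases without a separate code path.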

mcpherrinm · May 23, 2024, 7:59pm

There's just "on" or "off", there's no "available but not required" in Boulder, at least today.

Async finalization is mandatory in staging, to use the terminology you're using.

We may require async finalization in some cases in the future in production, but at a point further in the future, once more clients support it properly.

imfranklin · May 23, 2024, 8:10pm

Whereas the issuance change has a concrete rollout date within two weeks, and async finalization is a "maybe, for some things, someday", could staging be made to now reflect how production will function on June 6?

mcpherrinm · May 23, 2024, 8:15pm

We'll discuss that request as a team.

mcpherrinm · May 23, 2024, 8:21pm

Could you run lua-resty-auto-ssl with this patch: "Bump dehydrated to v0.7.1" by cgunther (Pull Request #291, auto-ssl/lua-resty-auto-ssl on GitHub)?

That upgrades the version of the underlying dehydrated library to something that supports async finalize.

It seems lua-resty-auto-ssl hasn't received even minor maintenance updates in 3 years. I am not sure we are going to go out of our way to support seemingly abandoned projects.

imfranklin · May 23, 2024, 8:41pm

We had tested against the updated dehydrated during the attempted rollout of async finalization, and it was unfortunately not successful.

Our system is lua-resty-auto-ssl based, and heavily customized. We are not seeking support for either, though, only a stable testing environment that reflects the existing production environment.

We have a greatly increased rate limit from Let's Encrypt for the amount of certificates we handle each day, and any failures there would mean a fair amount of disappointed people. Should there be any issues with the June 6 issuance changes we would like to commit our resources to addressing those now, and not possible-maybe-someday-async features at this time.

aarongable · May 23, 2024, 8:45pm

I understand the desire for a testing environment that reflects the current production environment, but we need to recognize that the staging environment is in a constant state of compromise, trying to balance many different testing needs. In some cases we want it to reflect prod as it is today, and in others we want it to reflect prod as it will be in the future.

In this case, it has revealed that your client does not support asynchronous finalization, a thing that it really should support, and which it may need to support if and when we turn async finalization back on in prod. While I understand the frustration at not being able to test the new chains in staging with your current client, this means that having async finalization required in staging is working as intended. Please use this as an opportunity to change or upgrade your client to one which does support async finalization.

imfranklin · May 23, 2024, 9:38pm

Without a testing ground supporting synchronous calls, Let's Encrypt has effectively gone async-only. We could respect that were it reflected in documentation or clearly communicated elsewhere, but as-is you are correct: it is frustrating.

We'll accept this conversation and the current configuration of the testing environment as a form of communicating that async-only is looming...again. Regardless, it has been on our roadmap to migrate to certbot and fully modernize our system, but neither is likely to be possible before the issuance changes are released to production.

aarongable · May 23, 2024, 10:46pm

To provide a little more context here:

We do not currently have any plans to turn on mandatory async finalization in prod in the future. I wouldn't say that "async-only is looming". However, we may: turn it on for orders with many names; or turn it on for orders that require CAA rechecking due to relying on old validations; or turn it on for any order that takes more than 500ms to finalize; or turn it on for all orders during an emergency that is causing finalization to take unexpectedly long.

Because of this, we are keeping async finalization on in Staging for much the same reasons that we put a random key-value pair in the Directory: to encourage and require client agility. Asynchronous finalization has always been part of the RFC 8555 ACME specification; clients that do not implement it have always been time bombs lying in wait to cause issues for their operators. We ran smack into those problems when we first attempted to turn on async finalization; we are not willing to let clients ossify further.
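To make the Directory point concrete, here is a small Python sketch of a client that tolerates unexpected directory entries. The field names are the ones defined in RFC 8555 section 7.1.1; the function name, the choice of which fields to treat as required, and the sample key in the usage note are assumptions for illustration only.

```python
import json

# Directory fields defined by RFC 8555 section 7.1.1.
KNOWN_DIRECTORY_KEYS = (
    "newNonce", "newAccount", "newOrder",
    "newAuthz", "revokeCert", "keyChange", "meta",
)

# Assumption for this sketch: the minimum needed to place an order.
REQUIRED_KEYS = ("newNonce", "newAccount", "newOrder")


def parse_directory(body: str) -> dict:
    """Keep the fields the client understands and ignore everything else."""
    raw = json.loads(body)
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"directory is missing required fields: {missing}")
    # Unknown keys (such as a randomly named entry) are dropped, not rejected.
    return {key: raw[key] for key in KNOWN_DIRECTORY_KEYS if key in raw}
```

A directory body that contains an extra entry such as `"Xy3kQ9": "https://example.test/why"` (a made-up key) parses the same way as one without it, which is the kind of agility the random entry is meant to enforce.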

Again, I'm sorry that this makes testing against staging difficult for your particular case. We are a small organization, and we cannot dedicate time and energy to supporting broken clients that have not had active development in over three years.

imfranklin · May 24, 2024, 1:45am

Hey, I'm part of a small team, too! SPOILER: it's how we got married to lua-resty-auto-ssl years prior to its stagnation, and we have yet to replace it because it still works (and from our perspective, smoothly, too). We don't have extra time to go around fixing things that aren't broken and/or labeled as deprecated.

We appreciate the extra context, sincerely. The lack of other definitions, however, makes it difficult for a small team to allocate precious man hours, which I trust you understand. If sync calls are end-of-life, just say it. For real, this time (we know you did once and it didn't go well...but maybe there's more to that...). Hard dates and clear announcements can be planned around. Tidbits of info here and there throughout a forum with wishy-washy/contradictory requirements, not so much.

You say "broken client", but it's clearly functioning in production and not labeled as unsupported anywhere. In fact, it's even linked to as a client option, and has been for years.

So, we're trying to stay up-to-date on the API announcements, follow the guidelines, test for the upcoming changes...and we can't. I guess we're out of luck, but don't tell us it's because we're using broken things.

Bruce5051 · May 24, 2024, 1:55am

Hi @imfranklin,

From here https://letsencrypt.org/docs/client-options/#other-client-options
"... in which case we recommend contacting the project maintainers or switching to another client."

You might also consider opening an issue in the auto-ssl/lua-resty-auto-ssl repository on GitHub.

petercooperjr · May 24, 2024, 2:34am

Maybe we could help with some kind of workaround if we understood what exactly it is with the upcoming changes that you're trying to test? The intermediates on staging and production are different anyway (since the staging ones aren't trusted, of course). And it sounds like you haven't run your system against staging since before they turned on async finalization which was quite a long time ago now, so I don't think that you're testing how your system deals with an intermediate change. So, what exactly are you trying to test?

webprofusion · May 24, 2024, 5:01am

Unfortunately ACME doesn't have client compatibility modes, so clients have to stay up to date or they will break over time as services mature (hopefully within the confines of the rfc8555 spec, but that's not guaranteed by anyone). All ACME clients have seen this over time.

ZeroSSL etc have had async (i.e. slow) finalization for a long time and most steps in the ACME process require polling for status changes rather than assuming orders will move to the next expected status immediately.
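One way to avoid assuming the next status is to let the order's reported status drive the client at every step. The Python sketch below maps the order states listed in RFC 8555 section 7.1.6 to a next step; the action names and the `next_action` helper are hypothetical labels for this example.

```python
# Order states from RFC 8555 section 7.1.6, mapped to what a client does next.
# The action names on the right are hypothetical labels for this sketch.
NEXT_ACTION = {
    "pending": "complete_authorizations",
    "ready": "finalize",
    "processing": "poll_order",
    "valid": "download_certificate",
    "invalid": "abandon_order",
}


def next_action(order: dict) -> str:
    """Pick the next step from the status the server actually reported."""
    status = order.get("status")
    try:
        return NEXT_ACTION[status]
    except KeyError:
        raise ValueError(f"unknown order status: {status!r}") from None
```

A client built this way does not care whether a CA finalizes quickly or slowly: it re-reads the order and acts on whatever status comes back.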

As a general rule if you find your client has become incompatible with staging you can bet money you'll eventually be incompatible with production. If you're lucky the problem will affect many clients and therefore be big enough that it gets pushed back for a period of time, but if not then your client has to adapt or become obsolete.

Ironically there is just no set and forget when it comes to ACME services and client maintenance is the cost of entry.

Osiris · May 24, 2024, 6:33am

It doesn't conform to RFC 8555. Thus, strictly speaking, it does not speak the entire "ACME" language and could therefore be considered (at least partly) broken.

Note that some free ACME CAs don't even have a staging provider. Another possibility might be to run your own Boulder instance. Although I'm not sure how well that's documented.

Best bet is to fork the ACME client used and modify it yourself if development has indeed stopped on the original client.
