New Issuance Chains on Staging Failing

In testing against the new intermediate certificates on staging, we are no longer able to issue or renew certificates via our lua-resty-auto-ssl based system.

We are seeing the same error as when the now-abandoned enforcement of async orders was in place:

err: ERROR: Problem connecting to server (post for ; curl returned with 3)

The same code, domains, etc. all function and receive certificates as expected when using the Let's Encrypt production environment.

We are unable to simply update to the dehydrated ACME client that supports async orders at this time — is staging currently enforcing async orders?

Hi @imfranklin,

Since this is recent please read these:

We are having no issues with production.

Kindly wait for more knowledgeable Let's Encrypt community volunteers to assist.

Yes, staging still has async finalization turned on.

Do you mean "on" as in "available" or as in "forcefully required"?

Our understanding is it is available but not mandatory in production. If it is mandatory in staging, then we have no means at all to test certificate renewals against the upcoming June 6 changes.

It's about what the server does: if the server finalizes asynchronously and returns 'processing', the client needs to understand that this is not a failure and check back later for the actual result.
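
A minimal sketch of that client-side behavior, in Python. The `fetch_order` callable, the `retry_after` key, and the attempt limit are assumptions made up for illustration (they are not part of dehydrated or lua-resty-auto-ssl); the status values `processing`, `valid`, and `invalid` are the ones RFC 8555 defines for orders.

```python
import time
from typing import Callable, Mapping


class OrderFailed(Exception):
    """Raised when the server reports the order as invalid."""


def wait_for_order(
    fetch_order: Callable[[], Mapping],
    max_attempts: int = 10,
    default_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping:
    """Poll an order after finalize until it reaches a final state.

    fetch_order is a hypothetical callable that re-fetches the order and
    returns it as a dict. The optional 'retry_after' key (seconds) is an
    assumption standing in for the HTTP Retry-After header.
    """
    for _ in range(max_attempts):
        order = fetch_order()
        status = order["status"]
        if status == "valid":
            # Only now does the order carry a certificate URL.
            return order
        if status == "invalid":
            raise OrderFailed(order.get("error", "order became invalid"))
        # 'processing' (or any other non-final state) is not a failure:
        # wait, then ask again.
        sleep(order.get("retry_after", default_delay))
    raise TimeoutError("order still not finalized after polling")
```

A client that reads the certificate URL straight out of the finalize response, without a loop like this, has nothing to fetch while the order is still `processing`.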

There's just "on" or "off", there's no "available but not required" in Boulder, at least today.

Async finalization is mandatory in staging, to use the terminology you're using.

We may require async finalization in some cases in production in the future, but only at a point further out, once more clients support it properly.

Whereas the issuance change has a concrete rollout date within two weeks, and async finalization is a "maybe, for some things, someday", could staging be made to now reflect how production will function on June 6?

We'll discuss that request as a team.

Could you run lua-resty-auto-ssl with this patch: "Bump dehydrated to v0.7.1" by cgunther (Pull Request #291, auto-ssl/lua-resty-auto-ssl on GitHub)?

That upgrades the version of the underlying dehydrated library to something that supports async finalize.

It seems lua-resty-auto-ssl hasn't received even minor maintenance updates in 3 years. I am not sure we are going to go out of our way to support seemingly abandoned projects.

We had tested against the updated dehydrated during the attempted rollout of async finalization, and it was unfortunately not successful.

Our system is lua-resty-auto-ssl based, and heavily customized. We are not seeking support for either, though, only a stable testing environment that reflects the existing production environment.

We have a greatly increased rate limit from Let's Encrypt for the number of certificates we handle each day, and any failures there would mean a fair number of disappointed people. Should there be any issues with the June 6 issuance changes, we would like to commit our resources to addressing those now, and not possible-maybe-someday-async features at this time.

I understand the desire for a testing environment that reflects the current production environment, but we need to recognize that the staging environment is in a constant state of compromise, trying to balance many different testing needs. In some cases we want it to reflect prod as it is today, and in others we want it to reflect prod as it will be in the future.

In this case, it has revealed that your client does not support asynchronous finalization, a thing that it really should support, and which it may need to support if and when we turn async finalization back on in prod. While I understand the frustration at not being able to test the new chains in staging with your current client, this means that having async finalization required in staging is working as intended. Please use this as an opportunity to change or upgrade your client to one which does support async finalization.

Without a testing ground supporting synchronous calls, Let's Encrypt has effectively gone async-only. We could respect that were it reflected in documentation or clearly communicated elsewhere, but as-is you are correct: it is frustrating.

We'll accept this conversation and the current configuration of the testing environment as a form of communicating that async-only is looming...again. Regardless, it has been on our roadmap to migrate to certbot and fully modernize our system, but neither is likely to be possible before the issuance changes are released to production.

To provide a little more context here:

We do not currently have any plans to turn on mandatory async finalization in prod in the future. I wouldn't say that "async-only is looming". However, we may: turn it on for orders with many names; or turn it on for orders that require CAA rechecking due to relying on old validations; or turn it on for any order that takes more than 500ms to finalize; or turn it on for all orders during an emergency that is causing finalization to take unexpectedly long.

Because of this, we are keeping async finalization on in Staging for much the same reasons that we put a random key-value pair in the Directory: to encourage and require client agility. Asynchronous finalization has always been part of the RFC 8555 ACME specification; clients that do not implement it have always been time bombs lying in wait to cause issues for their operators. We ran smack into those problems when we first attempted to turn on async finalization; we are not willing to let clients ossify further.
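
To illustrate the kind of agility the random Directory entry is meant to exercise, here is a rough Python sketch of a client that looks up only the endpoints it needs by name and ignores everything else. The endpoint names are the ones RFC 8555 defines for the directory object; the function name `parse_directory` and the choice of which endpoints to treat as required are assumptions for this example, not taken from any particular client.

```python
import json

# Directory fields from RFC 8555 that this sketch treats as required.
NEEDED_ENDPOINTS = ("newNonce", "newAccount", "newOrder", "revokeCert", "keyChange")


def parse_directory(body: str) -> dict:
    """Extract the endpoints a client needs, ignoring unrecognized keys."""
    directory = json.loads(body)
    missing = [name for name in NEEDED_ENDPOINTS if name not in directory]
    if missing:
        raise ValueError(f"directory is missing endpoints: {missing}")
    # Fields are looked up by name, never by position or by key count,
    # so an extra random entry in the directory changes nothing here.
    endpoints = {name: directory[name] for name in NEEDED_ENDPOINTS}
    endpoints["meta"] = directory.get("meta", {})
    return endpoints
```

A client written this way keeps working when the server adds a field it has never seen, which is the same property that lets it cope with an order that comes back as `processing`.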

Again, I'm sorry that this makes testing against staging difficult for your particular case. We are a small organization, and we cannot dedicate time and energy to supporting broken clients that have not had active development in over three years.

Hey, I'm a part of a small team, too! SPOILER: it's how we got married to lua-resty-auto-ssl years prior to its stagnation, and we have yet to replace it because it still works (and from our perspective, smoothly, too). We don't have extra time to go around fixing things that aren't broken and/or labeled as deprecated.

We appreciate the extra context, sincerely. The lack of other definitions, however, makes it difficult for a small team to allocate precious man hours, which I trust you understand. If sync calls are end-of-life, just say it. For real, this time (we know you did once and it didn't go well...but maybe there's more to that...). Hard dates and clear announcements can be planned around. Tidbits of info here and there throughout a forum with wishy-washy/contradictory requirements, not so much.

You say "broken client", but it's clearly functioning in production and not labeled as unsupported anywhere. In fact, it's even linked to as a client option, and has been for years.

So, we're trying to stay up-to-date on the API announcements, follow the guidelines, test for the upcoming changes...and we can't. I guess we're out of luck, but don't tell us it's because we're using broken things.

Hi @imfranklin,

From here https://letsencrypt.org/docs/client-options/#other-client-options
"... in which case we recommend contacting the project maintainers or switching to another client."

You might also consider opening an issue on the auto-ssl/lua-resty-auto-ssl issue tracker on GitHub.

Maybe we could help with some kind of workaround if we understood what exactly it is with the upcoming changes that you're trying to test? The intermediates on staging and production are different anyway (since the staging ones aren't trusted, of course). And it sounds like you haven't run your system against staging since before they turned on async finalization which was quite a long time ago now, so I don't think that you're testing how your system deals with an intermediate change. So, what exactly are you trying to test?

Unfortunately, ACME doesn't have client compatibility modes, so clients have to stay up to date or they will break over time as services mature (hopefully within the confines of the RFC 8555 spec, but that's not guaranteed by anyone). All ACME clients have seen this over time.

ZeroSSL and others have had async (i.e. slow) finalization for a long time, and most steps in the ACME process require polling for status changes rather than assuming orders will move to the next expected status immediately.
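
One way to picture this is as a lookup from the order's current status to what the client should do next, rather than a fixed sequence of steps. The sketch below uses the five order statuses from RFC 8555; the table name `NEXT_ACTION`, the function `next_action`, and the wording of each action are illustrative assumptions, not any client's real API.

```python
# Order statuses defined by RFC 8555, mapped to the client's next step.
NEXT_ACTION = {
    "pending": "complete the outstanding authorizations, then poll again",
    "ready": "send the finalize request with the CSR",
    "processing": "wait, then poll the order again",
    "valid": "download the certificate from the order's certificate URL",
    "invalid": "give up on this order and report the error",
}


def next_action(order: dict) -> str:
    """Decide what to do from the status the server actually reported."""
    status = order.get("status")
    if status not in NEXT_ACTION:
        raise ValueError(f"unexpected order status: {status!r}")
    return NEXT_ACTION[status]
```

A client that assumes `ready` is always followed directly by `valid` is effectively missing the `processing` row of this table.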

As a general rule if you find your client has become incompatible with staging you can bet money you'll eventually be incompatible with production. If you're lucky the problem will affect many clients and therefore be big enough that it gets pushed back for a period of time, but if not then your client has to adapt or become obsolete.

Ironically, there is just no set-and-forget when it comes to ACME services, and client maintenance is the cost of entry.

It doesn't conform to RFC 8555. Strictly speaking, it therefore does not speak the entire "ACME" language and could be considered (at least partly) broken.

Note that some free ACME CAs don't even have a staging environment. Another possibility might be to run your own Boulder instance, although I'm not sure how well that's documented.

Best bet is to fork the ACME client used and modify it yourself if development has indeed stopped on the original client.
