Breaking changes in Asynchronous Order Finalization API

Hi there,

we have noticed that on Staging, after the recent changes that enable Asynchronous Order Finalization (see the announcement "Enabling Asynchronous Order Finalization"), our client stopped working, because the /order endpoint no longer returns its own URL in the Location header.

Example:

New API:

  1. finalize: https://acme-staging-v02.api.letsencrypt.org/acme/finalize/123/123
  2. finalize response location: https://acme-staging-v02.api.letsencrypt.org/acme/order/123/123
  3. order response location: null

Old API:

  1. finalize: https://acme-staging-v02.api.letsencrypt.org/acme/finalize/123/123
  2. finalize response location: https://acme-staging-v02.api.letsencrypt.org/acme/order/123/123
  3. order response location: https://acme-staging-v02.api.letsencrypt.org/acme/order/123/123 (self)

Our code was getting the 'order' location and checking the status in the following manner (more or less):

var finalizeResponse = DoFinalize();
var orderLocation = finalizeResponse.Location;
while (true)
{
    var statusResponse = CheckStatus(orderLocation);
    orderLocation = statusResponse.Location; // this is null after deployment of new API

    if (statusResponse.Status == "valid")
    {
        break;
    }
}

Is there a chance that you could set the Location header to keep backward compatibility with the previous API?

4 Likes

@lestaff is this change intended?

4 Likes

Hi there, thanks for the report! This is exactly the kind of bug that staging deployments are meant to uncover. I don't expect the change to have affected the Location header, but I'll look into it today.

7 Likes

It's worth noting that RFC 8555 does not require a Location header for Finalize and Poll Order requests, only for New Order requests:

   The following table illustrates a typical sequence of requests
   required to establish a new account with the server, prove control of
   an identifier, issue a certificate, and fetch an updated certificate
   some time after issuance.  The "->" is a mnemonic for a Location
   header field pointing to a created resource.

   +-------------------+--------------------------------+--------------+
   | Action            | Request                        | Response     |
   +-------------------+--------------------------------+--------------+
   | Get directory     | GET  directory                 | 200          |
   |                   |                                |              |
   | Get nonce         | HEAD newNonce                  | 200          |
   |                   |                                |              |
   | Create account    | POST newAccount                | 201 ->       |
   |                   |                                | account      |
   |                   |                                |              |
   | Submit order      | POST newOrder                  | 201 -> order |
   |                   |                                |              |
   | Fetch challenges  | POST-as-GET order's            | 200          |
   |                   | authorization urls             |              |
   |                   |                                |              |
   | Respond to        | POST authorization challenge   | 200          |
   | challenges        | urls                           |              |
   |                   |                                |              |
   | Poll for status   | POST-as-GET order              | 200          |
   |                   |                                |              |
   | Finalize order    | POST order's finalize url      | 200          |
   |                   |                                |              |
   | Poll for status   | POST-as-GET order              | 200          |
   |                   |                                |              |
   | Download          | POST-as-GET order's            | 200          |
   | certificate       | certificate url                |              |
   +-------------------+--------------------------------+--------------+

So in general it may be wise to update your client to not rely on it, as other ACME Servers may not provide it.

Based on my reading of the code, I don't think that our usage of the Location header has changed. I believe that, both before and after this change, we did the following:

  1. creating a new order: return a Location header pointing to the newly created Order (because it's required by RFC 8555)
  2. finalizing an order: return a Location header pointing to the existing Order (because the example at the very end of Section 7.4 has one)
  3. polling an existing order: do not return a Location header (because there is neither a requirement for one nor an example containing one)

You can see this behavior in action by making a simple GET request for an Order URL to the Prod API (like https://acme-v02.api.letsencrypt.org/acme/order/123/123, but with real account and order ID numbers) and observing the headers; they do not currently contain the Location header, and this change has not yet been deployed to Prod.

When we were doing fully-synchronous finalization, your client worked, because it would never actually need to poll the order. It only got the Order object when creating a new order and when finalizing the order, and both of those contained the Location header.

Now that finalization can be asynchronous, your code has to actually poll the order object, and that response does not contain a Location header.

I'm happy to update our code to return a Location header in our get-order responses, but you should definitely update your client to work with servers that do not do so.

8 Likes

The client should also have a test suite that runs against the Pebble server, which implements the RFC strictly and also implements non-required RFC elements differently than Boulder. Pebble already required polling, so this particular issue would have surfaced before the API change. Running tests against Pebble is likely to surface similar RFC-related issues before any changes to Let's Encrypt's API are made.

5 Likes

Thanks for the investigation. We have already updated our code to not rely on the Location header, so future versions of our product should be fully compatible with the upcoming API. But if possible, we would love to maintain backward compatibility for the older software versions already deployed by our customers, so if you could add the Location header to get-order responses, that would be fantastic news for us.

1 Like

Thanks for the tip, we will definitely set it up.

1 Like

Can you PM me the user agent used by your client please? I'd like to do some spelunking through our logs.

4 Likes

Looks like Plesk does not like this change.

4 Likes

I do think that error is related to async finalize, but I don't think it's related to Location headers. We'll keep that discussion over there.

4 Likes

We've disabled the brownout early because of problems like this.

4 Likes

Would the LE staff be open to clearing certain rate limits due to the brownout?
At Vercel, we are adjusting our code, but the rate limits are still there (understandably) and causing some issues with specific domains.

Which rate limit exactly? It probably isn't the duplicate cert rate limit with a window of 7 days (as you wouldn't be able to get certs if you were having trouble), but most likely the new order rate limit with a window of just 3 hours.

Are those 3 hours really a problem now that the brownout was disabled 14 hours ago?
Never mind, see below.

2 Likes

Wouldn't asynchronous finalization still mean that the CA starts signing when the client calls finalize? The CA would still sign in the background, but the client would never ask for the finished certificate.

5 Likes

Hmm, good point. You can probably hit the rate limit regardless of whether the cert was downloaded or not.

3 Likes

We are hitting the following:

Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: , retry after 2023-04-07T06:29:03Z: see Duplicate Certificate Limit - Let's Encrypt

Trying to assess what we can do in the meantime, but clearing those rate limits would be lovely :smile:
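For a client that wants to back off until the limit clears, the retry time can be read out of the error text. Below is a minimal Python sketch, assuming the detail string carries the `retry after <UTC timestamp>` fragment exactly as in the error quoted above. `parse_retry_after` is a hypothetical helper, and the message wording is not something this thread documents as a stable interface.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches the timestamp form seen in the quoted error, e.g. 2023-04-07T06:29:03Z.
_RETRY_AFTER = re.compile(r"retry after (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)")


def parse_retry_after(detail: str) -> Optional[datetime]:
    """Return the retry time from a rate-limit error detail, or None if absent."""
    match = _RETRY_AFTER.search(detail)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    # The trailing "Z" means UTC, so attach that timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)
```

A caller could compare the returned value with `datetime.now(timezone.utc)` and skip new-order attempts for that set of names until the time has passed.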

1 Like

Can ZeroSSL take some of the load [for the next week]?

4 Likes

I am checking them out. Ty!

2 Likes

Since you hit the Duplicate Certificates Rate Limit, you only need to wait 24 hours before you can attempt to issue again.

Other alternatives include:

  • Recover the Order ID from your client logs, make a GET request for that Order, and it will have a certificate url from which you can download the Certificate
  • Search crt.sh for your hostnames, and download the issued-and-logged-to-CT certificates from there
  • Issue slightly different certificates (e.g. including or excluding www. variants of the domain names)
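
For the first alternative, the order object defined by RFC 8555 carries a `status` field and, once the certificate has been issued, a `certificate` URL. Here is a small Python sketch of reading it from an already-fetched order; `certificate_url` is a hypothetical helper, and fetching and parsing the order JSON is left to the client.

```python
from typing import Optional


def certificate_url(order: dict) -> Optional[str]:
    """Return the certificate URL of an issued order, or None if not ready."""
    # Only a "valid" order has a certificate to download.
    if order.get("status") != "valid":
        return None
    return order.get("certificate")
```

If this returns a URL, the certificate was issued even though the client never collected it, and it can be downloaded from that URL instead of placing a new order.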

6 Likes
