Error handling for ARI "replaces" conflict

A user of my client, which now supports ARI, recently hit an issue where the renewal process tried to create a new order with the replaces field pointing at a cert that had already been replaced. When this happens, Boulder currently responds with HTTP 409 (Conflict) and this straightforward ACME error body:

{
  "type": "urn:ietf:params:acme:error:conflict",
  "detail": "While validating order as a replacement an error occurred :: cannot indicate an order replaces certificate with serial \"XXXX\", which already has a replacement order",
  "status": 409
}

Regardless of how the client ended up trying to replace the same cert twice, it seems like it should be relatively easy to catch the error and retry the new order without the replaces field (perhaps with a warning to the user about what happened).
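To make that idea concrete, here is a rough sketch of the catch-and-retry logic. It is not taken from any real client: `post_new_order` is a hypothetical callable standing in for whatever sends the signed newOrder request and returns the HTTP status plus the parsed JSON body, and treating either a 409 status or the conflict error type as "already replaced" is my own assumption based on Boulder's current response, not something the draft specifies.

```python
import logging
from typing import Callable, List, Optional, Tuple

log = logging.getLogger("acme-client")

# Error type Boulder currently returns for this case (see the body above).
CONFLICT_TYPE = "urn:ietf:params:acme:error:conflict"

# Hypothetical transport: takes the newOrder payload and returns
# (HTTP status, parsed JSON body). A real client would sign and POST it.
PostNewOrder = Callable[[dict], Tuple[int, dict]]


def is_replaces_conflict(status: int, body: dict) -> bool:
    """Assumption: a 409 status or the conflict error type means the cert
    named in 'replaces' already has a replacement order."""
    return status == 409 or body.get("type") == CONFLICT_TYPE


def new_order_with_fallback(
    post_new_order: PostNewOrder,
    identifiers: List[dict],
    replaces: Optional[str] = None,
) -> Tuple[int, dict]:
    """Try the order with 'replaces'; on a conflict, warn and retry without it."""
    payload = {"identifiers": identifiers}
    if replaces is not None:
        payload["replaces"] = replaces

    status, body = post_new_order(payload)

    if replaces is not None and is_replaces_conflict(status, body):
        log.warning(
            "Server rejected replaces=%s (%s); retrying without it",
            replaces,
            body.get("detail", "no detail"),
        )
        retry_payload = {"identifiers": identifiers}
        status, body = post_new_order(retry_payload)

    return status, body
```

The retry happens at most once, so a server that keeps returning 409 for some unrelated reason just surfaces the second error to the caller instead of looping.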

I double-checked the current ARI draft-04 and while it does say the server should check for this scenario and reject the order, it doesn't really specify any error codes or types to use, which makes me worry that different implementations will reject duplicates differently and make error-handling more difficult.

Servers SHOULD check that the identified certificate and the New Order request correspond to the same ACME Account, that they share at least one identifier, and that the identified certificate has not already been marked as replaced by a different Order that is not "invalid". Correspondence checks beyond this (such as requiring exact identifier matching) are left up to Server policy. If any of these checks fail, the Server SHOULD reject the new-order request.

I haven't had a chance to check how (or if) Google's implementation handles duplicate replacements, but it seems like the ARI draft could use some additional guidance here, @aarongable. It would be unfortunate if I had to special-case this per-provider because implementation choices diverged.

I also noticed that urn:ietf:params:acme:error:conflict is not listed in the ACME URN namespace for errors, and RFC 8555 says:

Servers MUST NOT use the ACME URN namespace for errors not listed in the appropriate IANA registry (see Section 9.6)

I'm not sure what the process is to expand the namespace or whether it might already be in progress, but that should probably happen as well.


Yes, I agree it may need a more explicit definition, as noted in my previous comments on ARI implementations - #8 by webprofusion

I feel that "replaces" is a courtesy to the CA, so failing the order seems unnecessary and error prone.


I was also bitten by this in my tests some time ago (I haven't managed to continue working on it yet). It can be prevented by disabling the authzs for the failed previous order before starting a new order, but obviously that has some downsides as well, and depending on the error (maybe the ACME client crashed and didn't store the order URL anywhere) the order URL may not be available (and Let's Encrypt doesn't allow listing orders). So having more specific error codes for this would indeed be extremely helpful for client development!


Yes, this is tracked in Issue #56 of aarongable/draft-acme-ari on GitHub (Return HTTP 409 "Conflict" when the certificate identified by 'replaces' has already been replaced) and was discussed at IETF 120; I just haven't updated the latest version of the document yet.


Awesome. I figured it was being handled. Just wanted to make sure.
