ARI Replaces - behavior differences across CAs for multiple candidates

Has anyone noticed if CAs handle ARI replaces differently when there are multiple candidates, i.e.:

  • Can a specific Certificate only be replaced once? (i.e. client-side bookkeeping is needed), or
  • Can any valid matching Certificate be used?

I am trying to figure out the best ways to handle situations like Duplicate Certificates.

2 Likes

I don't know about CAs other than LE, but Boulder definitely only lets you replace a given cert once. Section 5 in the current RFC draft also calls out that servers MUST send an HTTP 409 error if the new order is being rejected because the cert has already been replaced. But earlier in the same paragraph it only says SHOULD for checking that the cert has been replaced.

Servers SHOULD check that the identified certificate and the New Order request correspond to the same ACME Account, that they share at least one identifier, and that the identified certificate has not already been marked as replaced by a different Order that is not "invalid". Correspondence checks beyond this (such as requiring exact identifier matching) are left up to Server policy. If any of these checks fail, the Server SHOULD reject the new-order request. If the Server rejects the request because the identified certificate has already been marked as replaced, it MUST return an HTTP 409 (Conflict) with a problem document of type "alreadyReplaced" (see Section 7.4).
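For client authors, a minimal sketch of reacting to that rejection might look like the following. The function names and the "retry-without-replaces" policy are hypothetical, and the full URN is an assumption based on the standard ACME error namespace; the quoted paragraph itself only names the type "alreadyReplaced".

```python
# Assumed full problem type; the draft text quoted above only says "alreadyReplaced".
ALREADY_REPLACED = "urn:ietf:params:acme:error:alreadyReplaced"


def is_already_replaced(status_code: int, problem: dict) -> bool:
    """True when a newOrder rejection says the 'replaces' cert was already replaced."""
    return status_code == 409 and problem.get("type") == ALREADY_REPLACED


def next_action(status_code: int, problem: dict) -> str:
    """Hypothetical client policy for a newOrder response."""
    if 200 <= status_code < 300:
        return "proceed"
    if is_already_replaced(status_code, problem):
        # The cert cannot be replaced again, so drop the 'replaces' field.
        return "retry-without-replaces"
    return "fail"
```

Since the other checks are only SHOULD, a client probably can't rely on every CA returning this error, which is why local bookkeeping still matters.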

6 Likes

Thanks. It seems I had a bug in my test harness that made it look like I was able to replace certs on Pebble/Boulder twice, although that never actually happened. That got surfaced earlier today when working on backup CA support.

I've been doing a massive rewrite of our client, and settled on a "Renewal Configuration" being the primary object instead of a Certificate. This switch has been making some things much easier, and others harder.

4 Likes

@rmbolger The "MUST" in draft-07 applies only to the HTTP 409 return code. All the other items, including whether the server rejects the renewal when the cert has already been replaced, are marked as "SHOULD".

1 Like

That's what I said.

1 Like

Everything is ironed out now. Thanks all.

Some quick notes for others working on this:

  • I ended up calculating a list of potential replacements, displaying it for interactive renewals, and magically selecting the oldest candidate for automatic renewals.

  • It seems like ARI checks would be more useful if they stated the certificate was already replaced and ineligible for replacement.
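A rough sketch of that selection logic, under a few assumptions: the `Candidate` class and its field names are made up for illustration, and "oldest" is taken to mean the earliest `not_before`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Candidate:
    ari_identifier: str   # hypothetical field names throughout
    not_before: datetime  # "oldest" is assumed to mean earliest not_before


def candidates_for_display(candidates: List[Candidate]) -> List[Candidate]:
    """Interactive path: show every candidate, oldest first."""
    return sorted(candidates, key=lambda c: c.not_before)


def candidate_for_auto_renewal(candidates: List[Candidate]) -> Optional[Candidate]:
    """Automatic path: pick the oldest candidate, or None if there are none."""
    ordered = candidates_for_display(candidates)
    return ordered[0] if ordered else None
```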

My client goals are admittedly different than others; everything is primarily designed to be handled via a JSON API, so I'm trying to automatically make the best decision instead of prompting a user.

3 Likes

That's interesting. Is that because you gather / manage ARI data somewhat separately from the task that manages cert renewal orders? Am I summarizing your info on that retry-after thread correctly?

I am still working through this but I am already saving ARI state data alongside each cert it belongs to. The core data is the suggestedWindow but also includes my own bits (ex: error info) and of course the retry-after response header value.
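Roughly, the per-cert state could be shaped like this. It is only a sketch: the class and field names are illustrative, and it assumes the Retry-After header arrived as a number of seconds rather than an HTTP date.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AriState:
    window_start: datetime          # suggestedWindow.start from the CA
    window_end: datetime            # suggestedWindow.end from the CA
    fetched_at: datetime            # when this ARI response was received
    retry_after_seconds: int        # assumed: Retry-After given in seconds
    last_error: Optional[str] = None  # client-side extra, not from the CA

    def needs_refresh(self, now: datetime) -> bool:
        """True once the Retry-After interval has elapsed since the last fetch."""
        return now >= self.fetched_at + timedelta(seconds=self.retry_after_seconds)
```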

I haven't gotten to the "replaces" stuff yet but it wouldn't be hard to add that to this. This state data stays with the cert as long as I keep history for it.

With this state data I will know if I have already replaced a cert. It just doesn't come from the CA.

I hope I don't sound condescending. I am honestly curious. I am finding more subtle choices than expected with this and generally like exploring different perspectives. Thanks.

1 Like

Well, a renewal window itself doesn't make sense, context-wise, for an already replaced certificate.

@orangepizza Was that for me? Because the suggestedWindow has validity until ARI data is refreshed when retry-after expires.

CA ARI data (and my state data) becomes invalid if the cert expires (per RFC).

In the case of "replaces", I plan to have that indicated in my own state data when a cert's replacement is issued. So, I would know if a specific cert was "replaced".

Which is different than just getting a new cert with the same set of domain names (and, later, same "profile"). @jvanasco said something about renewal profiles so if those are generic that is different than tracking ARI state info for specific certs.

I was just curious for more details on why his architecture would find it helpful for the CA ARI call to return "replaced" info.

2 Likes

There's a small divergence between Pebble and Boulder, btw: Boulder allows you to have only one active replaces order for a certificate; creating a second one fails if one already exists. Pebble, on the other hand, lets you create multiple replaces orders for the same certificate until you complete one. Then you can't create new ones anymore. (I haven't tested whether you can actually complete the other orders, or whether that also fails.)

2 Likes

Yes, that was the issue that misled me in tests. I believe this effect is due to Boulder recycling a pending order for the nexus of an Account+Domains; I think Pebble allows co-pending identical orders. I usually try to catch this condition, but forgot to do the check on one of my order triggers :confused:
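For anyone wanting the same guard, a minimal sketch of that check follows. It assumes local order records are dicts with hypothetical "replaces" and "status" keys, and it mirrors the draft's wording by treating any order that is not "invalid" as blocking.

```python
from typing import Iterable, Mapping


def can_start_replaces_order(local_orders: Iterable[Mapping], replaces_id: str) -> bool:
    """Allow a new 'replaces' order only if every earlier local order
    naming the same cert has ended up "invalid" (or there are none)."""
    return all(
        order["status"] == "invalid"
        for order in local_orders
        if order.get("replaces") == replaces_id
    )
```

Doing this client-side gives the same behavior against both Pebble and Boulder, instead of depending on which one rejects the second order.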

2 Likes

Is that because you gather / manage ARI data somewhat separately from the task that manages cert renewal orders?

No. I manage all the cert data in SQL, so it's extremely easy to process/search/recall everything for a given cert. The ARI data is stored on each Cert's record.

My system doubles as an ACME Client and Certificate Manager [to provision certs into dynamic loading of clustered nginx servers]. If my system requested a given Certificate, I know the full ARI history - but if a Certificate were procured elsewhere and uploaded into it, I have no idea – so I could be continually polling for ARI data of a Certificate that can't be "replaced". Also - while Certbot considers there to only be a single "live" Certificate for a given "lineage", I am (trying to) support multiple duplicate Certificates at a given time.

We have a whitelabel product that Subscribers CNAME onto; the primary usage of the system has been to order and manage those (Second Party) Certificates, and dynamically load them into Nginx. For our own (First Party) certificates, those are managed by a non-internet computer using DNS-01 auth and deployed to the public servers. One product is already split across 2 data centers, and I'd like to be using a dedicated cert per data-center (as part of the security compromise plan).

That being said, the reason for this question has to do more with the human UX and not Automatic Renewals.

I moved the primary object of my system into a RenewalConfiguration, because that will easily tie together a primary and backup certificate into a single security policy: Use Account A on CA 1 to order N domains with X Private Key Technology. To support a backup CA, I just need to add a second account selection and a few DB fields to the renewal.
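As a very rough illustration of the shape (not the actual schema; every name here is made up for the example):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RenewalConfiguration:
    domains: Tuple[str, ...]              # the N domains to order
    key_technology: str                   # e.g. "EC_P256"; the label format is an assumption
    primary_account: str                  # Account A on CA 1
    backup_account: Optional[str] = None  # optional account on a backup CA

    def accounts(self) -> List[str]:
        """Accounts to order from, primary first."""
        return [a for a in (self.primary_account, self.backup_account) if a]
```

Certificates and orders would then hang off this object, so a primary cert and its backup share one policy.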

Because I didn't remember that passage in the RFC, and forgot to defend against the Pebble/Boulder divergence above (we all make mistakes, some days many more than others!), I basically ran into this problem:

  • Automatic Systems will want to renew based on a Cert's ARI
  • A HUMAN will want to renew from the perspective of a Cert, an Order or a Renewal Configuration. Based on whatever they select, I jump to the Renewal Configuration and suggest the replaces field:
  • I now have 2 live duplicate certs, one is "renewable" and the other is not.
  • I might also have a secondary order, so there is another candidate.

The solution I ended up with is basically this:

# when the cert is issued
cert1.replaced_by = cert2.ari_identifier
cert2.replaces = cert1.ari_identifier

Then, when displaying candidates, I exclude certificates that have already been replaced.

This needs to be done at issue, because multiple orders can be started with a "replaces".
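Fleshed out slightly as a runnable sketch (the `Cert` class and function names are illustrative only; the real records live in SQL):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cert:
    ari_identifier: str
    replaces: Optional[str] = None     # ARI identifier of the cert this one replaced
    replaced_by: Optional[str] = None  # ARI identifier of the cert that replaced this one


def record_issuance(old: Cert, new: Cert) -> None:
    """Link the two certs once the replacement has actually been issued."""
    if old.replaced_by is not None:
        raise ValueError(f"{old.ari_identifier} was already replaced")
    old.replaced_by = new.ari_identifier
    new.replaces = old.ari_identifier


def replacement_candidates(certs: List[Cert]) -> List[Cert]:
    """Only certs that have not been replaced are offered for 'replaces'."""
    return [c for c in certs if c.replaced_by is None]
```

Linking at issuance rather than at order creation means an abandoned or failed order never marks a cert as replaced.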

The system is a bit bloated and complex because it's been doing too many things; I was previously "backporting" it to a public version, but decided to just redesign the public version as scaled down and do most of the other stuff on our systems. AKA, digging out from technical debt.

Because it is mostly handled through a programmatic API, I try to break it as much as possible through edge cases and race conditions.

Currently I have two installations of the system:

  • installation A works as a client & certificate manager
  • installation B is for research and analysis of public certificates

One of the products we have is a web index / spider for online content. It monitors publishers and social media for citations, and is used to track sharing performance, determine trends, and generate content-writing recommendations.

It analyzes all the metadata associated with an article to identify and disambiguate the authors, publishers, and likely underlying entities. Multiple different websites can often be tied to a single content farm based on shared metadata, like reporting into the same analytics IDs, sharing the same Facebook admins, or even having the exact same setup of 3rd party apps. When they aren't hiding behind Cloudflare or Akamai, Certs and IPs are often useful for associating websites together; the Russian fake-news content farms often had multi-SAN certs or shared a key.

So, that second installation is loaded with all the certs found through spidering.

2 Likes

Ah, the ARI state of imported certs is unknown until your first issuance. Although the extra polling isn't awful. The 3rd party cert monitoring systems will be doing that too. It might be a handy bit of info to display alongside a cert. But it doesn't affect the cert's validity, just the ability to "replace" it, which avoids rate limits and brings the other benefits.

I don't think I'd guess; I'd only use 'replaces' for certs I'd issued. As noted, I plan to store 'replaces' status along with the other state data. Based on your comments I may change how I do this by marking the state as "replaceable" when I issue a cert and "replaced" once it has been. If I ever had to cope with imports (which I don't see happening), the state data wouldn't have been marked "replaceable", so I wouldn't have to worry about the order failing for a faulty 'replaces' value.
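Something like this is what I have in mind, as a sketch only. The enum and function names are placeholders, and the UNKNOWN state for imported certs is an assumption about how that case would be represented.

```python
from enum import Enum
from typing import Optional


class ReplaceState(Enum):
    UNKNOWN = "unknown"          # e.g. an imported cert never marked at issuance
    REPLACEABLE = "replaceable"  # set when this client issues the cert
    REPLACED = "replaced"        # set when the replacement cert is issued


def replaces_value(ari_identifier: str, state: ReplaceState) -> Optional[str]:
    """Return the value for the order's 'replaces' field, or None to omit it."""
    return ari_identifier if state is ReplaceState.REPLACEABLE else None
```

Omitting the field for anything not explicitly marked "replaceable" means an order never fails because of a stale or guessed 'replaces' value.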

So, thanks for that explanation. Very helpful.

1 Like

NP. That's why I post here - hoping others can learn from my mistakes as much as I learn from theirs.

I dropped replaces for imported certs; it's not worth the work at this point. I thought it would be neat if I could pull it off, but it's overly complicating everything. Getting profiles and replaces into managed orders was under 2 hours of work; I've spent at least 8 trying to support it with managed certs, and it just makes everything more difficult. I think I might allow it via the API if the ARI Identifier is not already in my system, but otherwise not handle it in the human UX.

With that change, the human UX from the RenewalConfiguration endpoint is to select a child cert for replaces, or omit it to start a new lineage.

2 Likes