Boulder: Order <> Authorization Relationship

A constant source of annoyance for me is the nebulous relationship between an ACME Order and an Authorization in the RFC.

I've gotten some help on this concept here in the past, but trying to be server-neutral is requiring me to write a lot of extraneous code to handle edge cases of an Authorization being tied to multiple Orders, or vice-versa. At this point the workload to support this is too much and I realistically have to target support to the Boulder implementation.

I am hoping for some guidance on how Boulder handles the relationship between the two objects.

I generally do not care about an ACME Order reusing a validated Authorization, as that happens entirely within Boulder and the client is oblivious to it.

My concern is with Pending Authorizations and Orders. Can a pending Authorization be tied to more than one order in Boulder? Testing against Boulder and Pebble, a single Account can:

  1. generate a given ACME Order
  2. not trigger any validation
  3. generate a second, identical ACME Order
  4. find that the authorizations of the two orders are different

Is this expected? Is this likely to change?
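
The comparison in the steps above can be sketched roughly as follows. This is only an illustration: the two dicts stand in for the order objects a server returns (RFC 8555 order objects carry an `authorizations` list of URLs), and the URLs and the `shared_authorizations` helper are made up for the example.

```python
def shared_authorizations(order_a: dict, order_b: dict) -> set:
    """Return the authorization URLs that appear in both order objects."""
    return set(order_a.get("authorizations", [])) & set(
        order_b.get("authorizations", [])
    )


# Hypothetical order objects, shaped like the JSON an ACME server returns.
order_1 = {
    "status": "pending",
    "identifiers": [{"type": "dns", "value": "example.com"}],
    "authorizations": ["https://acme.example/authz/1"],
}
order_2 = {
    "status": "pending",
    "identifiers": [{"type": "dns", "value": "example.com"}],
    "authorizations": ["https://acme.example/authz/2"],
}

# An empty set means the server issued unique pending authorizations per order.
overlap = shared_authorizations(order_1, order_2)
print("authorizations reused across orders:", sorted(overlap))
```

An empty result corresponds to step 4 above; a non-empty result means a pending Authorization is tied to more than one Order.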

My concerns, from a client developer's perspective:

  • An ACME Challenge failure is fatal to an Authorization; an Authorization failure is fatal to an Order. Extra work is required to handle potential multiple Orders.

  • A lot of extra work is required to generate audit logs that correlate an Authorization with a preferred Challenge (e.g. favoring DNS-01 over HTTP-01). As my "challenge preference" is tied to an ACME Order, I have to work backward from a Challenge all the way to multiple potential Orders to generate the right reporting, or else generate a lot more logging data.

If Boulder isn't generating unique Pending Authorizations per Order, that is annoying and I'll have to handle this. If Boulder is generating unique pending auths, then there isn't any reason to do the additional work at this point.

I would love to know how you're accomplishing that. In developing Open ACME, I resubmitted an identical order hundreds of times and always received the exact same authorizations except when an authorization would fail. Keep in mind that I also would always receive the exact same order url as well. I have no idea how you're able to get a different order url when submitting the exact same domains. Did you authorize any of the challenges (authorize, not trigger for check)? Currently, Open ACME only supports dns-01 challenges, so I automatically authorize all of the dns-01 challenges before showing the user the TXT records to be added and waiting for confirmation of creation.

Not true for me. I just get a new (pending) authorization in place of the fatal one when I resubmit the order.


I feel like we must be following somewhat different paths in terms of sequence, checks, and jws submissions.

I just ran my tests and... this is a divergence between Pebble and Boulder. I'll have to submit a ticket. Thanks for pointing this out.

You have to resubmit an order, because the failed challenge was fatal to the order. The RFC requires:

  • If a Challenge fails, it moves from processing to invalid. This is fatal to the Authorization, which will move from pending to invalid.
  • If an Authorization moves from pending to invalid, the Order moves to invalid.
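
A minimal sketch of how a client might track that cascade, assuming an Authorization can be listed by more than one Order. The `Authorization` and `Order` classes and the `fail_authorization` helper are hypothetical bookkeeping types, not part of any ACME library.

```python
from dataclasses import dataclass, field


@dataclass
class Authorization:
    url: str
    status: str = "pending"


@dataclass
class Order:
    url: str
    authorizations: list = field(default_factory=list)
    status: str = "pending"


def fail_authorization(authz, orders):
    """Mark the authorization invalid and invalidate every pending order listing it.

    Returns the URLs of the orders that were invalidated.
    """
    authz.status = "invalid"
    affected = []
    for order in orders:
        listed = any(a.url == authz.url for a in order.authorizations)
        if listed and order.status == "pending":
            order.status = "invalid"
            affected.append(order.url)
    return affected


# One pending authorization shared by two orders (hypothetical URLs).
authz = Authorization("https://acme.example/authz/1")
order_a = Order("https://acme.example/order/a", [authz])
order_b = Order("https://acme.example/order/b", [authz])

affected = fail_authorization(authz, [order_a, order_b])
print("orders invalidated by one failed challenge:", affected)
```

If authorizations are unique per order, the loop only ever touches one order; if they are shared, a single failed Challenge takes every listing order down with it.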

Interesting... I'll need to check here what I'm experiencing. I thought it was the same order since I recalled receiving one new challenge and one previous challenge, but I may be recalling incorrectly here. Certainly worth looking into.

Personally, I don't believe in using the "cached" authorizations and do believe that each authorization should be unique to an order. I know that caching the authorizations saves processing for the CA server, but unless you're trying to distribute certificates across servers that will be serving overlapping domains as in the case of load balancing (and making the CA do the work in providing those certificates rather than duplicating them yourself), I don't see the point in caching the authorizations (and thus sharing authorizations across orders). If someone wants to generate an expanded or corrected certificate, it should be an infrequent ordeal. Generating identical certificates is terrible. In my opinion 5 is too high for the rate limit.

@Griffin I opened this ticket against Pebble https://github.com/letsencrypt/pebble/issues/324

I can confirm that pending authorizations are re-used across orders in Boulder but not Pebble.

In terms of re-using orders:

  • Pebble assigns a unique order url each time
  • I have not been able to reliably recreate two orders having the same URL as you have.

I have recreated this in Boulder, I just can't do it reliably. Sometimes it happens, sometimes it doesn't.

I've tried the following scenarios:

  • Single domain orders.
    • Order 1 submitted > Order 2 submitted
    • Order 1 submitted > Challenge Fails > Order 2 submitted
  • Multiple domain orders.
    • Order 1 submitted > Order 2 submitted
    • Order 1 submitted > Challenge 1/2 Fails, Challenge 2/2 pending > Order 2 submitted

Sometimes an order url is reused, sometimes it is not. There may be an issue with pending orders that I need to test against.

Hmm... Are you seeing the inconsistency across repeated operations that aren't changing state? Like:

Order 1 submitted > Order 2 submitted ... Order 5 submitted

If there's non-static behavior for static input (and static process in this case), I could only assume there's some type of load-balancing (or other stateless-type issue) happening behind the curtain. I don't think they'd use a PRNG for this... :laughing: Would keep developers on their toes (or extremely annoyed). Maybe database transactions are being piled-up and not yet committed when accessing records from different systems? Wouldn't think that would be allowed, but just a random thought.

A short summary of Boulder's behavior:

If an existing order for the same names is in pending status, Boulder will return its url rather than creating a new one.

Otherwise, Boulder will create a new order object.

That order object may contain a mix of new pending authorizations, reused pending authorizations, and reused valid authorizations. Boulder will prefer valid over pending, and pending over creating a new authorization.
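
The preference order described above (valid, then pending, then new) could be modeled like this. To be clear, this is an illustrative sketch and not Boulder's actual code: the dict layout and the `pick_authorization` name are assumptions, and real-world concerns such as expiry and account ownership are left out.

```python
def pick_authorization(name, existing):
    """Pick an existing authorization to reuse for `name`, or None.

    Prefers a valid authorization over a pending one. A None result means
    the caller would create a new pending authorization.
    """
    rank = {"valid": 0, "pending": 1}
    candidates = [
        a for a in existing if a["identifier"] == name and a["status"] in rank
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda a: rank[a["status"]])


# Hypothetical authorizations already on file for one account.
existing = [
    {"identifier": "example.com", "status": "pending", "url": "https://acme.example/authz/1"},
    {"identifier": "example.com", "status": "valid", "url": "https://acme.example/authz/2"},
    {"identifier": "example.org", "status": "invalid", "url": "https://acme.example/authz/3"},
]

for name in ("example.com", "example.org"):
    chosen = pick_authorization(name, existing)
    print(name, "->", chosen["url"] if chosen else "create new pending authorization")
```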

As you've noticed, there is a many-to-many relationship between orders and authorizations. One order can contain multiple authorizations (for different names), and one authorization can be a member of multiple orders (that all contain the same name).
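
On the client side, one way to cope with a many-to-many relationship is a two-way index, so that a Challenge or Authorization can be traced back to every Order that references it. The `OrderAuthzIndex` class below is a hypothetical sketch, and the URLs are placeholders.

```python
from collections import defaultdict


class OrderAuthzIndex:
    """Two-way lookup between order URLs and authorization URLs."""

    def __init__(self):
        self._authzs_by_order = defaultdict(set)
        self._orders_by_authz = defaultdict(set)

    def link(self, order_url, authz_url):
        """Record that `order_url` lists `authz_url`."""
        self._authzs_by_order[order_url].add(authz_url)
        self._orders_by_authz[authz_url].add(order_url)

    def orders_for(self, authz_url):
        """Return every order URL that lists this authorization."""
        return set(self._orders_by_authz.get(authz_url, ()))

    def authzs_for(self, order_url):
        """Return every authorization URL listed by this order."""
        return set(self._authzs_by_order.get(order_url, ()))


index = OrderAuthzIndex()
# Two orders that both contain example.com share one authorization.
index.link("https://acme.example/order/a", "https://acme.example/authz/example-com")
index.link("https://acme.example/order/a", "https://acme.example/authz/example-org")
index.link("https://acme.example/order/b", "https://acme.example/authz/example-com")

print(sorted(index.orders_for("https://acme.example/authz/example-com")))
```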

FWIW, I think we'll probably keep the Pebble divergence. Our goal with Pebble is to be a testbed for the ACME protocol rather than to mimic Boulder. So to some extent, if we have differences where both Pebble and Boulder are valid according to spec, that's useful for making sure clients don't overfit Boulder's behavior.

I don't have an opinion on keeping/changing behaviors. I just care about documentation. Boulder and Pebble behave fundamentally differently here, and the Pebble docs do not suggest this sort of divergence.

At this point, Pebble seems worthless for Development and CI tests. It seems only suited to be an alternate/secondary RFC implementation to test against.

Thank you so much for confirming my thoughts on these, especially the second one. :slightly_smiling_face:

Does this ever present transactional problems? I find the use case for a single domain being on multiple, active, non-identical orders to be rather curious unless there is something like an asymmetric load-balancing-type scenario desired. That's probably not common though. Perhaps trying to get the common name to match across systems serving subdomains of the same apex for unity purposes?

In #help I try my best to discourage people from including names when the webserver on which the certificate is intended to be installed will not actually be serving for that name. I know it can save some CA resources, but I really don't like the "fit-all" certificate that requires copying private keys across servers or issuing "identical" certificates on each device with the extra names. I hope this is wise. I look for situations like this for the upcoming handbook to encourage best practices. It also helps to solidify the underlying metaphor for the narrative. Please advise here.

Most of the authz reuse logic was added to deal with broken clients. We see a surprising number of clients that create the same order over and over again and never fulfill it, or create an order when they've already done validations. We address some of this with rate limits, but reusing authzs goes a long way towards limiting resource usage from these broken clients, too.

This makes sense. It would have been nice if - when that happened - changes were reflected in Pebble. e.g. Pebble could default to a 50/50 split to (i) reuse pending authz and (ii) create duplicate orders, with flags to change to 0/100 like it does for reusing validated authz.

I forgot to circle back on this. After a lot of testing, I loosely figured out some of the logic. The Order URL will not be reused if one of the involved domains has failed a Challenge/Authorization within an undetermined amount of time; there are likely other criteria as well. Testing with randomly generated domains is required to trigger this behavior.

This is going to cause a bit of pain for my tests, as they relied on predetermined domains registered into a hosts file.
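
For what it's worth, generating throwaway names per test run can be as small as the sketch below. The `random_test_domain` helper and the `example.test` base are placeholders; the names would still need to resolve inside whatever test environment is in use.

```python
import secrets


def random_test_domain(base="example.test"):
    """Return a fresh, random subdomain of `base` for a single test run."""
    label = secrets.token_hex(8)  # 16 hex characters, a valid DNS label
    return f"{label}.{base}"


# Each run gets names with no prior Challenge/Authorization history.
domains = [random_test_domain() for _ in range(2)]
print(domains)
```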

Too... many... variables... Control... group... insufficient.

I feel you. Even with a functioning client I still feel like there are many things wrong, incomplete, or outright missing in my client. A decade ago I was in charge of interpreting and implementing ANSI and ITU-T standardized testing procedures for telecom test equipment for the internet backbone at one of the leading companies. The caveats and ambiguities were unreal. I felt like I needed to learn German to get clarification. :upside_down_face:

Thankfully, after an audit, my client hasn't encountered any errors caused by this on our production systems, but it's incredibly annoying. The only changes I needed to rush out were a quick check to see if an order was previously tracked, conditionally invoking an "update" instead of a "create", and adding a unique constraint to the database. It was a 5-minute fix.
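
A rough sketch of that kind of fix, using SQLite purely for illustration. The `acme_order` table, its columns, and the `track_order` function are hypothetical and not the actual schema or code.

```python
import sqlite3


def track_order(conn, order_url, status):
    """Create a row for a new order URL, or update the row if already tracked.

    Returns "created" or "updated".
    """
    row = conn.execute(
        "SELECT id FROM acme_order WHERE order_url = ?", (order_url,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO acme_order (order_url, status) VALUES (?, ?)",
            (order_url, status),
        )
        action = "created"
    else:
        conn.execute(
            "UPDATE acme_order SET status = ? WHERE id = ?", (status, row[0])
        )
        action = "updated"
    conn.commit()
    return action


conn = sqlite3.connect(":memory:")
# The UNIQUE constraint guards against double-tracking a reused order URL.
conn.execute(
    "CREATE TABLE acme_order ("
    "id INTEGER PRIMARY KEY, "
    "order_url TEXT NOT NULL UNIQUE, "
    "status TEXT NOT NULL)"
)

first = track_order(conn, "https://acme.example/order/1", "pending")
second = track_order(conn, "https://acme.example/order/1", "pending")
print(first, second)
```

The second call hits the "update" path because the server handed back an order URL that was already on file.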

Our main usage of my client is API-driven interactions by automated systems, so things are a bit more fragile than with most clients, which are invoked or overseen by humans, or run as cronjobs that process the entirety of a single order at once (like certbot). One system that invokes it does the Order+Validate at once, but most others handle the various steps piecemeal. This is why there is a lot of bookkeeping and report generation built into the client... and why I am more-than-slightly-upset Pebble did not disclose certain divergences.

I was hoping to gut some code this week, to streamline some dns-01 challenge work... but now I'll just have to clean it up instead.

Code around a moving target in the dark. You can do it. Seriously though, that sucks. Finding out some fundamental assumptions don't hold in a mature design can be really frustrating. It's when you start to crack classes and refactor data structures that the Irish cream enters the coffee. :coffee: