Soliciting feedback on shortening authorization lifetimes to 7 hours

Hi forum,

I've got a question for you all. You can skip the Background section if you like, but it provides helpful context for understanding the motivation behind this question.

Background

Today, a new Authorization is created with a lifetime of 7 days. If the client does not attempt to fulfill the Challenge within that time period, then the authorization expires and the client needs to create a new Order and new authorizations to try again. In addition, if the client creates a new order during that time, the existing pending authorization will be attached to the new order, to prevent too many duplicate pending authorizations from accumulating.

Today, when a challenge is successfully validated, its corresponding authorization is given a lifetime of 30 days. During that time, any new orders created by the same subscriber for the same name will be paired with that existing validated authorization, and issuance can occur without the client having to demonstrate control again.

However, when we conduct domain control validation, we actually check two things at the same time: whether the domain control challenge succeeds, and whether the domain has any CAA Records which would prevent us from issuing for that name.

The Baseline Requirements state that validated authorizations can be re-used for up to 398 days; hence our 30 days is already much shorter than the maximum allowable time. However, they also state that CAA checks can only be used for up to 8 hours. We keep ours around for 7 hours, to be safely under the limit.

This means that we maintain extensive infrastructure for rechecking CAA, to ensure that we're still allowed to issue when a validated authorization is more than 7 hours old but less than 30 days old. This includes an entire gRPC service, a post-hoc audit service, and extensive reuse logic. I'd like to be able to get rid of all of it.
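The interaction between these two lifetimes can be sketched roughly like this (illustrative Python; the constant and function names are mine, not Boulder's actual code):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constants mirroring the lifetimes described above;
# these are not Boulder's real configuration names.
PENDING_AUTHZ_LIFETIME = timedelta(days=7)
VALID_AUTHZ_LIFETIME = timedelta(days=30)
CAA_CHECK_LIFETIME = timedelta(hours=7)

def needs_caa_recheck(validated_at: datetime, now: datetime) -> bool:
    """A validated authz older than the CAA lifetime but still within its
    own lifetime can drive issuance only after CAA is rechecked -- this
    window is what the rechecking infrastructure exists to serve."""
    age = now - validated_at
    return CAA_CHECK_LIFETIME < age <= VALID_AUTHZ_LIFETIME

now = datetime.now(timezone.utc)
print(needs_caa_recheck(now - timedelta(hours=3), now))  # False: CAA check still fresh
print(needs_caa_recheck(now - timedelta(days=10), now))  # True: reuse requires a CAA recheck
```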

Proposal

Shorten the lifetime of a validated authorization from 30 days to 7 hours, to match the lifetime of a CAA check.

Benefits:

  • Allows us to remove the CAA rechecking infrastructure described above.
  • Improves the overall security of the internet, by not allowing a single validation to continue to drive issuance for 30 days.

Drawbacks:

  • Clients which request multiple orders for the same name within a 30-day period will have to complete multiple challenges, rather than reusing the first validated authorization.
  • Clients which take more than 7 hours between completing their validations and finalizing their order will break.

Questions

Will reducing the authorization reuse lifetime from 30 days negatively affect your ACME clients, or your web services which use an ACME client? Would having to complete multiple challenges in a month have a negative effect on your operations?

Are you aware of any clients which leave orders open for extended periods of time? Are you aware of any use-cases for purposefully slow-rolling the finalization of an order?

Thanks, and I look forward to your feedback!

11 Likes

Seems to me a client taking more than 7 hours has way bigger issues than the finalization of their order failing.
Overall I like it; it's a net positive.

8 Likes

Not for me personally, but I believe there have been cases where some dns-01 challenges took more than 7 hours for worldwide propagation (and with that, Let's Encrypt validation). ACME users depending on those very slow DNS services, and who require the dns-01 challenge, could have issues with such a short authz lifetime.

8 Likes

Interesting, that is good to know. Thanks @Osiris! :slight_smile:

3 Likes

I believe the proposal only relates to validated authorizations, not pending ones though. So this scenario would only apply where multiple authz are being done, with wait times of more than 7 hours in between them.

5 Likes

If I understand correctly, an authz is created with a certain lifetime, which starts counting on its creation, not on its validation. I'm pretty sure that if a client tries to trigger an authz for validation and its lifetime has been exceeded, Boulder would give an error?

"Proof": section 7.5 (RFC 8555 - Automatic Certificate Management Environment (ACME)) shows a pending authz with a certain lifetime. So it indeed has a lifetime already when created.

Perhaps most elegantly a pending authz has a longer lifetime (e.g. 30 days) which gets shortened to 7 hours when the status becomes "valid"?

7 Likes

I'm not that familiar with the ACME specification (have yet to write a client myself), but the background paragraph from @aarongable reads:

Which does sound to me like there are multiple lifetimes involved here, with the proposal affecting the latter.

7 Likes

Ah yes, I missed that part. I didn't know Boulder already changed expiry datetimes when the status changes.

In that case there's no objection regarding my (now deemed flawed) dns-01 argument.

6 Likes

I think the argument still holds, the scenario is just different:

  • ACME client wants to validate two domain names (A and B) for a single certificate, both with DNS-01
  • ACME client starts challenge for domain A, adds the record, waits n >= 7 hours, completes challenge. Authz is now valid for 7 more hours.
  • ACME client now starts working on domain B, adds the other records, waits n >= 7 hours, completes challenge. Authz of domain A has expired in the meantime.

Doesn't that still trigger the issue you described? It does require an ACME client doing all challenges sequentially and not in parallel though - no idea how common that is?
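A rough timeline of that sequential scenario (illustrative numbers only; 8 hours stands in for "slow DNS propagation"):

```python
from datetime import datetime, timedelta, timezone

VALID_AUTHZ_LIFETIME = timedelta(hours=7)  # the proposed lifetime

t0 = datetime(2023, 1, 1, 0, 0, tzinfo=timezone.utc)
dns_propagation = timedelta(hours=8)  # a slow DNS provider, per the scenario above

validated_a = t0 + dns_propagation           # authz A becomes valid at t0 + 8h
validated_b = validated_a + dns_propagation  # B validated 8h later, sequentially

# By the time B is valid, A's validated authz has already expired:
a_expired = (validated_b - validated_a) > VALID_AUTHZ_LIFETIME
print(a_expired)  # True: the order can no longer finalize with authz A
```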

6 Likes

Hm, also a good point indeed, could still be troublesome for dns-01.

7 Likes

Yeah, I think the week before the authorization is validated is needed in order to deal with wonky DNS setups, but once it's validated it sounds like it's just for the month-without-needing-to-revalidate? Which offhand I think just causes confusion when someone can get a certificate in production but not staging, because certbot doesn't reuse the authorization in staging and their DNS is broken now, but they won't find out that they broke the DNS in production until the 30 days are up.

Not for me, but I think there are a couple weird ones where 7 hours might be a pain point (though it could probably be shorter than 30 days by quite a bit):

  • Sometimes, people who need to do DNS validation for an apex-and-wildcard certificate, but who have a DNS provider that doesn't support multiple TXT records at once for some reason, are told to first make a cert with just the apex, and then the certificate with both the apex and wildcard, so that the second one can use the cached validation. Combined with some of the DNS-propagation-time-to-all-authoritative-servers issues, they might want more than 7 hours between those requests. (Yeah, it's working around several broken things, but I bet it's an issue for someone out there.)
  • Along those lines, I don't know if maybe someone has a workflow where they request both an RSA and an ECDSA certificate, and having a cached validation makes things easier for them.

Not offhand, but it wouldn't shock me. :slight_smile:

Well, I'd be a little wary, if I were you, of making validation reuse dependent on CAA rechecking requirements, since if the rules around CAA checking get stricter, you'll be forced either to make your validation reuse stricter by the same amount or to bring this infrastructure back out of mothballs.

7 Likes

OR
It requires completing all challenges once the (first) 7-hour countdown is started.

I think we may be facing two niche scenarios.
It may play better to use some sort of [mathematical] middle ground.
Like say: 72 hours [three days]
[which is roughly x10 of the short 7-hour limit and /10 of the 30-day max limit]

7 Likes

Seems like it would cause far-reaching issues beyond Let's Encrypt if CAA's TTL got restricted by the CA/Browser Forum.

From https://github.com/cabforum/servercert/blob/2c63814fa7f9f7c477c74a6bfbeb57e0fcc5dd5b/docs/BR.md#3228-caa-records

3.2.2.8 CAA Records

As part of the Certificate issuance process, the CA MUST retrieve and process CAA records in accordance with RFC 8659 for each dNSName in the subjectAltName extension that does not contain an Onion Domain Name. If the CA issues, they MUST do so within the TTL of the CAA record, or 8 hours, whichever is greater.

This stipulation does not prevent the CA from checking CAA records at any other time.
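That "TTL of the CAA record, or 8 hours, whichever is greater" rule is just a maximum of two durations; a minimal sketch:

```python
from datetime import timedelta

def caa_validity_window(caa_ttl: timedelta) -> timedelta:
    """BR 3.2.2.8: issuance must happen within the CAA record's TTL
    or 8 hours, whichever is greater."""
    return max(caa_ttl, timedelta(hours=8))

print(caa_validity_window(timedelta(minutes=5)))  # 8:00:00 -- the 8h floor applies
print(caa_validity_window(timedelta(hours=24)))   # 1 day, 0:00:00 -- long TTLs extend the window
```

So restricting the rule to the raw TTL would only bite when a domain publishes a CAA record with a very short TTL.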

2 Likes

I am not sure if that meets the desire here; I may not be understanding correctly.

4 Likes

I don't think there is a "magic number" that will get rid of "all of it".
I'm just shooting for the largest possible number ... 99.9%.
I'm not going to pull out graphs and charts, but there doesn't seem to be a spot where 100% are covered.

4 Likes

This is an interesting point, and one that raises a different possibility: never check CAA at validation time, always check CAA at finalization time. However, this would make the vast majority of issuance much slower, as CAA checking is inherently slow (lots of network roundtrips to authoritative DNS nameservers) and the vast majority of issuance occurs within 7 hours of validation completing. So we certainly could go that route if, say, the CAA TTL were changed to a matter of minutes, but I think we would avoid that for now.

There is: if the CAA and validation lifetimes are equal, then we can remove the code which handles those two kinds of checks separately, and streamline it into treating them exactly the same. As long as validation lifetimes are even a little bit longer than CAA lifetimes (such as 72 hours), we don't get to remove any of that additional complexity, so shortening to a "middle ground" is not something we'd be interested in.
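In other words, the simplification only kicks in at exact equality; a toy illustration (hypothetical function, not Boulder code):

```python
from datetime import timedelta

def needs_separate_caa_recheck(validation_lifetime: timedelta,
                               caa_lifetime: timedelta) -> bool:
    """Whenever a validated authz can outlive its CAA check, a separate
    CAA-recheck code path must exist; only equal lifetimes remove it."""
    return validation_lifetime > caa_lifetime

# A 72h "middle ground" still keeps all the rechecking complexity:
print(needs_separate_caa_recheck(timedelta(hours=72), timedelta(hours=7)))  # True
# Equal lifetimes are what let the separate path be deleted:
print(needs_separate_caa_recheck(timedelta(hours=7), timedelta(hours=7)))   # False
```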

Huh. I've heard of DNS propagation being slow, but never suspected it could be that slow. This is definitely something to keep in mind, but I think the conversation above arrives at a reasonable conclusion: for this to be an issue, you have to both have terribly slow DNS propagation and only be able to set a single TXT record at a time. I'm working towards getting numbers on how long most clients go between challenges and finalization, so this should show up if it's common.

8 Likes

And expected to stay equal for the foreseeable future.

1 Like

I am not finding anything that prevents the TTL of the CAA record from being set to "the maximum of 2^31−1, which is about 68 years".

From here https://www.rfc-editor.org/rfc/rfc2181.txt
5.2. TTLs of RRs in an RRSet

Resource Records also have a time to live (TTL). It is possible for
the RRs in an RRSet to have different TTLs. No uses for this have
been found that cannot be better accomplished in other ways. This
can, however, cause partial replies (not marked "truncated") from a
caching server, where the TTLs for some but not all the RRs in the
RRSet have expired.

Consequently the use of differing TTLs in an RRSet is hereby
deprecated, the TTLs of all RRs in an RRSet must be the same.

Should a client receive a response containing RRs from an RRSet with
differing TTLs, it should treat this as an error. If the RRSet
concerned is from a non-authoritative source for this data, the
client should simply ignore the RRSet, and if the values were
required, seek to acquire them from an authoritative source. Clients
that are configured to send all queries to one, or more, particular
servers should treat those servers as authoritative for this purpose.
Should an authoritative source send such a malformed RRSet, the
client should treat the RRs for all purposes as if all TTLs in the
RRSet had been set to the value of the lowest TTL in the RRSet. In
no case may a server send an RRSet with TTLs not all equal.

3 Likes

I think three days is good... generous.

[Fri Dec 16 09:38:04 PST 2022] riptidetech.io:Verify error:Incorrect TXT record 

I ran the renewal manually 15 minutes later (which would have run @ 03:01 PST on the 17th) and succeeded.

[{"hostname":"riptidetech.io"}],"validated":"2022-12-16T17:51:15Z"}'

Sorry the times are not synchronized. But 99% of the time it is the second run that "gets the goods"
My 2 cents.

5 Likes

Yeah, I think this is fine. For comparison, ZeroSSL doesn't cache validated authz at all (it may have done in the past, but it doesn't reliably do so now; mind you, it also takes a serious amount of time to complete http validation, etc.), so clients already have to cope with that sort of thing.

6 Likes