Base domain validation

Authorization Domain Name:
The Domain Name used to obtain authorization for certificate issuance for a given FQDN. The CA may use the FQDN returned from a DNS CNAME lookup as the FQDN for the purposes of domain validation. If the FQDN contains a wildcard character, then the CA MUST remove all wildcard labels from the left most portion of requested FQDN. The CA may prune zero or more labels from left to right until encountering a Base Domain Name and may use any one of the intermediate values for the purpose of domain validation.

^^ I noticed the above in the most recent CAB forum guidelines. It doesn’t seem to restrict use of a base domain to DNS … does the above imply that a CA could, as per the guidelines, validate 1,000 subdomains of example.com by validating just example.com?
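For concreteness, the pruning rule in that quoted text could be sketched like this (my own illustration of the BR language, not any CA's actual code):

```python
def candidate_validation_domains(fqdn, base_domain):
    """Strip wildcard labels, then prune labels left to right down to the
    Base Domain Name, yielding every intermediate value the BRs say a CA
    may use for domain validation."""
    labels = [l for l in fqdn.split(".") if l != "*"]  # drop wildcard labels
    candidates = []
    while labels:
        name = ".".join(labels)
        candidates.append(name)
        if name == base_domain:
            break
        labels = labels[1:]  # prune one label from the left
    return candidates

# e.g. for "*.foo.bar.example.com" with base domain "example.com":
# ["foo.bar.example.com", "bar.example.com", "example.com"]
```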

I’ve “rerouted” this enquiry to the ACME mailing list since it seems more relevant there.

You are correct. Let's Encrypt doesn't do this for their own reasons, which are discussed in numerous threads on this forum.

With ACMEv2, some other CA could easily coalesce a large number of domains requested in an Order into just a few validations, one per base domain, needed to finalize it. This was harder with ACMEv1, where you just asked for a bunch of authorizations up front.

Thanks for your response! I’m aware that LE doesn’t honor parent domain authz in the case of HTTP validation, and the rationale seems reasonable.

LE does, though, honor these relationships in the wildcard case with DNS validation. Why does that not extend to the general case? For example, since I can get a certificate for *.example.com by demonstrating DNS-based control over example.com, why do I have to do a separate demonstration to get a certificate for foo.example.com?

I did search the forum for commentary on this but didn’t find it.

Thanks again!

I think this is ultimately an architectural thing rather than a security thing, but maybe @jsha could explain it?

“Shaking the branch” on this thread. (@jsha?)

It would be a great convenience if an authz on a base domain could substitute for one on an arbitrary subdomain. While I understand the rationale for not doing this with HTTP, the fact that LE uses this logic to justify wildcards raises the question of why it can’t work for “regular” domains.

Thanks!

It’s an interesting question. We chose to allow DNS validations to work for wildcards because wildcards have a handful of valuable use cases that aren’t fully solved by other approaches. Base domain validation via DNS for non-wildcard certificates is indeed permitted by the BRs. However, implementing it would add significantly to Boulder’s complexity, and I don’t see a compelling reason to add it. If you are answering DNS challenges manually, I can see how it would add convenience, but one of our chief goals is to encourage subscribers to automate validation and issuance. Implementing a policy change that mainly benefits people doing manual validation seems to run counter to that goal.

This would radically simplify automatic validation because it would be possible to do all validations for a domain by altering a single zone:

  • foo.example.com
  • *.haha.example.com
  • www.example.com
  • example.com

^^ A single DNS validation/authz against example.com would suffice as authz for all of these, which would hugely speed up and simplify the workflow … ?
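To spell that out: each of the listed names prunes down to the same base domain. A toy check (using a naive "last two labels" rule as a stand-in for real Public Suffix List logic):

```python
BASE = "example.com"
NAMES = ["foo.example.com", "*.haha.example.com", "www.example.com", "example.com"]

def base_of(name):
    """Hypothetical simplification: drop any wildcard label, keep the last
    two labels. A real CA would consult the Public Suffix List instead."""
    labels = [l for l in name.split(".") if l != "*"]
    return ".".join(labels[-2:])

# all(base_of(n) == BASE for n in NAMES) -> True, so one authz for
# example.com could (under the BR reading above) cover all four names.
```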

I agree that it would mean fewer validations required to issue all those certificates, but when those validations are automated (and backgrounded after the first one), it shouldn’t matter how many validations are required, right?

There’s also a small nuance here, that current wildcard policy only considers a DNS validation good enough to issue a wildcard at a single level; if you want deeper wildcards, you have to do a different DNS validation. This bypasses a moderate bit of complexity around zone cuts and delegations. I’m assuming here that any proposal to expand DNS validation for non-wildcards would only grant that same level, not deeper names (like foo.bar.example.com).
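To make the single-level nuance concrete, here is a sketch of the name matching it implies (my illustration, not Boulder's actual code):

```python
def wildcard_covers(wildcard, name):
    """A wildcard name covers exactly one extra label:
    *.example.com matches foo.example.com but not foo.bar.example.com."""
    if not wildcard.startswith("*."):
        return wildcard == name
    base = wildcard[2:]
    head, sep, rest = name.partition(".")
    return sep == "." and rest == base and head != ""

# wildcard_covers("*.example.com", "foo.example.com")      -> True
# wildcard_covers("*.example.com", "foo.bar.example.com")  -> False
```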

Zone modifications are much heavier and more error-prone (for us, anyway) than a filesystem modification for HTTP-based authz. That’s why we’re only implementing it now, 2+ years after our initial LE implementation.

There can also be misconfigured subdomains that throw off DNS authz; authz against a base/parent domain leaves less room for error.

Can you describe this type of misconfiguration in more detail? Thanks!

foo.example.com CNAME does-not-exist.com

^^ An attempt to do DNS authz against foo.example.com directly will fail because of the misconfiguration. By virtue of controlling example.com, though, I automatically control foo.example.com, so an authz against example.com can still succeed and justify issuance of a certificate for the subdomain.

I think that's inaccurate. DNS validation for foo.example.com would look up _acme-challenge.foo.example.com, which would be unaffected by the CNAME. That's why ACME chose to put the validation value on a prefix rather than the domain being validated; so it could work with CNAMEs.
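Concretely, the DNS-01 record name is just a fixed prefix on the exact name being validated (per RFC 8555, with any wildcard label stripped first), so a CNAME on foo.example.com itself never enters the lookup. A sketch:

```python
def dns01_txt_name(identifier):
    """RFC 8555 DNS-01: the validation TXT record is provisioned at
    _acme-challenge.<identifier>, with any wildcard label removed first."""
    if identifier.startswith("*."):
        identifier = identifier[2:]
    return "_acme-challenge." + identifier

# dns01_txt_name("foo.example.com") -> "_acme-challenge.foo.example.com"
# A CNAME on foo.example.com pointing at a dead name doesn't affect this,
# because the resolver queries _acme-challenge.foo.example.com directly.
```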

@jsha actually it does matter because of how many cloud/enterprise hosts handle DNS record updates. Assume this validation is wanted:

  • example.com
  • *.example.com

Most clients (including certbot) will generate/present the auth challenges serially – so an automated system will set the first record, then the second.

Many cloud systems appear to have an internal cache that lasts from 60s to 300s (possibly tied to their minimum TTL, possibly not; possibly primed via a write-through-cache, possibly not; also possibly affected by internal caching systems propagating outwards). Setting a second value on these systems for the same key in a TXT record doesn't appear to write into the cache.

So in this flow:

  1. Generate challenge for example.com
  2. Set challenge for example.com;
  3. Generate challenge for *.example.com
  4. Set challenge for *.example.com;
  5. Validate the challenges

With at least 3 consumer DNS systems, Step 2 will cause that TXT record to be cached somewhere, delaying the value set in Step 4 from being readable until the cached entry expires.

The "fix"* is to patch Certbot and have the client sleep between steps 4 & 5 for a minimum time that accounts for:

  • The record set in Step 2 to expire
  • The record set in Step 4 to propagate

One vendor I tested against seemed to handle this in 90s for a 60s TTL; two other vendors with a 300s TTL couldn't handle this repeatably without a 360s TTL.

Using the default hooks (unpatched), one would need to sleep 90 or 360s after every record is set -- that translates to 12 minutes for a single domain+wildcard combo, and over 3 hours for the max of 50 names on a cert.

Allowing *.example.com and example.com to use the same challenge when combined in a single CSR would make this process much faster - and therefore easier to test and deploy.

* I'm calling the above a "fix" in quotes, because the right solution is migrating to ACME-DNS. That is not an option for everyone though. (Thankfully it was for me, because I have nearly 50 domains and 12 of them are on a single cert)

Yep, I agree this is a big* problem, and I've written a patch to fix it in certbot-route53, so all records are put in place at once, and waited on together: Restore parallel waiting to Route53 plugin by jsha · Pull Request #5712 · certbot/certbot · GitHub. I think our goal here should be to get more Certbot plugins, and more ACME clients in general, to implement parallel waiting. That's already the norm for HTTP validation. My understanding is that a popular DNS updating library, lexicon, might be one of the reasons serial updating is so common, but AFAIK the Certbot team is working with the maintainers to improve the state of things.
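The pattern that PR restores can be sketched generically; set_txt_record and record_visible below are hypothetical stand-ins for a real provider API:

```python
import time

def parallel_dns01(challenges, set_txt_record, record_visible,
                   timeout=600, poll=5):
    """Set every TXT record first, then wait on all of them together,
    so per-record propagation delays overlap instead of adding up."""
    for name, value in challenges:            # step 1: provision everything
        set_txt_record(name, value)
    pending = dict(challenges)
    deadline = time.time() + timeout
    while pending and time.time() < deadline:  # step 2: one shared wait
        for name in list(pending):
            if record_visible(name, pending[name]):
                del pending[name]
        if pending:
            time.sleep(poll)
    return not pending  # True if every record became visible in time
```

With N records and a worst-case propagation delay of T, the sleep cost is roughly T rather than N×T.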

*At least, for initial issuance, and for debugging problems. For background renewal, it's not important.

All zone modifications could indeed happen against the base domain’s zone. I’m not sure, though, if all resolvers would correctly handle the case where example.com has the TXT for _acme-challenge.foo.bar.example.com but bar.example.com is misconfigured.

The point, though, about the number of zone modifications remains: for a domain with many subdomains, the difference between a single zone modification for all ACME validations versus a separate modification for each and every FQDN under that domain is very significant.

Granted, doing the modifications in batch would alleviate most of the overhead.

To be precise here: when you say "bar.example.com is misconfigured," you mean "bar.example.com is a CNAME to a domain that does not resolve," right? Assuming so, all resolvers will correctly resolve a query for TXT _acme-challenge.foo.bar.example.com in that situation. The only resource record type on bar.example.com that would have an impact would be NS. That is, if there were a record NS bar.example.com nonexistent.example.net, then all lookups for subdomains of bar.example.com would fail, because they would be delegated to a nonexistent name server.
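A toy model of that distinction, with a hypothetical BROKEN_DELEGATIONS set standing in for an NS record pointing at a nonexistent name server:

```python
# Toy model: a broken CNAME on bar.example.com is irrelevant to a TXT
# query for _acme-challenge.foo.bar.example.com, but a broken NS
# delegation at bar.example.com fails the whole subtree beneath it.
ZONE = {
    ("bar.example.com", "CNAME"): "does-not-exist.invalid",
    ("_acme-challenge.foo.bar.example.com", "TXT"): "token123",
}
BROKEN_DELEGATIONS = set()  # names whose NS points at a dead server

def lookup(name, rtype):
    """Resolve name/rtype, failing only if some ancestor of the name
    sits behind a dead delegation."""
    suffix = name
    while "." in suffix:
        suffix = suffix.split(".", 1)[1]
        if suffix in BROKEN_DELEGATIONS:
            raise LookupError("delegation for %s is unreachable" % suffix)
    return ZONE.get((name, rtype))

# With only the CNAME "misconfiguration", the TXT lookup succeeds.
# Add bar.example.com to BROKEN_DELEGATIONS and it raises LookupError.
```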

Sort of... the problem right now is more on Certbot's side than Lexicon's. Lexicon is typically invoked via Certbot's manual plugin. The manual plugin's design runs --manual-auth-hook serially (once per domain); there is no hook capable of setting up all the domains at once, and no hook that runs after all the domains are set up (which would allow everything to pause). I suggested supporting this by populating the environment with information about progress through the batch, but the Certbot team was not receptive to it. (proposal - additional environment vars populated for manual plugin · Issue #5805 · certbot/certbot · GitHub , illustrated in integrating batch info to environment vars · jvanasco/certbot@8f005bc · GitHub )

In terms of Lexicon... I would love to see it support multiple updates at once, largely because two of the providers I use don't add/remove a single record but replace the entire zone at once... so it would be much more efficient for me. However, I don't see that happening anytime soon. I'm fairly familiar with the library and its maintainers (I rewrote 70% of one client and have been working on standardizing some other elements). A parallel capability would require a lot of changes/redesign to the library, and each provider/API has been handled by a separate person with their own credentials to a commercial vendor. What should be a sprintable change has been deemed long-term in the commercial setting.

Assuming so, all resolvers will correctly resolve a query for TXT _acme-challenge.foo.bar.example.com in that situation.

^^ That assumes that the resolver logic queries against the full name from the get-go … right? At least one resolver that I know of will first resolve example.com, then bar.example.com, then foo.bar.example.com, and then _acme-challenge.foo …. It does so by querying for NS records. That resolver would not get the intended result because it would fail at one of the “midpoints”. Admittedly, setups like that may not be widespread.