Thank you for your quick reply.
@jsha, regarding your comments:
Unfortunately, it’s not straightforward to distinguish a SERVFAIL due to an unavailable nameserver from a SERVFAIL due to tampered DNS, which is why we implement strict failing.
I understand that SERVFAIL has a security implication, but I believe that the tree climbing portion of the RFC was specifically designed to address this SERVFAIL-from-RR due to network connectivity limitations.
I hate to reference a non-authoritative source, but looking at the IETF lists, the first result is a message from one of the RFC’s authors clearly saying that the tree climbing behavior was intended to support exactly the sort of security controls I have implemented by keeping my authorities off the public Internet:
The reason the tree climbing is necessary is that MANY DNS host and service
names are not visible on the public DNS. So most of the time, a CA has no
way to validate the records for secrethost.example.com, the CAA record has
to be at example.com.
In other words, the four-bullet-point recursive algorithm I copied from the RFC in my initial post was specifically intended to mean if SERVFAIL-from-RR, then proceed to tree climbing (the P(X) portion), not short-circuit fail the entire algorithm.
Separately, I would point out that the failure to look up this record originates from the authority of my CNAME’s target, and the RFC is very clear that the tree-climbing should proceed off of the parent of the CNAME itself (i.e. NOT the target’s parent). This is very sensible, since the cert is being issued for the CNAME, and thus for a name under the bailiwick of the CNAME’s parent, not the target’s parent.
Thus I would argue that a SERVFAIL on the CNAME target’s resolution should certainly not preclude a more positive result from the CNAME’s parent itself, and I believe this is implicit in the RFC’s ruleset.
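For illustration, the reading argued above can be sketched in Python. This is only a minimal model under stated assumptions, not the CA's actual checker. It follows the four rules of RFC 6844 section 4 as published and adds a `strict` switch to contrast the two treatments of SERVFAIL. The names `ServFail`, `relevant_caa`, `fake_caa`, `fake_alias`, `www.example.com` and `secrethost.internal.example.net` are all hypothetical. The fake lookups do not follow CNAMEs themselves, and alias loops are assumed not to occur.

```python
from typing import Callable, Dict, List, Optional


class ServFail(Exception):
    """Hypothetical error: the resolver answered SERVFAIL for a query."""


def parent(name: str) -> Optional[str]:
    """P(X): drop the leftmost label; None once X is a top-level domain."""
    labels = name.rstrip(".").split(".")
    return ".".join(labels[1:]) if len(labels) > 1 else None


def relevant_caa(
    name: str,
    caa: Callable[[str], List[str]],
    alias: Callable[[str], Optional[str]],
    strict: bool = False,
) -> List[str]:
    """R(X) per the four rules of RFC 6844 section 4.

    strict=True aborts on any SERVFAIL (strict failing).
    strict=False treats SERVFAIL as "no records here" and keeps climbing,
    which is the interpretation argued for in this post (an assumption).
    """
    def lookup(n: str) -> List[str]:
        try:
            return caa(n)
        except ServFail:
            if strict:
                raise
            return []

    # Rule 1: if CAA(X) is not empty, R(X) = CAA(X).
    records = lookup(name)
    if records:
        return records
    # Rule 2: if A(X) is not null and R(A(X)) is not empty, use R(A(X)).
    target = alias(name)
    if target is not None:
        records = relevant_caa(target, caa, alias, strict)
        if records:
            return records
    # Rule 3: if X is not a top-level domain, R(X) = R(P(X)).
    up = parent(name)
    if up is not None:
        return relevant_caa(up, caa, alias, strict)
    # Rule 4: otherwise R(X) is empty.
    return []


# Made-up data: a public CNAME that points into an unreachable private zone.
PUBLIC_CAA: Dict[str, List[str]] = {"example.com": ['0 issue "letsencrypt.org"']}
ALIASES: Dict[str, str] = {"www.example.com": "secrethost.internal.example.net"}
PRIVATE_ZONE = "internal.example.net"


def fake_caa(name: str) -> List[str]:
    # Anything at or below the private zone is unreachable from outside.
    if name == PRIVATE_ZONE or name.endswith("." + PRIVATE_ZONE):
        raise ServFail(name)
    return PUBLIC_CAA.get(name, [])


def fake_alias(name: str) -> Optional[str]:
    return ALIASES.get(name)


if __name__ == "__main__":
    print(relevant_caa("www.example.com", fake_caa, fake_alias))
```

With `strict=False` the SERVFAIL from the target's zone is treated as empty, so the lookup falls through to the CAA set at `example.com`. With `strict=True` the same call raises `ServFail` as soon as the target is queried.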
Supporting this interpretation, note that I could easily obtain an LE cert for my CNAME if I were to temporarily repoint it at a different, public name (not a practical solution, but part of the security model), then repoint it back to the secret name later on. Thus there is no real security advantage in ignoring the security policy of the CNAME’s bailiwick in favor of an equivocal response from the target’s authority.
I believe the best way to fix this situation is for your externally-visible nameserver to not return an internal-only delegation when queried from external hosts.
Split-horizoning, while a common technique, is effectively a violation of the domain name system, as now we have one name with two “authoritative” resolutions. Aside from being ugly and inelegant, it can practically lead to caching issues (as we have two authoritative answers possible depending on connection status) and other heisenbugs.
Your suggestion of a pure public NXDOMAIN would help limit the cache issue to 300 seconds, and it is likely what I’ll do on an emergency basis. I’d point out, though, that it’s more prudent and practical to split-horizon the private namespace with public NXDOMAINs than to poison the public namespace. Either way, it’s still a non-DNS hack.
The reason for pointing publicly to a “hidden” authority is to accomplish the security goal of limiting network visibility while remaining within standard DNS rules, and avoiding split-horizon DNS entirely.
This is an intended use of the DNS, and as such is, per my reading, an intended and supported use case of the CAA RFC’s checking rules.
Is this the right forum for discussing the security design of the ACME rule checker?
I think this particular issue is especially relevant to the design of the DNS-01 checker, since the primary motivation for that checker was to support corporate networks with non-publicly accessible web servers. This is exactly the circumstance where keeping the secret HTTP server’s address itself secret would be both prudent and expected.
I would love for the checker to support these private names within the DNS spec as intended, rather than mandating a broken-DNS configuration by requiring split-horizoning.
Thanks for any pointers on working with the right groups to sort this out.