Full disclosure: I’m the guy who started the “Regarding CA requirements as to technical infrastructure utilized in automated domain validations, etc. (if any)” thread on moz.dev.security.policy. I am not affiliated with the paper or presentation that jsha referenced, though I have started a dialogue with the lead author.
I am competent to speak to many of the vulnerabilities and possible mitigations of the general scope and nature discussed in that paper.
I believe that the most essential defense against this kind of attack is to have multiple vantage points, each attached at a distinct point of interconnection and each in a distinct physical geography, with diversity of internet transit wherever possible.
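To make that diversity requirement concrete, here is a minimal sketch (entirely my own; the field names and the private-use ASNs are made up) of the kind of inventory check I have in mind: every vantage point should differ in geography, point of interconnection, and transit provider.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class VantagePoint:
    name: str
    region: str            # distinct physical geography
    interconnect_asn: int  # AS at the point of interconnection
    transit_provider: str  # upstream transit, ideally diverse

def shared_attributes(vps):
    """Return any region / ASN / transit provider used by more than one vantage point."""
    def dupes(values):
        return [v for v, n in Counter(values).items() if n > 1]
    return {
        "regions": dupes(v.region for v in vps),
        "asns": dupes(v.interconnect_asn for v in vps),
        "transit": dupes(v.transit_provider for v in vps),
    }

vps = [
    VantagePoint("primary-us-east", "us-east", 64500, "transit-a"),
    VantagePoint("secondary-eu",    "eu-west", 64501, "transit-b"),
    VantagePoint("secondary-apac",  "ap-east", 64502, "transit-c"),
]
print(shared_attributes(vps))  # ideally every list here is empty
```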
What I am not competent to speak on is the physical / virtual / network security architecture of CA infrastructure. What I’ll say from this point forward is the union of what I’ve read and some assumptions I’ve made. Please correct any misconceptions I might have:
I presume that the primary validation agent is a distinct physical element with limited privilege to communicate back and forth with the CA’s policy engine, such that the policy engine can request that certain validations be performed and the validation agent can report back the results of those tests.
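For what it’s worth, here is the shape of the exchange I’m presuming, sketched as Python dataclasses. These names and fields are purely my assumptions about such an interface, not anything I know about any CA’s actual design; the point is only that the agent’s privilege is narrow, answering requests of exactly this shape and nothing more.

```python
from dataclasses import dataclass
from enum import Enum

class Method(Enum):
    HTTP_01 = "http-01"
    DNS_01 = "dns-01"

@dataclass
class ValidationRequest:      # policy engine -> validation agent
    request_id: str
    domain: str
    method: Method
    token: str                # the challenge value the agent should look for

@dataclass
class ValidationResult:       # validation agent -> policy engine
    request_id: str
    agent: str                # which vantage point produced this result
    passed: bool
    detail: str = ""          # e.g. the record or HTTP body actually observed
```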
I would assume that the primary validation agent is physically collocated with the other critical CA infrastructure and likely sits in a distinct firewall zone.
Presumably, having far-flung validation agents can be quite cheap, if they can be implemented as inexpensive VMs running on commodity infrastructure without significant security requirements.
Is it possible to structure the overall test result such that the primary validation agent must say yes, and then (and only if the primary says yes) the far-flung secondary validation agents form a pool whose only job is to register an objection to the validation and stop issuance? If so, would that permit you to define these remote agents as minimalist VMs (not even a shell interface) that boot up, phone home, take jobs to test, and return results to be collated, all without having to provide specialized persistent environments and persistent security guarantees for these secondary validation agents? It seems to me that if it is at all possible to eliminate the special operational-environment needs of the secondary validators, this becomes a lot more practical and economical to deploy.
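Concretely, the collation rule I’m asking about might look something like this (a minimal sketch under my own assumptions; the names and the objection threshold are hypothetical, not anything any CA actually does):

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    passed: bool

def collate(primary: AgentResult, secondaries: list[AgentResult],
            max_objections: int = 0) -> bool:
    """Primary alone can approve; secondaries can only object and block issuance."""
    if not primary.passed:
        return False
    objections = [s for s in secondaries if not s.passed]
    return len(objections) <= max_objections

# Example: primary passes, but a far-flung agent sees a hijacked answer and objects.
primary = AgentResult("primary-us-east", True)
secondaries = [AgentResult("secondary-eu", True), AgentResult("secondary-apac", False)]
print(collate(primary, secondaries))  # False -> issuance is stopped
```

With max_objections at 0, any single secondary can block, which is the strictest posture; a CA might instead tolerate a small number of objections (or missing agents) in order to ride through transient network faults.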
I have some comments on others’ posts on this thread:
I believe that the domain registry and the domain registrars are of interest, but that this is primarily a matter between the party holding the domain and their registrar. There appears to be good security, in general, at the registry level, so the configuration of the user’s account at the registrar lets them choose an appropriate level of registration security for their domain.
DNSSEC and CAA together have great potential for eliminating the hijack vulnerability, at least as to the DNS challenge, but I think it is improbable that CAA alone will help.
It is far more likely that a party wishing to maliciously secure a certificate will hijack the authoritative DNS server addresses and elect DNS validation rather than HTTP or TLS-SNI validation. I make this assertion because so many websites today are hosted on CDN farms, where I might not know which IP address for the web server my target CA will receive when it resolves the name. It’s more likely that I’ll have a smaller set of IP space to hijack if I intervene at the DNS layer rather than at the HTTP layer.
As a result, if CAA alone, without DNSSEC, is utilized to set issuance criteria, it’s kind of pointless as a defense against the hijack attack. The attacker’s responder on the hijacked DNS server IP will present a rosier CAA picture, if any CAA records at all. You could catch that with DNSSEC, but not without it.
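To illustrate the point, here is a sketch of the distinction I mean (my own, not from the paper). It assumes dnspython and a validating recursive resolver on a trusted path, it skips the tree-climbing that real CAA processing does per RFC 8659, and the CA identifier is just an example: the CAA answer only means something when the AD bit says it was DNSSEC-validated; otherwise the hijacker’s responder supplied it.

```python
import dns.flags
import dns.resolver

def caa_permits(domain: str, ca_identifier: str = "letsencrypt.org") -> bool:
    resolver = dns.resolver.Resolver()
    resolver.use_edns(0, dns.flags.DO, 1232)   # ask the resolver for DNSSEC records
    try:
        answer = resolver.resolve(domain, "CAA")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return True   # no CAA at this name (real CAA processing would climb the tree)
    if not (answer.response.flags & dns.flags.AD):
        # Not DNSSEC-validated: this is exactly where a hijacked authoritative
        # server can paint whatever CAA picture it likes, so the record proves nothing.
        return False  # (illustrative only; CAA alone does not require this)
    issuers = {rr.value.decode() for rr in answer if rr.tag == b"issue"}
    return ca_identifier in issuers

print(caa_permits("example.com"))
```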
I’ve not yet read in detail the proposed mechanism for the route-age heuristic, but several challenges immediately come to mind that I hope their proposal addresses (a rough sketch of the kind of check I am imagining follows this list):
- You can’t use just any view of the global BGP routing table to reference for the age of the advertised prefix. This is because a good attacker will work to ensure that the hijacked prefix is announced only as close to the target CA’s network as possible. If the scope of the hijack can be contained, it will break far fewer things and greatly increase the odds that the hijack is never noticed.
- As a consequence of concern #1, a useful view would mean that the validation agent is able to access the live BGP view of either the CA’s own routing infrastructure (do CAs even generally BGP peer?) or, if the CA runs a non-BGP routing environment internally, a gratuitous BGP session with the upstream ISP at the same point of interconnection as the CA’s actual internet link. Is it practical to bring such a feed into a regulated environment like a CA and rely upon it for issuance decisions (at least for pushing a decision to the negative)?
- Have you ever watched a full BGP routes feed live? Route advertisements are in some instances quite ephemeral. Some service providers dynamically rebalance traffic over the course of the day by modifying some of their advertisements. As a result, other ISPs across the net suddenly see a different best-path route to the same IP, even though connectivity was effectively continuous. I’m not sure that enough work has been done to link the stability of a given prefix advertisement in one view of the global table to any particular security posture.
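Again, I have not read their mechanism yet; the following is only the rough shape of the check I’m imagining, so the three concerns above have something concrete to point at. The “view” here is a plain dict standing in for a live feed taken at the CA’s own border (per concern #2), and all prefixes, ages, and thresholds are made up.

```python
import ipaddress
import time

# prefix -> timestamp when the current best path was last installed/changed,
# as seen from the CA's own border (documentation prefixes, invented ages)
LOCAL_BGP_VIEW = {
    ipaddress.ip_network("198.51.100.0/24"): time.time() - 90 * 86400,  # stable for months
    ipaddress.ip_network("203.0.113.0/24"):  time.time() - 20 * 60,     # appeared 20 minutes ago
}

def route_age_seconds(target_ip):
    """Age of the longest-matching prefix covering target_ip, or None if unrouted."""
    ip = ipaddress.ip_address(target_ip)
    covering = [p for p in LOCAL_BGP_VIEW if ip in p]
    if not covering:
        return None
    best = max(covering, key=lambda p: p.prefixlen)  # longest match wins
    return time.time() - LOCAL_BGP_VIEW[best]

def looks_suspicious(target_ip, min_age_seconds=24 * 3600):
    """Flag validations whose covering route is younger than the threshold."""
    age = route_age_seconds(target_ip)
    return age is not None and age < min_age_seconds

print(looks_suspicious("203.0.113.10"))  # True: the covering route is brand new
```

Per concern #3, legitimate advertisements churn for perfectly benign reasons, so it is not obvious to me where a threshold like this could sit without either missing contained hijacks or blocking legitimate issuance.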
I reiterate that the three points I’ve just made are naive as to the specific technique described in the paper, as I’ve not yet read that section in detail.
Thanks,
Matt