Legacy CAA Implementation

On Thursday we enabled an implementation of the legacy form of the CAA (RFC6844) spec. Previously, Let’s Encrypt implemented an amended and simplified form of RFC6844 as described in erratum 5065. We’ve been working on getting the erratum officially adopted at the CA/Browser Forum, and there is general consensus at both IETF and the CA/Browser Forum that the amended version is ideal. Unfortunately, the details haven’t yet been formally voted in, so we have implemented the older version temporarily while we wait for the vote to go through.

The difference is the handling of CNAMEs. Under erratum 5065, to check CAA for www.example.com, the CA looks up CAA for www.example.com, then example.com, then com (this is called tree-climbing). If any CNAMEs are encountered along the way, the CA’s recursive resolver automatically resolves them according to RFC1034, and the CA gets the CAA record at the end of the CNAME chain, if there is one.
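As a rough illustration of the tree-climbing described above, here is a minimal Python sketch. The `lookup` callable is a hypothetical stand-in for a recursive resolver that has already followed any CNAMEs, and the zone data is invented for the example; this is not Let’s Encrypt’s actual code.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical resolver interface: given a name, return the CAA record set
# found there (after the resolver has followed any CNAME chain), or None.
CaaLookup = Callable[[str], Optional[List[str]]]


def climb_names(fqdn: str) -> List[str]:
    """Return the name followed by each of its parents, up to the TLD."""
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def find_caa_erratum_5065(fqdn: str, lookup: CaaLookup) -> Optional[List[str]]:
    """Return the first non-empty CAA record set found while climbing."""
    for name in climb_names(fqdn):
        records = lookup(name)
        if records:
            # Processing stops at the first name that has CAA records.
            return records
    return None


# Toy zone data, for illustration only.
zone: Dict[str, List[str]] = {"example.com": ['0 issue "letsencrypt.org"']}
found = find_caa_erratum_5065("www.example.com", zone.get)
```

Here the lookup for www.example.com finds nothing, so the search moves up to example.com and stops there.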

Under legacy CAA, the CA is required to additionally climb the DNS tree on each CNAME record it receives along the way, and check CAA for each of those. For instance, if www.example.com is a CNAME to hosting.customer.example.net, the CA must additionally check CAA for customer.example.net, example.net, and net. This may introduce new hostnames into your CAA path that were not there before. If any of those hostnames fails CAA lookup, issuance will be blocked.
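To make the extra lookups concrete, the following Python sketch lists the names that would be consulted under the legacy algorithm. The `cnames` mapping is a hypothetical stand-in for real DNS data, the function name is invented, and the sketch assumes the CNAME data contains no loops.

```python
from typing import Dict, List


def legacy_caa_path(name: str, cnames: Dict[str, str]) -> List[str]:
    """List the names consulted for CAA under legacy tree-climbing.

    Assumption: `cnames` has no loops; this sketch has no loop protection.
    The CNAME target is listed explicitly here, although in practice a
    recursive resolver reaches it while answering the query for the alias.
    """
    path = [name]
    target = cnames.get(name)
    if target is not None:
        # Climb the tree of the CNAME target as well.
        path.extend(legacy_caa_path(target, cnames))
    labels = name.split(".")
    if len(labels) > 1:
        # Then continue with the parent of the current name.
        path.extend(legacy_caa_path(".".join(labels[1:]), cnames))
    return path


aliases = {"www.example.com": "hosting.customer.example.net"}
names_checked = legacy_caa_path("www.example.com", aliases)
```

For this input, `names_checked` contains customer.example.net, example.net, and net in addition to the names under example.com, which is how unrelated hostnames end up in the CAA path.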

There’s another issue with legacy CAA: mixed CNAME/tree-climbing loops. Normally loops in CNAME records are handled automatically by recursive resolvers, and result in lookup failures. As a result, CNAME loops in the wild are rare. However, the tree-climbing behavior introduces a new potential loop. Say, for example, blog.example.com is a CNAME to www.blog.example.com. According to a strict interpretation of RFC6844, the CA is required to check CAA for blog.example.com, then www.blog.example.com, then blog.example.com, then www.blog.example.com, and so on forever. We’ve addressed this in our code by setting a limit on how deep we will chase such CNAMEs. In the interest of security and correctness, we fail closed in such a scenario and prevent issuance. RFC6844 does not specify how to handle such loops; under erratum 5065 there is no possibility for loops. Unfortunately, such mixed CNAME/tree-climbing loops are a very common and legitimate DNS setup in the wild, so this blocks issuance for some domains.
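The fail-closed behavior can be sketched in Python as follows. The step limit of 10, the exception name, and the `cnames` mapping are all assumptions made for illustration; the actual limit and code used in production are not shown here.

```python
from typing import Dict, List


class CAAStepLimitExceeded(Exception):
    """Raised when the lookup exceeds its step budget (fail closed)."""


def checked_names_with_limit(
    name: str, cnames: Dict[str, str], max_steps: int = 10
) -> List[str]:
    """Walk the legacy CAA path, refusing to continue past max_steps names."""
    path: List[str] = []

    def visit(current: str) -> None:
        if len(path) >= max_steps:
            # Fail closed: the caller should treat this as "do not issue".
            raise CAAStepLimitExceeded(current)
        path.append(current)
        target = cnames.get(current)
        if target is not None:
            visit(target)  # climb the CNAME target's tree
        labels = current.split(".")
        if len(labels) > 1:
            visit(".".join(labels[1:]))  # climb to the parent

    visit(name)
    return path


# blog.example.com -> www.blog.example.com, whose parent is blog.example.com.
looping = {"blog.example.com": "www.blog.example.com"}
try:
    checked_names_with_limit("blog.example.com", looping)
    issuance_blocked = False
except CAAStepLimitExceeded:
    issuance_blocked = True
```

With the looping setup above, the walk alternates between the two names until the budget runs out, so issuance is blocked even though no CAA record forbids it.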

We’ve already gotten several reports from users that this is causing breakage.

We understand this is causing issuance problems for many people, and we’re going to be continuing to push hard to find a solution soon that allows continued issuance and renewal for affected domains.

One workaround that may work if your DNS provider fully supports setting CAA records: as described in our CAA documentation, CAA processing terminates early if any CAA record is found. Setting a CAA record for your domain that explicitly allows issuance by Let’s Encrypt can help avoid these problems. This is of course not an ideal solution: not all DNS software supports setting CAA records, and not everyone has direct control of their DNS.
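As a simplified sketch of why such a record helps, the Python function below decides whether a CAA record set found at a name permits a given issuer. Records are modeled as hypothetical `(flags, tag, value)` tuples; the sketch handles only the `issue` property and ignores `issuewild` and critical flags, so it is not a complete CAA evaluator.

```python
from typing import List, Tuple

CaaRecord = Tuple[int, str, str]  # (flags, tag, value), an illustrative model


def permits_issuance(records: List[CaaRecord], issuer: str = "letsencrypt.org") -> bool:
    """Return True if the record set allows `issuer` to issue (non-wildcard)."""
    issue_values = [value for (_flags, tag, value) in records if tag.lower() == "issue"]
    if not issue_values:
        # No "issue" property in the set: issuance is not restricted by it.
        return True
    # The issuer domain is the part before any ";"-separated parameters.
    return any(v.split(";")[0].strip().lower() == issuer for v in issue_values)


allowed = permits_issuance([(0, "issue", "letsencrypt.org")])
denied = permits_issuance([(0, "issue", "other-ca.example")])
```

A record set like the first one, placed on the name being validated, ends CAA processing at that name, so the lookups further along the path never happen.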


Update: Since failing closed on mixed CNAME/tree-climbing loops is so disruptive to so many subscribers, we’ve decided on further discussion to roll out a hotfix to detect such loops. Despite RFC6844 not addressing loops, our judgement is that detecting and ignoring such loops does not create any risk of missing a CAA record that would contradict issuance.

We rolled that change out to production on September 16 2017, 14:35 UTC. We also included a change increasing the maximum number of steps we would take when doing CNAME tree-climbing to 50.
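One way to detect and ignore such loops is to remember which names have already been checked, as in this Python sketch. The function name and the `cnames` mapping are hypothetical, and the code is an illustration of the idea under those assumptions rather than the change we deployed; only the limit of 50 steps comes from the text above.

```python
from typing import Dict, List, Set


def loop_safe_caa_path(name: str, cnames: Dict[str, str], max_steps: int = 50) -> List[str]:
    """Walk the legacy CAA path, skipping any name that was already checked."""
    seen: Set[str] = set()
    path: List[str] = []

    def visit(current: str) -> None:
        if current in seen:
            # Already checked: revisiting cannot reveal a new CAA record.
            return
        if len(path) >= max_steps:
            raise RuntimeError("CAA lookup exceeded %d steps" % max_steps)
        seen.add(current)
        path.append(current)
        target = cnames.get(current)
        if target is not None:
            visit(target)  # climb the CNAME target's tree
        labels = current.split(".")
        if len(labels) > 1:
            visit(".".join(labels[1:]))  # climb to the parent

    visit(name)
    return path


looping = {"blog.example.com": "www.blog.example.com"}
safe_path = loop_safe_caa_path("blog.example.com", looping)
```

With the looping setup from earlier, each name is checked exactly once and the walk terminates at com.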

Note that there is still a remaining category of domains that will have problems: those for which a parent domain of one of their CNAME targets returns a failure in response to CAA queries.


Update: Let’s Encrypt has now switched back to the erratum 5065 algorithm for CAA, the algorithm we’ve used since launch with the exception of the past two weeks.

Details: The CA/Browser Forum passed ballot 214 yesterday, switching to CAA erratum 5065. The ballot does not take effect until a standard 30-day Intellectual Property Review period has passed. However, Mozilla, Google, Microsoft, and Apple have each indicated that they consider it acceptable for CAs to use either the RFC 6844 algorithm or the erratum 5065 algorithm until then, after which point only erratum 5065 will be acceptable.