I found some slightly odd behavior while testing edge cases for my system with dns-01 challenges.
It is perfectly legal for a DNS server to return several CAA records for a domain. The ACME service should find the one that matches the provider's policy. So, if I provide two records:
example.org CAA 0 issue "letsencrypt.org"
example.org CAA 0 issue "otherca.org"
Let's Encrypt should recognize the first CAA record as the one that allows it to proceed with certificate issuance, and indeed it does. So technically this is OR (alternative) logic within the scope of a single DNS server.
The second scenario I tested involves two authoritative DNS servers.
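To make the OR logic concrete, here is a minimal sketch of how a single CAA record set could be evaluated, loosely following RFC 8659. The function names are my own, the records are modeled as plain (flags, tag, value) tuples, and the critical flag and the issuewild tag are deliberately ignored, so treat it as an illustration rather than what any CA actually runs.

```python
def issuer_domain(value):
    """Return the issuer domain part of an issue value (text before the first ';')."""
    return value.split(";", 1)[0].strip().lower()


def rrset_permits(records, ca_domain):
    """Decide whether one CAA record set permits issuance by ca_domain.

    records is a list of (flags, tag, value) tuples, a simplified stand-in
    for a real DNS answer. Critical flags and issuewild are not handled.
    """
    issue_values = [value for _flags, tag, value in records if tag.lower() == "issue"]
    if not issue_values:
        # No issue property at all: CAA does not restrict issuance.
        return True
    # OR logic: a single matching issue record is enough.
    return any(issuer_domain(v) == ca_domain for v in issue_values)


records = [
    (0, "issue", "letsencrypt.org"),
    (0, "issue", "otherca.org"),
]
print(rrset_permits(records, "letsencrypt.org"))
```

Note that an issue value of ";" has an empty issuer domain, so it never matches any CA in this sketch.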
DNS 1 provides this:
example.org CAA 0 issue "letsencrypt.org"
DNS 2 provides nothing (no CAA, no TXT records for _acme-challenge).
In this scenario the ACME service queries both DNS servers, finds that one of them provides the correct records, and issues the certificate. So it seems that the OR logic also works across multiple DNS servers. So far, so good.
Then I tested a third scenario, where DNS 1 provides this:
example.org CAA 0 issue "letsencrypt.org"
but DNS 2 provides this:
example.org CAA 0 issue "otherca.org"
Sadly, in this scenario the ACME service complains about a CAA record mismatch. So the OR logic does not really work across multiple DNS servers.
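The second and third scenarios can be compared with a small model. This is only a sketch of the two possible policies, with hypothetical function names and each server's answer reduced to a list of issue values; it is not a description of how Let's Encrypt's resolver actually selects or combines answers. It does show one thing worth keeping in mind: an empty CAA answer is permissive on its own, so the second scenario also passes under an "every server must permit" policy, and only the third scenario tells the two policies apart.

```python
def rrset_permits(issue_values, ca_domain):
    """One server's answer permits issuance if it is empty or names the CA."""
    if not issue_values:
        return True  # no CAA issue records: no restriction
    return any(v.split(";", 1)[0].strip().lower() == ca_domain for v in issue_values)


def any_server_permits(answers, ca_domain):
    """OR across servers: one permitting answer is enough."""
    return any(rrset_permits(a, ca_domain) for a in answers)


def every_server_permits(answers, ca_domain):
    """AND across servers: every answer must permit issuance."""
    return all(rrset_permits(a, ca_domain) for a in answers)


CA = "letsencrypt.org"
scenario_2 = [["letsencrypt.org"], []]               # DNS 2 serves no CAA
scenario_3 = [["letsencrypt.org"], ["otherca.org"]]  # DNS 2 names another CA

for name, answers in (("scenario 2", scenario_2), ("scenario 3", scenario_3)):
    print(name, any_server_permits(answers, CA), every_server_permits(answers, CA))
```

Under this model, the behavior I am asking for corresponds to any_server_permits, while the observed results match every_server_permits.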
Why is it important?
Imagine building a multi-cluster environment of super-highly-available services.
There are many DNS servers in different locations. There are also many ACME clients renewing certificates. For security reasons, we don't want to share an account's private key between those instances, so we decided to create a separate account on each ACME client, each able to issue certificates for our domains (yes, I'm aware of the API limits, all is good). Each ACME client is able to connect to the DNS servers and instruct them to serve the appropriate CAA and TXT records.
First of all, we don't want to be attacked by someone exhausting the ACME API rate limits on our behalf. That's why our DNS servers return a CAA 0 issue ";" response if there are no pending certificate orders. The correct CAA record (containing the appropriate account URI) is served only once our ACME client has submitted the challenge to the DNS servers.
As I said, we are preparing a super-highly-available cluster, so we must assume that things might (and will) go wrong. There might be network hiccups, some DNS servers might be malfunctioning, and so on. The result might be that only a subset of our DNS servers receives the challenge request from our ACME clients. In that case, some of the DNS servers might serve the correct CAA record, like:
example.org CAA 128 issue "letsencrypt.org;accounturi=...;validationmethods=dns-01"
but others might serve this:
example.org CAA 0 issue ";"
The current implementation of Let's Encrypt returns an error in such scenarios. Instead, the ACME service should observe that one of the authoritative DNS servers provides a valid CAA record and continue processing the order.
Otherwise there are only two bad options available:
- All the DNS servers must be updated (which opens the door to failures for many reasons)
- DNS servers cannot provide the default CAA 0 issue ";" response (which opens the door to attackers).
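
For reference, the account-bound record shown earlier (with accounturi and validationmethods parameters, as defined in RFC 8657) can be checked roughly like this. This is a simplified sketch: the function names are my own, the account URI used in the example is a made-up placeholder and not a real account, and unknown parameters are simply ignored.

```python
def parse_issue_value(value):
    """Split an issue value into (issuer_domain, parameters dict)."""
    parts = [p.strip() for p in value.split(";")]
    domain = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate empty segments such as a trailing ';'
        name, sep, val = part.partition("=")
        if not sep:
            raise ValueError(f"malformed CAA parameter: {part!r}")
        params[name.strip().lower()] = val.strip()
    return domain, params


def record_authorizes(value, ca_domain, account_uri, method):
    """Check a single issue value against the CA, the account, and the method."""
    domain, params = parse_issue_value(value)
    if domain != ca_domain:
        return False
    if "accounturi" in params and params["accounturi"] != account_uri:
        return False
    if "validationmethods" in params:
        allowed = [m.strip() for m in params["validationmethods"].split(",")]
        if method not in allowed:
            return False
    return True


# Hypothetical account URI, used only for illustration.
ACCOUNT = "https://acme.example/acct/1"
value = f"letsencrypt.org;accounturi={ACCOUNT};validationmethods=dns-01"

print(record_authorizes(value, "letsencrypt.org", ACCOUNT, "dns-01"))
print(record_authorizes(";", "letsencrypt.org", ACCOUNT, "dns-01"))
```

The second call models the default deny record: its issuer domain is empty, so it authorizes nobody.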