Hi,
I have some certificates for internal machines which have begun failing DNS-01 challenges since the CAA strict checking/no SERVFAIL changes.
My setup is that the domains I'm requesting certs for are CNAMEs pointing to DNS records which are hosted on firewalled DNS authorities. This allows a public name to point to an internal IP without revealing the network addresses or structure of my company's LAN.
e.g.
- I want a cert for git.example.com
- git.example.com CNAME -> gitserver.internal.example.com in the public DNS
- internal.example.com has authoritative NS 10.0.0.53
Thus when Let's Encrypt follows the CNAME from git.example.com and queries the internal.example.com nameserver, it gets SERVFAIL because it cannot reach 10.0.0.53.
This appears to be the reason LE returns
acme: Error 400 - urn:acme:error:connection - DNS problem: SERVFAIL looking up CAA for git.example.com
Error Detail:
Validation for git.example.com:
Resolved to:
Used:
Lego exit status: 1
However, per RFC 6844 Section 4 (DNS Certification Authority Authorization (CAA) Resource Record), it looks to me like LE should implement tree climbing to the CNAME record's parent name, and should thus eventually check the CAA record for example.com, even if it cannot reach the authority for gitserver.internal.example.com.
I have attempted to catch tree climbing by both:
- Adding example.com CAA 0 issue ";" and
- Adding internal.example.com CAA 0 issue ";"
My theory was that tree climbing from git.example.com should hit the former, while tree climbing from gitserver.internal.example.com should hit the latter.
However, my DNS-01 attempts are still failing.
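To spell out the climbing order I was counting on, here is a minimal sketch. The helper name `climb` is made up for illustration and is not part of any ACME client or of LE's checker; it only lists the names a parent-by-parent walk would visit.

```python
def climb(name):
    """Return the name itself followed by each parent, ending at the TLD."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


if __name__ == "__main__":
    # The public name and the CNAME target from the example above.
    for start in ("git.example.com", "gitserver.internal.example.com"):
        print(start, "->", climb(start))
```

Both walks pass through example.com, but only the walk that starts at the CNAME target passes through internal.example.com, which is why I added a CAA record at each of those two names.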
Here's the RFC section I think says this is wrong:
Let CAA(X) be the record set returned in response to performing a CAA
record query on the label X, P(X) be the DNS label immediately above
X in the DNS hierarchy, and A(X) be the target of a CNAME or DNAME
alias record specified at the label X.
o If CAA(X) is not empty, R(X) = CAA (X), otherwise
o If A(X) is not null, and R(A(X)) is not empty, then R(X) = R(A(X)), otherwise
o If X is not a top-level domain, then R(X) = R(P(X)), otherwise
o R(X) is empty.
Bullet (1) should not match, since we cannot have a CAA record on the CNAMEd name.
Bullet (2) will SERVFAIL, which leaves the condition "A(X) is not null AND R(A(X)) is not empty" undefined, so we should proceed with the OTHERWISE to bullet (3).
Bullet (3) would cause a CAA lookup on the parent of X, in this case example.com (the parent of git.example.com), which should be retrievable.
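Here is a rough sketch of the reading I'm describing, written against a toy in-memory resolver rather than real DNS. Everything in it (the `CAA`, `CNAME` and `UNREACHABLE` tables, the `ServFail` exception, the `strict` flag) is hypothetical and only mirrors the example names in this post; it is not how LE's checker is actually written, which I have not seen.

```python
class ServFail(Exception):
    """Raised by the toy resolver when a zone's nameserver is unreachable."""


# Hypothetical data mirroring the example in this post.
CAA = {"example.com": ['0 issue ";"']}
CNAME = {"git.example.com": "gitserver.internal.example.com"}
UNREACHABLE = ("internal.example.com",)  # only served from 10.0.0.53


def caa(name):
    """CAA(X): the CAA record set at exactly this name."""
    if any(name == z or name.endswith("." + z) for z in UNREACHABLE):
        raise ServFail(name)
    return CAA.get(name, [])


def alias(name):
    """A(X): the CNAME target of this name, or None."""
    return CNAME.get(name)


def parent(name):
    """P(X): the name one label up, or None if X is a top-level domain."""
    _, _, rest = name.partition(".")
    return rest or None


def relevant_caa(name, strict=False):
    """R(X) from RFC 6844 section 4.

    strict=False is my reading: a SERVFAIL while evaluating the alias
    branch counts as "no result" and we fall through to the parent.
    strict=True lets the SERVFAIL abort the whole lookup.
    """
    records = caa(name)                      # bullet 1
    if records:
        return records
    target = alias(name)                     # bullet 2
    if target is not None:
        try:
            records = relevant_caa(target, strict)
        except ServFail:
            if strict:
                raise
            records = []
        if records:
            return records
    up = parent(name)                        # bullet 3
    if up is not None:
        return relevant_caa(up, strict)
    return []                                # bullet 4
```

With this toy data, `relevant_caa("git.example.com")` returns the record set published at example.com, while `relevant_caa("git.example.com", strict=True)` raises `ServFail`, which matches the failure I'm seeing.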
Given the announcement of hard fails on SERVFAIL and the behavior I have observed, I believe the Let's Encrypt checker is likely incorrectly treating the SERVFAIL in the bullet (2) test as an overall permanent failure, while it seems to me it should simply pass on to the remainder of the rules.
Glancing at the IETF mailing lists on this subject, it looks like the tree climbing and CNAME rules in the RFC were intended to support non-publicly-accessible nameservers like I'm talking about.
My specific provider is Amazon Route53 for both the internal and external zones. I'm using the Lego client from a Linux host (which, incidentally, on the client side can access the internal DNS authority).
Will it be possible to fix the behavior of the verification tool on LE's servers?