We’re also seeing many of our customer domains hosted with Netregistry (or one of their resellers) fail DV because the DNS CAA query times out over UDP. The same query works, without timing out, when done via TCP.
Something appears to have recently changed at LE, since the domains in my case did initially pass DV and had their respective certs issued.
Here is what I know about my cases so far…
- All domains have their authoritative DNS provided by Australian-based Netregistry or one of their resellers.
- All are (or were) on active LE issued certs and all passed DV in the past.
- None have changed their DNS provider since initially passing DV.
- The CAA related failures I’m observing are all for DV renewals.
- None of the domains actually have any CAA records present.
- The CAA queries all return successfully when done via TCP; they all fail (with timeout) when done via UDP.
- In support calls, Netregistry reseller TPPWholesale acknowledged that there may be an issue with CAA queries over UDP in the BIND version they run.
Here’s an example (using the domain posted by cbertozz at the start of this thread) that shows the CAA UDP vs TCP check…
This CAA check via TCP works:
dig CAA guidedogswa.com.au. @ns1.netregistry.net. +tcp
; <<>> DiG 9.10.2 <<>> CAA guidedogswa.com.au. @ns1.netregistry.net. +tcp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38281
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;guidedogswa.com.au. IN CAA
;; AUTHORITY SECTION:
guidedogswa.com.au. 3600 IN SOA ns1.netregistry.net. dmain.netregistry.net. 2017030818 86400 7200 3600000 172800
;; Query time: 320 msec
;; SERVER: 203.55.143.10#53(203.55.143.10)
;; WHEN: Wed Apr 26 13:42:10 Eastern Daylight Time 2017
;; MSG SIZE rcvd: 97
This CAA check via UDP fails with a timeout:
dig CAA guidedogswa.com.au. @ns1.netregistry.net. +notcp
; <<>> DiG 9.10.2 <<>> CAA guidedogswa.com.au. @ns1.netregistry.net. +notcp
;; global options: +cmd
;; connection timed out; no servers could be reached
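For anyone who wants to repeat this comparison without dig, here is a rough Python sketch using only the standard library. It builds a bare CAA query and sends the same packet over UDP and then over TCP. The function names (`build_query`, `probe`, and so on), the fixed query ID, and the 5-second timeout are my own assumptions for illustration. Unlike dig, it does not set the recursion-desired flag.

```python
import socket
import struct

CAA_TYPE = 257  # DNS RR type number assigned to CAA


def build_query(name, qtype=CAA_TYPE, query_id=0x1234):
    """Encode a minimal DNS query: one question, class IN, no flags set."""
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_and_answers(response):
    """Return (rcode, answer_count) read from the 12-byte DNS header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def query_udp(server, packet, timeout=5.0):
    """Send the query in a single UDP datagram and wait for one reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(packet, (server, 53))
        data, _addr = s.recvfrom(4096)
        return data


def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def query_tcp(server, packet, timeout=5.0):
    """Send the query over TCP, where messages carry a 2-byte length prefix."""
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack("!H", len(packet)) + packet)
        (length,) = struct.unpack("!H", _recv_exact(s, 2))
        return _recv_exact(s, length)


def probe(server, name):
    """Run the same CAA query over both transports and report each outcome."""
    packet = build_query(name)
    results = {}
    for label, send in (("udp", query_udp), ("tcp", query_tcp)):
        try:
            results[label] = rcode_and_answers(send(server, packet))
        except OSError as exc:  # includes socket.timeout
            results[label] = "failed: %s" % exc
    return results
```

Calling `probe("ns1.netregistry.net", "guidedogswa.com.au.")` returns a dict with one entry per transport: an `(rcode, answer_count)` tuple when a reply arrives, or a failure string when the query times out or the connection fails.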
Here are some specific tactical questions for LE regarding this issue…
- Why did LE’s CAA check succeed in the past?
- Why is LE only doing the “critical” CAA query via UDP and not via TCP too?
- What has changed in LE’s systems recently that may be causing new CAA failures?
- It should be noted that LE had significant issues with DNS CAA queries via UDP when they first went GA (Dec 2015). Is the case we’re seeing now a regression?
And the strategic ask…
I’d also like to ask LE to improve the robustness of their CAA check by including a check via TCP as well.
If LE is going to rely absolutely on CAA checks, then the onus should be on them and their systems to ensure those checks are correct and that every possible way of checking has been exhausted. Their present implementation appears to be flawed in this regard: they could get the CAA response if they tried the query via TCP.
It should be noted that I’m not necessarily asking to relax the CA/B Forum’s CAA requirements, though that would also solve this.
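To make the TCP ask concrete, here is a minimal Python sketch of the kind of fallback I have in mind. It is not a description of how LE's resolvers work, which I don't know. The names `resolve_with_fallback`, `send_udp`, and `send_tcp` are hypothetical; the two senders are assumed to be callables that take the raw query bytes and return the raw response bytes, with the UDP one raising `socket.timeout` when no reply arrives.

```python
import socket
import struct

TC_BIT = 0x0200  # "truncated" flag in the DNS header flags field


def is_truncated(response):
    """True if the response header has the TC bit set."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_BIT)


def resolve_with_fallback(packet, send_udp, send_tcp):
    """Try UDP first; retry the same query over TCP on timeout or truncation.

    Returns a (transport, response_bytes) tuple. send_udp and send_tcp are
    hypothetical callables supplied by the caller.
    """
    try:
        response = send_udp(packet)
    except socket.timeout:
        # No UDP reply at all, the case described in this post.
        return "tcp", send_tcp(packet)
    if is_truncated(response):
        # Usual DNS behaviour: a truncated UDP reply is retried over TCP.
        return "tcp", send_tcp(packet)
    return "udp", response
```

With a fallback like this, a nameserver that silently drops CAA queries on UDP but answers them on TCP would still produce a usable response, and only a failure on both transports would count as a failed check.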