Firstly, the problem is NOT with NetRegistry.
Let’s take a sample domain (drumdigital.com.au).
As noted in: LetsEncrpyt & NetRegistry (Australia) - Resolved?
When you query DNS via TCP (dig CAA drumdigital.com.au. @ns-1.ezyreg.com. +tcp), it works.
When you query DNS via UDP (dig CAA drumdigital.com.au. @ns-1.ezyreg.com. +notcp), it fails.
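For anyone who wants to reproduce the comparison without dig, here is a minimal sketch using only the Python standard library. It builds one CAA query and sends the same bytes over UDP and over TCP; the only difference on the wire is that TCP prefixes the message with a two-byte length. The function names and the query ID are my own choices for illustration, and error handling is kept to the bare minimum.

```python
import socket
import struct

CAA = 257  # RR type code for CAA


def build_query(name: str, qtype: int = CAA, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS query: 12-byte header plus one question."""
    # ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def query_udp(server: str, packet: bytes, timeout: float = 3.0) -> bytes:
    """Send the query in a single UDP datagram and wait for one reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(packet, (server, 53))
        data, _ = s.recvfrom(4096)
        return data


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        buf += chunk
    return buf


def query_tcp(server: str, packet: bytes, timeout: float = 3.0) -> bytes:
    """Send the same query over TCP, framed with a 2-byte length prefix."""
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack("!H", len(packet)) + packet)
        (length,) = struct.unpack("!H", _recv_exact(s, 2))
        return _recv_exact(s, length)
```

Calling `query_udp("ns-1.ezyreg.com", build_query("drumdigital.com.au"))` and then `query_tcp(...)` with the same arguments should mirror the two dig commands: a timeout surfaces as an `OSError` subclass, a success returns the raw response bytes.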
Something between the LetsEncrypt and NetRegistry data centres is causing the problem, and isolating what or where it is will be difficult. In my testing, I have found that different data centres within Australia have issues as well. It sometimes works with UDP. But it ALWAYS works via TCP.
In fact, for the Australian data centres where the query is actually responded to, the data sent back is incorrect (an A record answer is returned rather than an empty response indicating no CAA record).
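To check for that kind of wrong answer programmatically, a rough sketch like the following can walk the answer section of a raw DNS response and report any record types that do not belong in a CAA reply. The helper names are hypothetical, and treating only CAA and CNAME as acceptable answer types is a simplifying assumption on my part.

```python
import struct

TYPE_A, TYPE_CNAME, TYPE_CAA = 1, 5, 257


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past the (possibly compressed) name at offset."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1          # root label ends the name
        if length & 0xC0 == 0xC0:
            return offset + 2          # 2-byte compression pointer ends the name
        offset += 1 + length           # ordinary label: length byte plus data


def answer_types(msg: bytes):
    """Return (rcode, [RR type of each record in the answer section])."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4   # skip QTYPE and QCLASS
    types = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, _cls, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        types.append(rtype)
        offset += 10 + rdlength
    return flags & 0x000F, types


def unexpected_answer_types(msg: bytes, qtype: int = TYPE_CAA):
    """List answer RR types that are neither the queried type nor a CNAME."""
    _rcode, types = answer_types(msg)
    return [t for t in types if t not in (qtype, TYPE_CNAME)]
```

A correct "no CAA record" reply has rcode 0 and an empty answer list, so `unexpected_answer_types` returns `[]`; the broken replies described above would return `[1]` (an A record).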
As noted here: DNS problem: query timed out looking up CAA (using Netregistry), another issue is that Unbound – used by the Boulder software – never falls back to using TCP.
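As an illustration of the fallback behaviour that is missing, here is a small sketch of "try UDP first, retry over TCP on timeout or truncation". It is not how Unbound or Boulder are implemented; the function name and the two transport callables (`udp` and `tcp`, each taking a server and a raw query and returning raw response bytes) are hypothetical. Retrying on a set TC bit is standard resolver behaviour, while retrying on a plain timeout is the extra policy argued for here.

```python
from typing import Callable, Tuple

# A transport takes (server, raw_query) and returns the raw response bytes.
Transport = Callable[[str, bytes], bytes]


def is_truncated(response: bytes) -> bool:
    """True if the TC bit is set (0x02 in the first byte of the flags word)."""
    return len(response) >= 4 and bool(response[2] & 0x02)


def resolve_with_fallback(
    server: str, packet: bytes, udp: Transport, tcp: Transport
) -> Tuple[str, bytes]:
    """Try UDP; fall back to TCP on timeout, socket error, or truncation."""
    try:
        response = udp(server, packet)
        if not is_truncated(response):
            return "udp", response
    except OSError:
        # Socket timeouts and connection errors are all OSError subclasses.
        pass
    return "tcp", tcp(server, packet)
```

The returned label records which transport finally produced the answer, which makes it easy to log how often the UDP path is failing for a given nameserver.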
There are a few ways to attack the problem:
1. Have end-clients migrate their nameserver business elsewhere.
2. Have Boulder perform all CA-related DNS queries over TCP.
3. Get the intermediate network operators to fix the root cause.
For (2): Boulder could be configured to perform DNS queries only via TCP. RFC 7766 requires all general-purpose DNS implementations to support TCP. In https://unbound.nlnetlabs.nl/pipermail/unbound-users/2017-April/004775.html, Paul Vixie, a principal author of BIND, thinks it is a good idea but does not believe it will work in practice.
If LetsEncrypt demanded TCP connectivity to DNS servers, it would have a significant first-mover advantage and would likely push the rest of the Internet to make DNS queries over TCP work reliably (which would be good).
For (3): I have done some debugging and raised issues with the NOC team at tpg.com.au (NetRegistry’s transit provider, from what I can determine) and with NetRegistry’s own NOC.
HTH.
Anand