PowerDNS: Can't find why CAA servfails

cpu · July 19, 2017, 6:42pm

I’ve made a pass to split off the folks who piled onto this thread so we can help each of them individually, since the root cause differs from case to case. To help avoid more pile-ons, I updated the title of this thread to mention PowerDNS, since at present the root cause in this case seems to be related to that DNS server.

jsha · July 20, 2017, 3:51am

I made a tool that makes it easier to make queries against a DNSSEC-validating Unbound instance and see the debug logs: https://unboundtest.com/. Hopefully it's helpful.

@rickjanssen, based on your comments about how you reproduced more reliably, I tried blocking ns1.zxcs.nl and ns2.zxcs.nl in iptables on that machine, and querying CAA pop.gwvanpelt.nl every 5 seconds for 10 minutes. Unfortunately, I never saw a single SERVFAIL. Was that the domain you were able to reproduce with, or was there another?

unboundtest # iptables --list OUTPUT --line-numbers -v
Chain OUTPUT (policy ACCEPT 53 packets, 70499 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1       18  1302 REJECT     all  --  any    any     anywhere             ns1.zxcs.nl          reject-with icmp-port-unreachable
2     627K   53M REJECT     all  --  any    any     anywhere             ns2.zxcs.nl          reject-with icmp-port-unreachable

Also, one thing we noticed when talking with @weppos separately was that there appears to be a bug either in DNSimple's name server or potentially in Unbound specifically with the combination of DNSSEC-signed zones, DNS 0x20 (which we use), and empty responses. We found that DNSSEC-signed responses that were non-empty worked fine, and disabling DNS 0x20 on the test instance fixed the empty responses (note: we're not planning to disable DNS 0x20 in prod since that would reduce security).
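
For readers unfamiliar with DNS 0x20: the resolver randomizes the letter case of the query name and only accepts a reply that echoes the name back with exactly the same case, which makes spoofed replies harder to forge. Here is a minimal Python sketch of that idea only. It is not Unbound's implementation, and the helper names `randomize_case` and `echoes_case` are made up for illustration.

```python
import random


def randomize_case(name: str, rng: random.Random) -> str:
    """Give each letter of a query name a random case (the DNS 0x20 idea)."""
    return "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower()
        for ch in name
    )


def echoes_case(sent_qname: str, response_qname: str) -> bool:
    """A 0x20-style check: the reply must repeat the name byte for byte."""
    return sent_qname == response_qname


rng = random.Random(0)  # fixed seed only so repeated runs print the same name
sent = randomize_case("mail.bkbouw.nl", rng)
print(sent)

# A server that preserves the case passes the check.
print(echoes_case(sent, sent))

# A server that lowercases the name before answering fails it
# (hypothetical mixed-case name chosen by hand for the example).
print(echoes_case("MaIl.BkBouw.nl", "mail.bkbouw.nl"))
```

If a name server normalizes the case somewhere along the way (for example when building a signed empty response), a check like this would reject the reply even though the data is otherwise fine.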

I'm pretty sure you're not experiencing the exact same issue (for one thing, you are using different software), but there may be a similar confluence of confounding factors that includes caching. Do you find that all the domains that are having problems are DNSSEC-signed? Are you able to reproduce the same problem for TXT records? If you add CAA records to a domain that reproduces the problem, does the problem go away?

"Is it possible to whitelist certain IP addresses from which the requests come? That will be 185.104.29.0/24"

Unfortunately this isn't possible with our software.

rickjanssen · July 20, 2017, 4:07am

I’m still figuring out what causes this to happen.

Indeed, we found that adding a CAA record works around the problem, but we can’t add it for everyone. We plan on automatically adding the record when requesting a Let’s Encrypt certificate.

So far only CAA shows this. A works, and I haven’t tested TXT yet.

This is weird: I am now unable to reproduce the SERVFAIL responses too, even though nothing changed.

jsha · July 20, 2017, 4:10am

A post was merged into an existing topic: Help diagnosing CAA failures ns1.cyso.nl

jsha · July 20, 2017, 4:10am

Whoops, posted on the wrong thread. Moving that post to the right thread.

jsha · July 20, 2017, 4:12am

The reason I suggest TXT is that for most domains it will be an empty response, while the response for A is non-empty. It seems like there are potentially issues specifically around empty responses.
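
To make the "empty response" distinction concrete, here is a rough Python sketch that labels a dig reply as a failure, an empty NOERROR reply (NODATA), or a reply with records. The helper name `classify` is hypothetical, and the sample strings are hand-written in dig's usual header layout purely for illustration; they are not captured from any of the servers discussed here.

```python
import re

STATUS_RE = re.compile(r"status: (\w+)")
ANSWER_RE = re.compile(r"ANSWER: (\d+)")


def classify(dig_output: str) -> str:
    """Label a dig reply: an error rcode, NODATA (empty NOERROR), or ANSWER."""
    status = STATUS_RE.search(dig_output)
    if status is None:
        return "UNKNOWN"
    if status.group(1) != "NOERROR":
        return status.group(1)  # e.g. SERVFAIL
    answers = ANSWER_RE.search(dig_output)
    if answers is None:
        return "UNKNOWN"
    return "NODATA" if int(answers.group(1)) == 0 else "ANSWER"


# Illustrative samples only (made-up ids and counts).
EMPTY_REPLY = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1\n"
)
NONEMPTY_REPLY = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1\n"
)
FAILED_REPLY = (
    ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 3\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n"
)

for sample in (EMPTY_REPLY, NONEMPTY_REPLY, FAILED_REPLY):
    print(classify(sample))
```

Running the same check for A, TXT and CAA on one name would show whether only the record types that come back empty are the ones that fail.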

rickjanssen · July 20, 2017, 4:15am

Will check on that, but for now, even CAA stopped sending SERVFAILs. Might be because of the low traffic at this moment.

Edit: it’s back, going to test some more after some sleep.

jsha · July 20, 2017, 4:43am

For what domain is it back? I don't see SERVFAILs for pop.gwvanpelt.nl right now.

rickjanssen · July 20, 2017, 4:48am

Try mail.bkbouw.nl @ns1.zxcs.nl ( 185.104.28.19 ), I've lowered the query cache so it should start to servfail almost instantly. ns3.zxcs.nl ( 178.62.208.8 ) has a different configuration right now.

jsha · July 20, 2017, 5:19am

Hm, I'm still not able to reproduce, even for this domain.

rickjanssen · July 20, 2017, 9:05am

Aren’t you using a different setup than before? I see this happening:

blocked ns2 ns3, forward to ns1

root@ubuntu:/home/rick# while true; do sleep 1; dig mail.bkbouw.nl CAA @127.0.0.1 | grep status; done

;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 43506
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5907
---- THE MOMENT I RESTART POWERDNS ----
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37054
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56267
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7804
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8234
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 47057
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 6974
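
For comparing runs like this one, a small Python sketch can tally the statuses and show where they flip. The helper names `statuses` and `transitions` are hypothetical; the sample lines are the ones pasted above.

```python
import re
from collections import Counter

STATUS_RE = re.compile(r"status: (\w+)")


def statuses(lines):
    """Pull the status out of each dig header line, skipping other lines."""
    found = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            found.append(match.group(1))
    return found


def transitions(codes):
    """Return (index, old, new) for every position where the status changes."""
    pairs = zip(codes, codes[1:])
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]


log = """\
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 43506
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5907
---- THE MOMENT I RESTART POWERDNS ----
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37054
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56267
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7804
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8234
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 47057
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 6974
"""

codes = statuses(log.splitlines())
print(Counter(codes))      # 4 SERVFAIL and 4 NOERROR in this sample
print(transitions(codes))  # flips at the 3rd and 7th queries (indexes 2 and 6)
```

With one query per second, the indexes give a rough idea of how long after the restart the SERVFAILs come back.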

jsha · July 20, 2017, 4:05pm

I still don't reproduce, even with this script. One thing to note: I don't have the ability to restart your PowerDNS instance (of course). Is it possible the issue only presents shortly after a restart?

Also, my current config rejects packets going to ns[23], and allows packets going to ns1. I didn't apply a forwarding rule because I figured it was unnecessary. Want to share with me your iptables? Here's mine:

# iptables --list OUTPUT -v --line-numbers
Chain OUTPUT (policy ACCEPT 39160 packets, 44M bytes)
num   pkts bytes target     prot opt in     out     source               destination
1      804 55864 REJECT     all  --  any    any     anywhere             ns3.zxcs.nl          reject-with icmp-port-unreachable
2     629K   53M REJECT     all  --  any    any     anywhere             ns2.zxcs.nl          reject-with icmp-port-unreachable

Also, it would probably be easier if instead you set up a domain (or subdomain) that had only one NS record, so neither of us would have to mess around with iptables.

Also, I got a reply on the unbound-users mailing list suggesting a possible area to look at: Issues with DNSSEC, use-caps-for-id, and empty responses. I assume your PowerDNS instance does online signing? Can you check whether it downcases queries before signing NSEC responses?

jsha · July 21, 2017, 8:17am

I reported an issue to PowerDNS, and they tell me:

When I check the version of PowerDNS you're currently running, it looks like 4.0.4. Have you upgraded since we began the discussion, or have you been running 4.0.4 all along?

$ dig +short version.bind chaos txt @ns1.zxcs.nl
"PowerDNS Authoritative Server 4.0.4 (built Jun 22 2017 20:14:47 by buildbot@c1b965951e5b)"

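If you want to check the version on several name servers, the banner above can be parsed with a few lines of Python. This is only a sketch: `parse_pdns_version` is a made-up helper, and the comparison against 4.0.4 simply reflects the version seen in this thread, not a confirmed statement about which release contains a fix.

```python
import re


def parse_pdns_version(txt: str):
    """Extract (major, minor, patch) from a PowerDNS version.bind banner."""
    match = re.search(r"PowerDNS Authoritative Server (\d+)\.(\d+)\.(\d+)", txt)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# Banner copied from the dig output above.
banner = (
    '"PowerDNS Authoritative Server 4.0.4 '
    '(built Jun 22 2017 20:14:47 by buildbot@c1b965951e5b)"'
)

version = parse_pdns_version(banner)
print(version)               # (4, 0, 4)
print(version >= (4, 0, 4))  # True; the threshold is an assumption from this thread
```

Tuples compare element by element, so a comparison like this keeps working for versions such as 4.0.10 where plain string comparison would not.
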
rickjanssen · July 21, 2017, 12:51pm

Hi @jsha

Yes, I upgraded to pdns 4.0.4 yesterday evening. I’m currently on holiday, so I’m sorry for my slow answers.

The problem looks solved! Thanks for giving so much time and attention to this issue.

system · August 20, 2017, 12:51pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
