DNS problem: SERVFAIL looking up CAA

I still see issues resolving CAA queries against your authoritative nameservers, e.g.: https://unboundtest.com/m/CAA/shifudao.com/AF3IGNSE

DNSViz also reports errors with this zone: shifudao.com

This class of problem needs to be resolved by your authoritative DNS provider, dns.com. Please see the Let's Encrypt documentation on Certificate Authority Authorization (CAA) for more information.

If your DNS provider isn't able to handle CAA queries properly you will need to change to a different authoritative DNS provider to use Let's Encrypt.

Thanks. I’ve already asked my DNS provider about this issue, and they said that dns.com supports CAA records, so it’s not a fault on their side.

And I’ve tried unboundtest.com, and it’s OK: https://unboundtest.com/m/CAA/shifudao.com/SDGVLRQG

I also tested with SSL Labs, and DNS CAA shows yes: https://www.ssllabs.com/ssltest/analyze.html?d=git.shifudao.com&hideResults=on

Here’s a more verbose log (more verbose than unboundtest anyway) of the SERVFAIL, from Unbound 1.7.3: https://id-rsa.pub/servfail

The cause appears to be capsforid mismatches - but I’m not sure for which domain (maybe for the nameservers themselves?).

Is the problem that the nameservers do not support mixed case queries?

$ dig +noall +answer @ns1.google.com gOoGle.com
gOoGle.com.             300     IN      A       216.58.199.46

vs

$ dig +noall +answer @m1.dns.com nS2.DNs.cOm
ns2.dns.com.            3600    IN      A       218.98.111.203
ns2.dns.com.            3600    IN      A       218.66.171.11
ns2.dns.com.            3600    IN      A       121.14.154.235

Yes, I think that’s the problem. Specific to the CAA query, it just times out, presumably because the nameserver thinks it is not authoritative for that domain at all:

$ dig @ns2.dns.com giT.shiFudao.com caa

; <<>> DiG 9.11.3-1ubuntu1.1-Ubuntu <<>> @ns2.dns.com giT.shiFudao.com caa
; (3 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached

I removed the git.shifudao.com CAA record and added a shifudao.com CAA record.

OK, but it’s the same problem - the server drops the query because it doesn’t understand mixed-case queries:

$ dig @ns2.dns.com shiFudao.com caa

; <<>> DiG 9.11.3-1ubuntu1.1-Ubuntu <<>> @ns2.dns.com shiFudao.com caa
; (3 servers found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Let’s Encrypt uses a resolver that implements mixed-case queries, which is a form of forgery resistance.
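
To illustrate what "mixed-case queries" means here, below is a minimal Python sketch of the idea (often called 0x20 encoding). It is not Unbound's code: the function names `randomize_case` and `case_patterns` are made up for this example. The resolver randomizes the case of each letter in the query name, and an off-path forger has to guess that pattern in addition to everything else.

```python
import random


def randomize_case(name: str, rng: random.Random) -> str:
    """Return `name` with every ASCII letter independently upper- or lower-cased."""
    out = []
    for ch in name:
        if ch.isascii() and ch.isalpha():
            out.append(ch.upper() if rng.random() < 0.5 else ch.lower())
        else:
            # Dots, digits and hyphens have no case and are left alone.
            out.append(ch)
    return "".join(out)


def case_patterns(name: str) -> int:
    """Number of distinct case patterns: one extra bit per ASCII letter."""
    letters = sum(1 for ch in name if ch.isascii() and ch.isalpha())
    return 2 ** letters


if __name__ == "__main__":
    rng = random.Random()  # unseeded: a different pattern on each run
    print(randomize_case("git.shifudao.com", rng))
    print(case_patterns("git.shifudao.com"))
```

For git.shifudao.com there are 14 letters, so `case_patterns` gives 2**14 = 16384 possible spellings that a forged reply would have to match.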

dig shows ns3.dns.com and ns4.dns.com in the AUTHORITY SECTION.

dig @ns3.dns.com shiFudao.com caa   # works
dig @ns4.dns.com shiFudao.com caa   # also works

Ah, that’s my mistake. I don’t think that’s the problem then - I’m not even sure what the pattern is with the queries that cause fails - some queries fail, some don’t.

Normally dns.com’s nameservers respond with the query name in lowercase. That fails Unbound’s capitalization test, causing it to go into fallback mode.
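
As a rough illustration of why a lowercased reply trips the check, here is a small Python sketch. The helper names are hypothetical and Unbound's real check is written in C and may differ in detail; the point is only that the comparison is case-sensitive, even though DNS names themselves compare case-insensitively.

```python
def qname_echo_matches(sent: str, received: str) -> bool:
    """Case-sensitive check: the reply must echo the exact letter case that was sent."""
    return sent == received


def same_name_ignoring_case(sent: str, received: str) -> bool:
    """Ordinary DNS name equality, which ignores ASCII letter case."""
    return sent.lower() == received.lower()


if __name__ == "__main__":
    sent = "giT.shiFudao.com."
    received = "git.shifudao.com."  # server answered with the name lowercased
    print(qname_echo_matches(sent, received))       # the case-sensitive test
    print(same_name_ignoring_case(sent, received))  # still the same DNS name
```

So the answer is for the right name, but it no longer proves that the server saw the randomized spelling, which is when the resolver falls back.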

I can’t figure out why, but fallback mode usually fails on dns.com’s negative responses. It says “Capsforid fallback: getting different replies, failed” and returns SERVFAIL.

Maybe it considers “any reply” and “timeout” to be “different replies”?

In the log I posted, I noticed this:

[1533008322] unbound[16166:0] info: response for git.shifudao.com. CAA IN
[1533008322] unbound[16166:0] info: reply from <shifudao.com.> 218.98.111.174#53
[1533008322] unbound[16166:0] info: incoming scrubbed packet: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa rd ; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0 
;; QUESTION SECTION:
git.shifudao.com.	IN	A

;; ANSWER SECTION:
git.shifudao.com.	600	IN	A	42.121.131.6

;; AUTHORITY SECTION:
shifudao.com.	3600	IN	NS	ns3.dns.com.
shifudao.com.	3600	IN	NS	ns4.dns.com.

(an A response, what?)

and later:

[1533008323] unbound[16166:0] info: response for git.shifudao.com. CAA IN
[1533008323] unbound[16166:0] info: reply from <shifudao.com.> 218.66.171.173#53
[1533008323] unbound[16166:0] info: incoming scrubbed packet: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa rd ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0 
;; QUESTION SECTION:
git.shifudao.com.	IN	CAA

;; ANSWER SECTION:

;; AUTHORITY SECTION:
shifudao.com.	600	IN	SOA	ns3.dns.com. admin.dns.com. 1533005298 28800 3600 1209600 900

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 84

which is what we would expect.

Could this be the mismatch? (But it seems Unbound did send an A query for some reason ...)

QNAME minimisation. It sends at least 1 A query for each name. (com, shifudao.com, etc.)
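
A small Python sketch of which names get probed, assuming the behaviour described above (an A query per name, and the log earlier showing an A question for git.shifudao.com itself). The function names and the exact order and count of queries are assumptions for illustration; a real resolver skips anything it already has cached.

```python
from typing import List, Tuple


def minimised_query_names(qname: str) -> List[str]:
    """List the names from the TLD down to the full name, one label added at a time."""
    labels = qname.rstrip(".").split(".")
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


def query_plan(qname: str, qtype: str) -> List[Tuple[str, str]]:
    """Assumed plan: an A probe for each name, then the real query for the full name."""
    plan = [(name, "A") for name in minimised_query_names(qname)]
    plan.append((minimised_query_names(qname)[-1], qtype))
    return plan


if __name__ == "__main__":
    for name, rtype in query_plan("git.shifudao.com", "CAA"):
        print(name, rtype)
```

That would explain why an A response for git.shifudao.com shows up in a log for a CAA lookup.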

Perhaps, but I think Unbound is still messing up and comparing the A response to the CAA response.

I added some printf debugging to Unbound inside the capsforid fallback routines. Compare a functional capsforid fallback:

Good: CAA compared against CAA

[1533016604] libunbound[24381:0] info: response response->rep: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa rd ; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
shifudao.com.	IN	CAA

;; ANSWER SECTION:
shifudao.com.	600	IN	CAA	0 issue "digicert.com"
shifudao.com.	600	IN	CAA	0 issue "1738.unknown-ca.caarecord.org"
shifudao.com.	600	IN	CAA	0 issue "letsencrypt.org"

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 143

[1533016604] libunbound[24381:0] info: response caps_reply: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa rd ; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
shifudao.com.	IN	CAA

;; ANSWER SECTION:
shifudao.com.	600	IN	CAA	0 issue "digicert.com"
shifudao.com.	600	IN	CAA	0 issue "1738.unknown-ca.caarecord.org"
shifudao.com.	600	IN	CAA	0 issue "letsencrypt.org"

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 143

Bad: CAA compared against A ...

[1533016611] libunbound[24397:0] info: response response->rep: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa rd ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0 
;; QUESTION SECTION:
git.shifudao.com.	IN	CAA

;; ANSWER SECTION:

;; AUTHORITY SECTION:
shifudao.com.	600	IN	SOA	ns3.dns.com. admin.dns.com. 1533005298 28800 3600 1209600 900

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 84

[1533016611] libunbound[24397:0] info: response caps_reply: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr aa rd ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
git.shifudao.com.	IN	CAA

;; ANSWER SECTION:
git.shifudao.com.	600	IN	A	42.121.131.6

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
;; MSG SIZE  rcvd: 50

[1533016611] libunbound[24397:0] info: flags 34048 vs 34048
[1533016611] libunbound[24397:0] info: qdcount 1 vs 1
[1533016611] libunbound[24397:0] info: security 0 vs 0
[1533016611] libunbound[24397:0] info: an_numrrsets 0 vs 1
[1533016611] libunbound[24397:0] info: ns_numrrsets 1 vs 0
[1533016611] libunbound[24397:0] info: rrset_count 1 vs 1
[1533016611] libunbound[24397:0] info: Capsforid fallback: getting different replies, failed
Host git.shifudao.com not found: 2(SERVFAIL). (error)

I think the logging printed in the bad case (CAA question, A answer) is just the result of broken internal state in Unbound. If you run tcpdump, no such response is ever sent:

12	1.392242	121.12.104.110	172.104.24.29	DNS	136	Standard query response 0xb356 CAA git.shifudao.com SOA ns3.dns.com

I think this is an Unbound bug but I'm not sure what triggers it.
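
For anyone following along, the counters in the debug output above can be replayed with a short Python sketch. The field names are copied from my printf lines; the `ReplySummary` class and helper functions are made up for this example, and the real comparison in Unbound is C code that may look at more than these counters.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ReplySummary:
    flags: int
    qdcount: int
    security: int
    an_numrrsets: int
    ns_numrrsets: int
    rrset_count: int


def differing_fields(a: ReplySummary, b: ReplySummary) -> List[str]:
    """Names of the counters that differ between two replies."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def decode_flags(flags: int) -> List[str]:
    """Decode a few DNS header flag bits from the 16-bit flags word."""
    bits = {"qr": 0x8000, "aa": 0x0400, "tc": 0x0200, "rd": 0x0100, "ra": 0x0080}
    return [name for name, mask in bits.items() if flags & mask]


# Values taken from the "bad" run in the log above.
response_rep = ReplySummary(34048, 1, 0, an_numrrsets=0, ns_numrrsets=1, rrset_count=1)
caps_reply = ReplySummary(34048, 1, 0, an_numrrsets=1, ns_numrrsets=0, rrset_count=1)

if __name__ == "__main__":
    print(decode_flags(34048))
    print(differing_fields(response_rep, caps_reply))
```

The flags word 34048 is 0x8500, i.e. qr, aa and rd, matching the packet dumps. The two replies differ only in the answer and authority RRset counts, which is what you get when a NODATA reply (SOA in authority) is compared against a reply carrying an A record.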

Hi @cpu

I tried applying for a TrustAsia cert and it succeeded. TrustAsia also checks the CAA record, which suggests there is no problem with my CAA record. Could you please help me find the cause?

The SSL Labs test is OK: https://www.ssllabs.com/ssltest/analyze.html?d=git.shifudao.com

And the unboundtest.com check also seems fine: https://unboundtest.com/m/CAA/git.shifudao.com/O34SFSIV

Thanks.

TrustAsia may be checking without DNSSEC, 0x20 case randomization, or many other DNS features. The fact that TrustAsia was able to issue isn't sufficient to conclude that the problem isn't with your authoritative DNS servers.

I have reason to believe UnboundTest is inaccurate, but I cannot access that machine to verify or fix it.

We've filed a ticket with the operations team to disable QNAME minimization. It was enabled recently as a new default in Unbound.

I find that the staging environment seems to have fixed my issue. I can use the staging environment to apply for a new cert or renew an old one:

certbot-auto certonly --staging ....

certbot-auto renew --staging --break-my-certs ...

I don’t know about when you posted, but at the moment, it seems production has QNAME minimization on and staging has it off.

Understand that staging issues certificates that aren’t trusted by browsers (or other clients). It’s for testing certificate issuance, not for real usage. That’s why Certbot has the alarmingly named “--break-my-certs” option.

Edit: Also, you can use “certbot renew --dry-run” to test trying to renew your certificates with the staging environment. (It issues certificates but doesn’t save them.)

Yes, I know. I just found that the staging environment is working, whereas yesterday neither production nor staging could issue a cert.

And if staging is working, maybe production will also be working in the future.

Production is scheduled to have QNAME minimization disabled today.

QNAME minimization has been disabled in production.

Good news. Now I can apply for and renew certs successfully. :)
