How to reproduce CAA SERVFAIL? Works for me, doesn't for LE staging or prod

Update: Solution

Getting a SERVFAIL on LE does not mean that you will get a SERVFAIL through dig or unbound or other tools. However, here are some things to try:

  1. Use dig -t type257 @ns1.your-nameserver.com yOuR-wEbSiTe.CoM
  2. Check to make sure that your QUESTION section matches the wacky case (aka 0x20 bit casing). Many servers will naturally have matching ANSWER or AUTHORITY sections as well
  3. Use dig +dnssec -t type257 @ns1.your-nameserver.com yOuR-wEbSiTe.CoM
  4. Make sure that you’re not getting an error message that way either
  5. Use https://unboundtest.com and check for “wrong” and “fallback” in the logs

Most likely you will find that your DNS server is not responding correctly in one or more of those ways.
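
To make the check in step 2 less error-prone, here is a minimal Python sketch that reads dig's text output and compares the name echoed in the QUESTION section with the name that was queried. The function names `question_name` and `case_preserved` are made up for illustration, and the parsing assumes the output layout shown in the dig transcripts in this thread.

```python
def question_name(dig_output):
    """Return the name on the first line of dig's QUESTION SECTION, or None."""
    lines = dig_output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(";; QUESTION SECTION:"):
            for candidate in lines[i + 1:]:
                candidate = candidate.strip()
                if not candidate:
                    break
                # Question lines look like ";name.   IN   TYPE257"
                return candidate.lstrip(";").split()[0]
    return None


def case_preserved(queried, dig_output):
    """True when the question section echoes the queried name with identical casing."""
    echoed = question_name(dig_output)
    if echoed is None:
        return False
    # Ignore only the trailing root dot; the comparison stays case-sensitive.
    return echoed.rstrip(".") == queried.rstrip(".")
```

Feeding it the saved output of `dig -t type257 @ns1.your-nameserver.com yOuR-wEbSiTe.CoM` together with the mixed-case name should return False for a server that lowercases the question.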

Original

Here’s the error message I’m getting:

DNS problem: SERVFAIL looking up CAA for tunnel.daplie.com

Here’s the command I run to try to reproduce the problem:

dig -t type257 @ns1.redirect-www.org tuNNel.daPLie.com

; <<>> DiG 9.8.3-P1 <<>> -t type257 @ns1.redirect-www.org tuNNel.daPLie.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45917
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;tunnel.daplie.com.		IN	TYPE257

;; AUTHORITY SECTION:
daplie.com.		1	IN	SOA	ns1.redirect-www.org. hostmaster.daplie.com. 2017020100 10800 3600 1209600 1800

;; Query time: 34 msec
;; SERVER: 192.241.238.7#53(192.241.238.7)
;; WHEN: Mon Oct  2 13:55:26 2017
;; MSG SIZE  rcvd: 102

I did a sanity check against yahoo.com and I see this:

dig -t type257 @ns1.yahoo.com yAHOo.com

; <<>> DiG 9.8.3-P1 <<>> -t type257 @ns1.yahoo.com yAHOo.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29455
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;yAHOo.com.			IN	TYPE257

;; AUTHORITY SECTION:
yahoo.com.		600	IN	SOA	ns1.yahoo.com. hostmaster.yahoo-inc.com. 2017100220 3600 300 1814400 600

;; Query time: 17 msec
;; SERVER: 68.180.131.16#53(68.180.131.16)
;; WHEN: Mon Oct  2 13:56:10 2017
;; MSG SIZE  rcvd: 94

The nameservers that we’re using were built by us and they are still in the process of refinement (they work, but when something doesn’t work we go fix it).

I’d like to be able to see why LE doesn’t like our responses so that I can fix them (it’s probably our fault), but I haven’t been able to produce a query that fails in the way that LE is reporting the failure. Also, since yahoo’s nameserver responds the same as ours, it would seem to me that we’re not “doing it wrong” (unless they’re also doing it wrong, which I doubt).

Have you tried with https://unboundtest.com/ ? You could run your own Unbound instance with a similar config as well.

One thing to note is that you aren't testing with DNSSEC validation which is often a source of SERVFAILs, especially in conjunction with CAA. For instance our CAA docs page's section on SERVFAIL errors mentions one PowerDNS bug that manifested with DNSSEC and CAA records.

https://unboundtest.com/m/CAA/tunnel.daplie.com/EAZVT466

It seems to work, sort of.

Notice that the authoritative nameservers don’t properly support 0x20 (capitalization) randomization. In the dig you pasted, notice that the server replied, but the reply packet was all lowercase.

Unbound will retry a few times and then switch to all lowercase queries, and ultimately succeed, but maybe it hit a timeout and gave up before that could happen. Or maybe Let’s Encrypt’s no-caching configuration interferes with the fallback logic.
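
For anyone unfamiliar with 0x20 randomization, here is a rough Python sketch of the idea. It is not Unbound's actual code, and the names `randomize_case` and `reply_accepted` are invented for illustration. The resolver scrambles the case of the query name, then only trusts a reply whose question name matches exactly.

```python
import random


def randomize_case(name, rng=None):
    """Return the name with each letter randomly upper- or lowercased."""
    rng = rng or random.Random()
    return "".join(
        (ch.upper() if rng.random() < 0.5 else ch.lower()) if ch.isalpha() else ch
        for ch in name
    )


def reply_accepted(sent_qname, reply_qname):
    """A 0x20-checking resolver compares the echoed name case-sensitively."""
    return sent_qname == reply_qname
```

A server that echoes an all-lowercase name fails this comparison whenever the randomized query contains an uppercase letter, which is what sends the resolver into its fallback path.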


I tried using the 0x20 casing here and it seemed to like it just fine:

https://unboundtest.com/m/CAA/tuNNel.daPLie.COM/FPHIDBFV

It’s good to know about unboundtest.com. I’ll use that more in the future.

Unfortunately, I’m still not reproducing a SERVFAIL for the CAA. I can, of course, update our nameserver to respond in the authority section with the correct casing. However, since we haven’t implemented name compression (compression pointers) yet, I’d be curious if it even counts anyway. The string in the authority section is a different string.

It doesn’t like it just fine. Regardless of the capitalization used in your query, Unbound randomizes it when querying the authoritative servers (when configured to). This fails because the authoritative servers give a lowercase response. Unbound tries each one in order, gives up, and then retries using the name as the client originally specified it.

You can see it in the dig in your first post, or the “info: wrong 0x20-ID in reply qname” and “info: Capsforid: starting fallback” and “info: Capsforid: reply is equal. go to next fallback” messages in the unusually lengthy unboundtest.com log output.
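
Since grepping for “SERVFAIL” won’t surface these, here is a small Python sketch that pulls the relevant lines out of a saved unboundtest.com log. The marker strings are taken from the messages quoted above. The function name `find_0x20_problems` is made up, and there may well be other related messages it doesn’t catch.

```python
# Substrings taken from the log messages quoted in this thread.
MARKERS = ("wrong 0x20-ID in reply qname", "Capsforid:")


def find_0x20_problems(log_text):
    """Return the log lines that mention a 0x20 mismatch or a caps-for-id fallback."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in MARKERS)
    ]
```

An empty result suggests the nameservers echoed the randomized casing correctly; any hits mean Unbound had to fall back.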

It shouldn’t be a fatal issue, but I’m wondering if Let’s Encrypt has a short timeout, and it’s bailing out before finishing.

Or maybe there’s another issue.


+dnssec

Well, I don’t see a SERVFAIL with +dnssec, but I do see that the casing is also off in the question section… which I didn’t notice before. That’s a cause for concern. However, before the CAA record checking this wasn’t a problem (though perhaps that’s because the authority section wasn’t being returned before).

dig +dnssec -t type257 @ns1.redirect-www.org tuNNel.daPLie.com

; <<>> DiG 9.8.3-P1 <<>> +dnssec -t type257 @ns1.redirect-www.org tuNNel.daPLie.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13600
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;tunnel.daplie.com.		IN	TYPE257

;; AUTHORITY SECTION:
daplie.com.		1	IN	SOA	ns1.redirect-www.org. hostmaster.daplie.com. 2017020100 10800 3600 1209600 1800

;; Query time: 29 msec
;; SERVER: 192.241.238.7#53(192.241.238.7)
;; WHEN: Mon Oct  2 14:36:58 2017
;; MSG SIZE  rcvd: 113

For reference

I see that google does support 0x20 and that yahoo does not:

dig -t A @ns1.google.com gOOglE.com

; <<>> DiG 9.8.3-P1 <<>> -t A @ns1.google.com gOOglE.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21865
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;gOOglE.com.			IN	A

;; ANSWER SECTION:
gOOglE.com.		300	IN	A	172.217.6.46

;; Query time: 49 msec
;; SERVER: 216.239.32.10#53(216.239.32.10)
;; WHEN: Mon Oct  2 14:40:04 2017
;; MSG SIZE  rcvd: 44

dig -t A @ns1.yahoo.com yAHOo.com

; <<>> DiG 9.8.3-P1 <<>> -t A @ns1.yahoo.com yAHOo.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29318
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 5, ADDITIONAL: 8
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;yAHOo.com.			IN	A

;; ANSWER SECTION:
yahoo.com.		1800	IN	A	98.138.253.109
yahoo.com.		1800	IN	A	98.139.180.149
yahoo.com.		1800	IN	A	206.190.36.45

;; AUTHORITY SECTION:
yahoo.com.		172800	IN	NS	ns5.yahoo.com.
yahoo.com.		172800	IN	NS	ns4.yahoo.com.
yahoo.com.		172800	IN	NS	ns2.yahoo.com.
yahoo.com.		172800	IN	NS	ns1.yahoo.com.
yahoo.com.		172800	IN	NS	ns3.yahoo.com.

;; ADDITIONAL SECTION:
ns1.yahoo.com.		1209600	IN	A	68.180.131.16
ns2.yahoo.com.		1209600	IN	A	68.142.255.16
ns3.yahoo.com.		1209600	IN	A	203.84.221.53
ns4.yahoo.com.		1209600	IN	A	98.138.11.157
ns5.yahoo.com.		1209600	IN	A	119.160.247.124
ns1.yahoo.com.		86400	IN	AAAA	2001:4998:130::1001
ns2.yahoo.com.		86400	IN	AAAA	2001:4998:140::1002
ns3.yahoo.com.		86400	IN	AAAA	2406:8600:b8:fe03::1003

;; Query time: 17 msec
;; SERVER: 68.180.131.16#53(68.180.131.16)
;; WHEN: Mon Oct  2 14:40:10 2017
;; MSG SIZE  rcvd: 335

Thanks. I was grepping for “SERVFAIL” in the logs and didn’t see it. I’ll look into this further.

I think only the capitalization in the question section matters, FWIW. I don't think the answer/authority/additional sections have to match.

Unbound doesn't object to Yahoo's responses.
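
To illustrate what "the question section" means on the wire, here is a hedged Python sketch using only the standard library. `build_query` and `read_qname` are hypothetical helper names, and this is not how any particular nameserver implements it. The point is that the query name sits right after the 12-byte header as length-prefixed labels, and a server that wants to pass the 0x20 check should copy those bytes into its response unchanged.

```python
import struct


def build_query(name, qtype=257, query_id=0x1234):
    """Build a minimal DNS query (one question, class IN); qtype 257 is CAA."""
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def read_qname(message):
    """Read the question name from a DNS message, preserving its casing."""
    labels = []
    pos = 12  # the question section starts right after the fixed header
    while True:
        length = message[pos]
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("unexpected compression pointer in question name")
        labels.append(message[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels) + "."
```

Running `read_qname` on both the query you sent and the raw response you got back, then comparing the two strings, is the same case-sensitive check a 0x20-aware resolver performs.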


Many thanks @mnordhoff. After making the change to the question section I’m not getting SERVFAIL anymore. Certs renewed, FTW.

Do you have a URL or handle on a service by which I can donate to your caffeinated / brewed beverage fund?


Great!

Thank you! I'm afraid I don't have anything set up, and I have too many beverages already. If you like, you could send something to Let's Encrypt, though (see their Donate page).

