DNS problem: SERVFAIL


#1

Hello,

My domain is: pie3aq.pl

I ran this command: certbot certonly -a manual -d pie3aq.pl -d mail.pie3aq.pl --preferred-challenges dns --staging

because I want to generate certificates for both the root domain and the mail subdomain. I chose the DNS challenge because I don't run a web server on the machine on which I'm issuing the certificate (mail.pie3aq.pl). My DNS servers run BIND, and as you can see, CAA is configured and _acme-challenge is also present in the zone. However, each time I run the command…

I receive this output: DNS problem: SERVFAIL looking up CAA for mail.pie3aq.pl

or sometimes this: DNS problem: SERVFAIL looking up TXT for _acme-challenge.mail.pie3aq.pl

This happens even though both the TXT records and the CAA records are configured.

Do you have any idea what I might be doing wrong? How can I fix this?

The operating system my web server runs on is (include version): CentOS 7

My hosting provider, if applicable, is: OVH

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): certbot 0.29.1

Thanks in advance!


Awesome investigation and bug fix from _az
#2

It looks like you were able to resolve this, is that right?


#3

I was not. The issue persists and I'm still unable to generate the certificate. I do have one SSL certificate active for pie3aq.pl and www.pie3aq.pl on one of my servers, but I also have another server that is supposed to act as a mail server, and mail.pie3aq.pl resolves to its IP. On this server I chose the DNS challenge because I need certificates for both pie3aq.pl and mail.pie3aq.pl; since pie3aq.pl resolves to another server's IP, the DNS challenge is the only option. However, I'm still receiving these errors, even though the TXT records are configured properly each time and I do have CAA entries (as you can check for yourself).

Do you have any idea what the issue could be here? Why am I still unable to generate the certificate?

Thanks


#4

Edit: never mind, the CAA check comes after a successful TXT record check. I’ll try to dig a little more.


#5

Can I ask, is this affecting only the --staging environment, or are you also unable to produce live certificates?


#6

I think this might be related to Widespread SERVFAIL problem related to DNS 0x20 / https://lists.dns-oarc.net/pipermail/dns-operations/2019-January/018359.html .

Specifically, the h-dns.pl nameserver within the .pl ccTLD is affected.
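For anyone unfamiliar with DNS 0x20 (what Unbound calls capsforid): the resolver randomizes the case of the letters in the query name and expects the server to echo the name back with the same case. Here is a minimal sketch of that idea in Python. The function names `randomize_case` and `echo_matches` are made up for illustration, and this is not Unbound's implementation, only the general mechanism.

```python
import random


def randomize_case(qname, rng=None):
    """Return qname with the case of each ASCII letter chosen at random."""
    rng = rng or random.Random()
    return "".join(
        (ch.upper() if rng.random() < 0.5 else ch.lower())
        if ch.isascii() and ch.isalpha()
        else ch
        for ch in qname
    )


def echo_matches(sent_qname, echoed_qname):
    """A strict 0x20 check: the echoed question must match byte for byte."""
    return sent_qname == echoed_qname


sent = randomize_case("_acme-challenge.mail.pie3aq.pl")
print(sent)                              # same name, mixed case
print(echo_matches(sent, sent))          # a server that preserves the case
print(echo_matches("MaIl.pl", "mail.pl"))  # a server that lowercases the name
```

When the echoed case does not match, a resolver can either reject the reply or fall back to some other check; the "capsforid fallback" lines in the logs in this thread are that second path.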

I was finally able to reproduce the SERVFAIL issue with Unbound 1.9.0rc1 against your domain (log attached).

unbound-pl-fail.txt (41.5 KB)

I am unsure if you can actually do anything about it though, @mnordhoff might be able to confirm.


#7

Nice work.

I tried it in unboundtest a couple times. It did a capsforid fallback, of course, but it was successful.

Now I got something. Doing UDP queries for i-dns.pl AAAA against all the nameservers:

  • Like before, 2001:7f9:c::53 (b-dns.pl) times out.

  • CommunityDNS’s h-dns.pl returns an invalid response with no NSEC3 or RRSIG NSEC3 records.

That’s invalid from a DNSSEC protocol perspective and it’s a mismatch so it probably breaks capsforid fallback.

h-dns.pl has a bona fide bug this time – not supporting capsforid is problematic for Let’s Encrypt but not illegal – and it’s resulting in SERVFAILs as a side effect.

Edit:

So the moral of the story, again, is to retry until you get lucky and it doesn’t query h-dns.pl.

Edit again:

Someone prompted me to check, h-dns.pl’s response isn’t invalid, it’s just truncated.

So it’s not a bug, but it might fail capsforid matching anyway?

Edit again again:

Incidentally, h-dns.pl's untruncated response really is effectively identical, but 517 bytes long. Dunno why. Maybe less compression than the other implementations? Does Unbound care about compression differences when comparing responses?

$ digdr @h-dns.pl i-dns.pl aaaa

; <<>> DiG 9.13.5-1+ubuntu16.04.1+deb.sury.org+2-Ubuntu <<>> +dnssec +norecurse @h-dns.pl i-dns.pl aaaa
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39762
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;i-dns.pl.                      IN      AAAA

;; AUTHORITY SECTION:
pl.                     3600    IN      SOA     a-dns.pl. dnsmaster.nask.pl. 1548860463 900 300 2592000 3600
pl.                     3600    IN      RRSIG   SOA 8 1 86400 20190301221112 20190130211112 54420 pl. x/yz2q+tRo0R2YlhoSZjunYeCfungorRB41JBsslJH7sKvXkOaFUU78i rxVkR5yZaJUOD/AD4X0QYLV7wO4gzyvkAWX14gqiK+mbi8+s2z6eVCeF l/TvDXio6TM1HrnTz/FkhAOCJH+bDx2Jld/YqZCmEfXiYB4e9eaaVU8M Tb0=
8qte4rd30vv26oki620tjecbumd78nlo.pl. 3600 IN NSEC3 1 1 12 021C336F6BC1552C481D 8QTF8T4FKVDDVDSJH9D3PBTBV890F42D A RRSIG
8qte4rd30vv26oki620tjecbumd78nlo.pl. 3600 IN RRSIG NSEC3 8 2 3600 20190228120000 20190129120000 54420 pl. oUGQayHEUnSCL45GlouEUCmPQbzsik5RBA8NfewDtqOjElB9b+43eWjD ub6CY+8qYLaT5+egbqKu6zBKmN0sY4/aBUIP+baT5k7GpA7ir0XGLZDH GHcbvWX6wD9eNVZ0rnTPBz013rA6FxTJF6Wt8+g7ywxqCO8ZQlkXs6OT e14=

;; Query time: 118 msec
;; SERVER: 2001:678:4::2#53(2001:678:4::2)
;; WHEN: Wed Jan 30 22:15:02 UTC 2019
;; MSG SIZE  rcvd: 517
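When repeating a query like this against each nameserver, the fields worth comparing are the status, the header flags (a `tc` flag means the reply was truncated) and the message size. Below is a small Python sketch that pulls those out of dig's text output. `summarize_dig` is a hypothetical helper name, and it assumes the usual dig layout shown above.

```python
import re


def summarize_dig(output):
    """Extract status, header flags and message size from dig's text output."""
    status = re.search(r"status: (\w+)", output)
    # Only the header line starts with ";; flags:"; the EDNS line starts with "; EDNS".
    flags = re.search(r"^;; flags: ([a-z ]*);", output, re.MULTILINE)
    size = re.search(r"MSG SIZE\s+rcvd: (\d+)", output)
    flag_list = flags.group(1).split() if flags else []
    return {
        "status": status.group(1) if status else None,
        "flags": flag_list,
        "truncated": "tc" in flag_list,
        "size": int(size.group(1)) if size else None,
    }


# Header, EDNS and size lines copied from the dig output above.
sample = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39762
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
; EDNS: version: 0, flags: do; udp: 4096
;; MSG SIZE  rcvd: 517
"""
print(summarize_dig(sample))
```

The output of `dig` can be captured with `subprocess.run([...], capture_output=True, text=True).stdout` and passed straight to the helper.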

#8

I added a load of logging to Unbound and the caps failure comes from query_dname_compare.

[1548887931] libunbound[20345:0] info: reply from <pl.> 2001:678:4::2#53
[1548887931] libunbound[20345:0] info: flags: 32768 vs 32768
[1548887931] libunbound[20345:0] info: qdcount: 1 vs 1
[1548887931] libunbound[20345:0] info: security: 0 vs 0
[1548887931] libunbound[20345:0] info: an_numrrsets: 0 vs 0
[1548887931] libunbound[20345:0] info: ns_numrrsets: 3 vs 3
[1548887931] libunbound[20345:0] info: ar_numrrsets: 2 vs 2
[1548887931] libunbound[20345:0] info: rrset_count: 5 vs 5
[1548887931] libunbound[20345:0] info: rrset 3 not equal
[1548887931] libunbound[20345:0] info: canonical basic compare, dname_len: 16 vs 16
[1548887931] libunbound[20345:0] info: canonical basic compare, flags: 0 vs 0
[1548887931] libunbound[20345:0] info: canonical basic compare, type: 256 vs 256
[1548887931] libunbound[20345:0] info: canonical basic compare, rrset_class: 256 vs 256
[1548887931] libunbound[20345:0] info: canonical basic compare, ttl: 0 vs 0
[1548887931] libunbound[20345:0] info: canonical basic compare, count: 1 vs 1
[1548887931] libunbound[20345:0] info: canonical basic compare, rrsig_count: 0 vs 0
[1548887931] libunbound[20345:0] info: canonical basic compare, trust: 1 vs 1
[1548887931] libunbound[20345:0] info: canonical basic compare, security: 0 vs 0
[1548887931] libunbound[20345:0] info: d vs d
[1548887931] libunbound[20345:0] info: n vs n
[1548887931] libunbound[20345:0] info: s vs s
[1548887931] libunbound[20345:0] info: 1 vs 2
[1548887931] libunbound[20345:0] info: rrset 3 not canonical equal
[1548887931] libunbound[20345:0] info: Capsforid fallback: getting different replies, failed

It triggers a capsforid failure on dns1 vs dns2 (which are the first labels of the authoritative nameservers of pie3aq.pl). Huh :frowning: . I know I cry wolf about Unbound bugs pretty often, but this seems pretty similar mechanically to the one that actually turned out to be a bug.

Edit: in the pcap for the two responses, the order of the NS records is flipped, and I assume the above reflects that. Is it meant to canonically sort them before comparison?


#9

Oh shoot. I totally messed up in my last post, didn’t I?

I thought it was about “i-dns.pl. AAAA IN” but that succeeded and then it failed on “pie3aq.pl. CAA IN”, probably for the reasons you just said. Your attachment earlier ended with:

[1548884805] libunbound[7629:0] info: response for i-dns.pl. AAAA IN
[1548884805] libunbound[7629:0] info: reply from <pl.> 2001:678:4::2#53
[1548884805] libunbound[7629:0] info: Capsforid: reply is equal. go to next fallback
[1548884805] libunbound[7629:0] info: query response was nodata ANSWER
[1548884806] libunbound[7629:0] info: response for pie3aq.pl. CAA IN
[1548884806] libunbound[7629:0] info: reply from <pl.> 2a02:38:14::146#53
[1548884806] libunbound[7629:0] info: Capsforid fallback: getting different replies, failed
Host pie3aq.pl not found: 2(SERVFAIL). (error)

So everyone ignore my entire previous post, instead of just the part I struck out.

:sweat:

I thought Unbound did sort the records before comparing them. Maybe I was mistaken. If it doesn’t, that’s bad…


#10

It does an unsorted compare followed by a sorted compare if that fails (I assume for speed reasons). Not sure how it ends up with the same number of RRs but manages to compare them in the wrong order. Might package this one up for unbound-users later on …
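To illustrate the intended behaviour, here is a rough Python sketch of a two-phase comparison: a cheap positional compare first, then a sorted compare so that two replies with the same records in a different order still count as equal. This is not Unbound's code. The function names are invented, and the nameserver names `dns1.example.` and `dns2.example.` are placeholders, since only the first labels appear in the log above.

```python
def canonical(records):
    """Lowercase owner names and targets so that 0x20 case changes are ignored."""
    return [(owner.lower(), rtype, target.lower()) for owner, rtype, target in records]


def positional_equal(a, b):
    """Fast path: same records in the same order."""
    return canonical(a) == canonical(b)


def rrsets_equal(a, b):
    """Fast path first, then an order-independent compare as the fallback."""
    if positional_equal(a, b):
        return True
    return sorted(canonical(a)) == sorted(canonical(b))


# Two replies with the same NS records, returned in a different order.
reply_1 = [("pie3aq.pl.", "NS", "dns1.example."), ("pie3aq.pl.", "NS", "dns2.example.")]
reply_2 = [("PiE3aQ.pl.", "NS", "dns2.example."), ("PiE3aQ.pl.", "NS", "dns1.example.")]

print(positional_equal(reply_1, reply_2))  # False: stops at "1 vs 2", as in the log
print(rrsets_equal(reply_1, reply_2))      # True: the sorted compare treats them as equal
```

The log above looks like the sorted step still ended up comparing dns1 against dns2, which is the part that seems wrong.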


#11

Hello!

Thank you all for your help. For now, I tried without the --staging parameter and it worked fine. Generating a live certificate will do for now. In case you want to run any further tests, please note that I have removed the verification TXT records from the DNS zone.

Thanks!


#12

The maintainer of Unbound has accepted @_az’s bug report and made a patch to fix this issue: https://nlnetlabs.nl/pipermail/unbound-users/2019-February/011349.html. Great work!


closed #13

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.