SERVFAIL while renewing


#1

Having an issue renewing. Here are the details and output.

My domain is: wawl.org

I ran this command: ./certbot-auto certonly --webroot -w /var/www/html/wawl -d wawl.org

It produced this output: Failed authorization procedure. wawl.org (http-01): urn:ietf:params:acme:error:dns :: DNS problem: SERVFAIL looking up A for wawl.org

My web server is (include version): Apache 2.2.15

The operating system my web server runs on is (include version): RHEL 6.9

My hosting provider, if applicable, is: Self-hosted

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): no


#2

I can’t tell what’s wrong. :confounded:

DNSViz and ednscomp are okay.

http://dnsviz.net/d/wawl.org/XBj9lg/dnssec/
http://dnsviz.net/d/chattanoogastate.edu/XBj9_w/dnssec/

https://ednscomp.isc.org/ednscomp/2cdb59a8db

Unboundtest fails.

https://unboundtest.com/m/A/wawl.org/KEHUC2JP
https://unboundtest.com/m/A/wawl.org/ANNFZE35
https://unboundtest.com/m/A/chattanoogastate.edu/PXNFB4KO

It seems to just… give up after asking .edu about the nameservers’ domain.

I don’t want to jump to “Unbound bug”, but seriously, what?

The two nameservers are only hosted in one location, so routing issues are possible, but it looks like Unbound gives up before talking to them.

My own, older, somewhat differently configured, Unbound can resolve them fine.


#3

Yeah, puzzles me too. Hoping someone can give me a little insight.


#4

I think I figured it out.

Here’s a typical query to the TLD for chattanoogastate.edu:

$ dig +dnssec +norecurse @m.edu-servers.net. chattanoogastate.edu

; <<>> DiG 9.13.4-1+ubuntu16.04.1+deb.sury.org+1-Ubuntu <<>> +dnssec +norecurse @m.edu-servers.net. chattanoogastate.edu
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33280
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 9, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;chattanoogastate.edu.          IN      A

;; AUTHORITY SECTION:
chattanoogastate.edu.   172800  IN      NS      ns2.chattanoogastate.edu.
chattanoogastate.edu.   172800  IN      NS      ns1.chattanoogastate.edu.
chattanoogastate.edu.   86400   IN      DS      10114 5 2 A22479C3577ABDDA48962F74EECCE16D3EFE14B5C95FD9463BA5A28F CD67CF3A
chattanoogastate.edu.   86400   IN      DS      10114 5 1 2CA51C740D54B8B3EBE5D58BD012196D5584A895
chattanoogastate.edu.   86400   IN      DS      10618 5 1 F8B1C75138745E23976CAD453E812D89E366E5A1
chattanoogastate.edu.   86400   IN      DS      10618 5 2 54592BC341F637A43C0D14F0704B58913A2B1B702266083AAEEF11B8 96E84400
chattanoogastate.edu.   86400   IN      DS      4483 5 1 5BC068184A5BEC46EC3C786AB8722C8E74559A3B
chattanoogastate.edu.   86400   IN      DS      4483 5 2 D19A8289B0EA70DF1F986138200EE3D5BDF5CA4FAEECD439C72847A6 965AF362
chattanoogastate.edu.   86400   IN      RRSIG   DS 8 2 86400 20181224062829 20181217051829 37217 edu. jLT2oLNFOmlpS1uDHzIZFNyQwIJkl/EEIXjtaDMZJeMztVgERedHnpb7 yRwTnTLrIaAFIAA3lEPJS64Awfgg1ilHnIIOPJ8m3CNRH9W7N/7EIoka dW2iwkPAwrN5eUwnavIlHvSqUYPnZUPO3J+2qEwPh72ijVLOmIP/ddyy TLs=

;; ADDITIONAL SECTION:
ns2.chattanoogastate.edu. 172800 IN     A       192.230.240.252
ns1.chattanoogastate.edu. 172800 IN     A       192.230.240.3

;; Query time: 16 msec
;; SERVER: 2001:501:b1f9::30#53(2001:501:b1f9::30)
;; WHEN: Tue Dec 18 14:15:52 UTC 2018
;; MSG SIZE  rcvd: 532

However, Let’s Encrypt recently changed their EDNS buffer size to only 512 bytes.

Here’s a query similar to that:

$ dig +dnssec +norecurse +bufsize=512 @f.edu-servers.net. chattanoogastate.edu

; <<>> DiG 9.13.4-1+ubuntu16.04.1+deb.sury.org+1-Ubuntu <<>> +dnssec +norecurse +bufsize=512 @f.edu-servers.net. chattanoogastate.edu
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23970
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 9, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;chattanoogastate.edu.          IN      A

;; AUTHORITY SECTION:
chattanoogastate.edu.   172800  IN      NS      ns2.chattanoogastate.edu.
chattanoogastate.edu.   172800  IN      NS      ns1.chattanoogastate.edu.
chattanoogastate.edu.   86400   IN      DS      10114 5 2 A22479C3577ABDDA48962F74EECCE16D3EFE14B5C95FD9463BA5A28F CD67CF3A
chattanoogastate.edu.   86400   IN      DS      10114 5 1 2CA51C740D54B8B3EBE5D58BD012196D5584A895
chattanoogastate.edu.   86400   IN      DS      10618 5 1 F8B1C75138745E23976CAD453E812D89E366E5A1
chattanoogastate.edu.   86400   IN      DS      10618 5 2 54592BC341F637A43C0D14F0704B58913A2B1B702266083AAEEF11B8 96E84400
chattanoogastate.edu.   86400   IN      DS      4483 5 1 5BC068184A5BEC46EC3C786AB8722C8E74559A3B
chattanoogastate.edu.   86400   IN      DS      4483 5 2 D19A8289B0EA70DF1F986138200EE3D5BDF5CA4FAEECD439C72847A6 965AF362
chattanoogastate.edu.   86400   IN      RRSIG   DS 8 2 86400 20181224062829 20181217051829 37217 edu. jLT2oLNFOmlpS1uDHzIZFNyQwIJkl/EEIXjtaDMZJeMztVgERedHnpb7 yRwTnTLrIaAFIAA3lEPJS64Awfgg1ilHnIIOPJ8m3CNRH9W7N/7EIoka dW2iwkPAwrN5eUwnavIlHvSqUYPnZUPO3J+2qEwPh72ijVLOmIP/ddyy TLs=

;; Query time: 94 msec
;; SERVER: 2001:503:d414::30#53(2001:503:d414::30)
;; WHEN: Tue Dec 18 14:16:17 UTC 2018
;; MSG SIZE  rcvd: 500

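That response is useless! It fits in 512 bytes and the TC (truncation) bit isn’t set, but the A records for ns1 and ns2 have been dropped from the additional section. Both nameservers live inside chattanoogastate.edu itself, so without those glue records there is no way to find their addresses, and it’s impossible to proceed. That’s why Unbound returns SERVFAIL.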
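
To spell out the reasoning, here is a minimal Python sketch of the check. It is a simplified model and not Unbound’s actual logic; the `Referral` class and the `is_dead_end` function are names I made up for illustration. A referral is treated as a dead end when the TC bit is clear, and every nameserver is inside the delegated zone, and none of them came with glue addresses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Referral:
    """Hypothetical, simplified view of a referral response."""
    zone: str                                   # the delegated zone
    ns_names: List[str]                         # NS targets from the authority section
    glue: Dict[str, List[str]] = field(default_factory=dict)  # NS name -> addresses
    truncated: bool = False                     # TC flag from the header


def _norm(name: str) -> str:
    return name.rstrip(".").lower()


def in_zone(name: str, zone: str) -> bool:
    """True if name is the zone itself or a name below it."""
    name, zone = _norm(name), _norm(zone)
    return name == zone or name.endswith("." + zone)


def is_dead_end(ref: Referral) -> bool:
    """True if the referral gives a resolver nothing to follow up on."""
    if ref.truncated:
        return False  # TC set: the resolver knows to retry over TCP
    glue = {_norm(k): v for k, v in ref.glue.items()}
    for ns in ref.ns_names:
        if glue.get(_norm(ns)):
            return False  # an address is available for this nameserver
        if not in_zone(ns, ref.zone):
            return False  # the name can be resolved through another zone
    return True


zone = "chattanoogastate.edu."
names = ["ns1.chattanoogastate.edu.", "ns2.chattanoogastate.edu."]

# Response with the default buffer size: glue present.
big = Referral(zone, names, {names[0]: ["192.230.240.3"],
                             names[1]: ["192.230.240.252"]})
# Response with bufsize=512: glue dropped, TC not set.
small = Referral(zone, names)

print(is_dead_end(big))    # False
print(is_dead_end(small))  # True
```

If the server had set TC on the small response, the same check would return False, because the resolver would have a clear signal to retry over TCP.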


#5

So what’s the fix for this? I’m not a network engineer.


#6

Well…

Short term…

Do you know the wawl.org or chattanoogastate.edu domain and DNS admins? They could change how one or both of the domains are set up to avoid this. For example, they could:

  • Delete some of chattanoogastate.edu's DS records, so that the referral from .edu is small enough for the nameservers’ A records to fit.

  • Move wawl.org to different nameservers. For example, create ns1.wawl.org and ns2.wawl.org using the current two IPs. You could also move to a completely different DNS service, but that’s obviously a lot of work.

Let’s Encrypt could change their DNS resolver configuration, but it’s set up this way for security reasons, and they probably won’t.

Long term…

I’ve asked some DNS people what they think about it.

Verisign (the .edu TLD operator) could change their DNS server to handle this differently (like by setting the TC bit), but we’d have to read the specifications and think about it before drawing any conclusions.

Unbound (the DNS resolver software Let’s Encrypt uses) could be modified to handle this situation differently (like by automatically falling back to TCP), but I don’t have an opinion on whether it should be.
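
In case it helps to see what the buffer size and a TCP fallback look like on the wire, here is a rough Python sketch using only the standard library. The function names are mine, and it only builds the bytes; it doesn’t send anything and isn’t how Unbound is implemented. The advertised UDP buffer size travels in the CLASS field of the EDNS OPT record, and over TCP the same message is simply prefixed with a two-byte length.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, bufsize: int, query_id: int = 0x1234) -> bytes:
    """Build a non-recursive A query with an EDNS OPT record and the DO bit set."""
    # Flags 0x0000: a plain query with RD clear (like dig +norecurse).
    # Counts: 1 question, 0 answers, 0 authority, 1 additional (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    # OPT record: root name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL=extended flags (0x8000 is the DO bit, like dig +dnssec), RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", 41, bufsize, 0x00008000, 0)
    return header + question + opt


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP sends each message after a two-byte length prefix."""
    return struct.pack("!H", len(message)) + message


udp_query = build_query("chattanoogastate.edu", bufsize=512)
tcp_query = frame_for_tcp(udp_query)

advertised = struct.unpack("!H", udp_query[-8:-6])[0]
print(advertised)                      # 512
print(len(udp_query), len(tcp_query))  # 49 51
```

Over TCP the 512-byte limit doesn’t apply, so the full 532-byte referral with both A records would fit.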


#7

Hmmmm…

Well, I do know the DNS admins (we work together :grinning:), so I’ll chat with them a bit.

I really appreciate the insight and troubleshooting assistance.

I’ll let you know how it turns out!

Thanks a million!


#8

I should emphasize that messing with your DS and NS records is a great way to make your domain stop working…

There are simple changes they could make that are normally easy and harmless.

But it can be hard to determine what weird old software lurks in the depths of college IT systems that can break when you make innocuous changes.

And it’s easy to make mistakes.


#9

Oh yeah. For sure!

I don’t have access to these things and wouldn’t touch them anyway, as they’re outside my area of expertise. I have a passing knowledge, but my focus is web development.

Our network team will need to look at this and may even say “no, we’re not making changes” in which case I will have to go a different route for my certificate.

You are right on the mark. College IT systems can be weird!

Thanks for your prompt assistance!


#10

The DNS is set up far from ideally: https://dnsspy.io/scan/wawl.org


#11

@dozer55

Hey, I was just wondering, did you or someone else at Chatt State contact Educause or Verisign about this?

I’m making it a New Year’s resolution to follow up on this… or something… and I’m going to try emailing Verisign this week.

Edit: I have yet to find a TLD that behaves differently, so I’m rethinking what to do.

Edit: https://lists.dns-oarc.net/pipermail/dns-operations/2019-January/018259.html

Edit: By the way, you could also “fix” the problem by making the DS and/or NS record sets bigger. The glue-less response is 500 bytes, so if it were 13 bytes larger it would no longer fit in 512 bytes, and the authoritative DNS servers would either set the truncation bit, or remove the DS and RRSIG records, allowing the NS and A records to fit. This would be gross, less efficient, and might result in resolution issues with a small percentage of clients.
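
The arithmetic behind that “13 bytes”, as a small Python sketch. The sizes come from the two dig outputs earlier in the thread (MSG SIZE rcvd: 532 and 500); the function name is just something I picked, and what the server does once the response overflows is my reading of its behavior, not something this code proves.

```python
FULL_RESPONSE = 532    # default buffer size: NS + DS + RRSIG + both glue A records
NO_GLUE_RESPONSE = 500 # bufsize=512: same referral with the glue A records dropped
LIMIT = 512            # EDNS buffer size used by Let's Encrypt's resolvers


def min_growth_to_overflow(size: int, limit: int) -> int:
    """Smallest number of extra bytes that makes a response exceed the limit."""
    return max(limit - size, 0) + 1


glue_bytes = FULL_RESPONSE - NO_GLUE_RESPONSE  # bytes taken by the two A records
headroom = LIMIT - NO_GLUE_RESPONSE            # spare bytes in the glue-less response

print(glue_bytes)                                       # 32
print(headroom)                                         # 12
print(min_growth_to_overflow(NO_GLUE_RESPONSE, LIMIT))  # 13
```

So the glue needs 32 bytes but only 12 are free, which is why it gets dropped, and 13 more bytes of DS or NS data would push even the glue-less response past the limit.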