No valid IP addresses found for... DNS A record exists and works

What’s wrong with BIND? Also, if Unbound does fall back to TCP when it receives a truncated message via UDP, shouldn’t Let’s Encrypt handle that, as opposed to spewing out nonsense like “Detail: No valid IP addresses found for php72.webehostin.com” when that is clearly not the case?

And why do you need more than just the A record? I don’t get it. I’ve bought paid SSL certificates before that work for my domain, and they’re all so eager to take my money…

Let’s Encrypt is great and is helping to save the internet, but why are we using all of these dependencies and going to great lengths to verify things we don’t need to?

I’m really confused. :frowning:

I don't think Unbound is a worse choice than BIND. Whether it's a better choice than BIND is a debate I will dodge. :smile:

My unconfirmed hypothesis is that there is an issue with how the stub resolver is set up. I have no reason to think Unbound is behaving improperly in any way.

The details of how the client communicates with the recursive DNS server, and how the recursive DNS server communicates with authoritative DNS servers, can be very different.

My unconfirmed hypothesis is that it should but doesn't.

If, according to my unconfirmed hypothesis, it's set up problematically and is unable to find any IPs, it's completely true that it didn't find any IPs.

In unboundtest.com's case, it doesn't. For various reasons, the authoritative DNS servers are configured to send the authority and additional sections, and Unbound is configured to pass them along, even though I believe it's not required in this case.

When to minimize DNS response size and when to include unnecessary but potentially useful information, potentially saving the client from making extra queries, is a matter of trade-offs.

In Boulder's case, it's complicated.

This seems like a pretty strong hypothesis based on what I've seen in this thread. For a little historical context: early in Boulder's history, Boulder would ask Unbound for a copy of DNSSEC-related RRsets, with the thought that we would do DNSSEC validation in Boulder itself. To make room for the large responses, Boulder would use TCP and also set the OPT RR (EDNS0) to indicate a large buffer for the response. However, prior to launch we decided to have Unbound do DNSSEC validation for us.

Some time later, we switched from TCP to UDP for internal queries to Unbound (for performance), stopped setting the DO (DNSSEC OK) bit so we wouldn't get the extra data we don't need, and stopped setting the OPT RR (EDNS0) to indicate larger buffer sizes, because we no longer needed them. So far that's been relatively fine. The one case where it might run into trouble is if an authoritative nameserver returns a really large number of records.

I haven't looked at this particular domain in detail, but other folks in this thread have said there are a very large number of NS and SOA records, some of which aren't used anymore. @own3mall, I'd recommend cleaning up any unused NS records on your domain, because (a) it's likely to fix this issue, and (b) unused NS records are both a performance problem and potentially a security problem, if someone grabs the domain name of those unused records.

If your goal is to convince us to change Boulder to accept large responses, I'd want to see evidence that such large responses are a very common occurrence in otherwise well-configured domains. So far this is the first bug report we've gotten about this behavior. Thank you for bringing it up, though!
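To make the OPT RR and buffer-size discussion concrete, here is a minimal Python sketch of what a query looks like on the wire with and without EDNS0. It is not Boulder's code (Boulder is written in Go); the function names `encode_name` and `build_query` and the default query ID are made up for illustration. Without an OPT record, a UDP response is limited to the classic 512 bytes; with one, the CLASS field advertises the buffer size the client will accept, and the DO flag asks for DNSSEC records.

```python
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Encode a hostname as length-prefixed DNS labels (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # root label terminates the name


def build_query(name: str, qtype: int = 1,
                edns_bufsize: Optional[int] = None,
                do_bit: bool = False,
                query_id: int = 0x1234) -> bytes:
    """Build a DNS query; qtype 1 is A. query_id is an arbitrary example value."""
    arcount = 1 if edns_bufsize is not None else 0
    # Header: ID, flags (only RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    packet = header + question
    if edns_bufsize is not None:
        # OPT pseudo-RR: root name, TYPE 41, CLASS = UDP buffer size,
        # TTL = extended RCODE / version / flags (DO is bit 15 of the flags),
        # RDLENGTH = 0 (no options).
        ttl = 0x00008000 if do_bit else 0
        packet += b"\x00" + struct.pack("!HHIH", 41, edns_bufsize, ttl, 0)
    return packet
```

With these definitions, `build_query("php72.webehostin.com")` is a 38-byte query, and passing `edns_bufsize=4096` appends the 11-byte OPT record that tells the server it may send a UDP response larger than 512 bytes.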

@jsha Fair enough, thanks for that helpful information. I’ll give that a shot for now, BUT

I think Let’s Encrypt should accept large responses, but only if the initial message is truncated because it is too large like in my case. That way, your performance hit only happens occasionally when you run into people like myself who don’t necessarily do things the way everyone else does. After all, the error message output by Let’s Encrypt does not really do the end user any justice. In my case, it appears certbot gives up because it’s too large, not because it can’t resolve that host unlike what the error message states… at the very least the error message output could more accurately indicate what’s going on.

It’s a use case that should be handled in my opinion, and if I were the programmer doing it, I would certainly implement that functionality because I try to handle as many configurations as possible to allow for greater flexibility.
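The behavior requested above (retry over TCP only when the UDP answer comes back truncated) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Boulder or Unbound is implemented; `send_udp` and `send_tcp` are hypothetical callables standing in for real transports, and `query_with_fallback` is a name invented here.

```python
import struct
from typing import Callable, Tuple

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags word


def is_truncated(response: bytes) -> bool:
    """Return True if the TC bit is set in a raw DNS response."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def query_with_fallback(query: bytes,
                        send_udp: Callable[[bytes], bytes],
                        send_tcp: Callable[[bytes], bytes]) -> Tuple[bytes, str]:
    """Try UDP first; pay for a TCP round trip only when the answer is truncated."""
    response = send_udp(query)
    if is_truncated(response):
        return send_tcp(query), "tcp"
    return response, "udp"
```

The performance cost is confined to the rare oversized answers, which is the trade-off the post argues for. The same `is_truncated` check could also be used to report "response truncated" instead of "no valid IP addresses found".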

Very good point! I'll look into improving it.

@jsha

Famous last words, but it ought to be simple and zero risk to set the EDNS buffer size back up to whatever the network’s MTU can handle without fragmentation.

And very low risk to set it to 4096, if you don’t block fragmented packets.

It might be worth the effort, at least as a low priority issue?

Handling truncation and TCP fallback would be more complicated but probably unnecessary.

This domain only uses about 1100 bytes. 1400+ or 4000+ bytes would take a lot of NS, A, and AAAA records.

Edit: EDNS0 (4096 bytes, DO enabled) was removed in 2ecb8bf, committed in May. Switching to UDP was done in 4e68fb2 in October.
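As a rough illustration of "whatever the network's MTU can handle without fragmentation", the largest DNS payload that fits in a single unfragmented UDP datagram is the MTU minus the IP and UDP headers. The sketch below assumes minimum-size headers (no IPv4 options or IPv6 extension headers) and a standard 1500-byte Ethernet MTU; the function name is made up for this example, and the right value for any particular network is an assumption to verify.

```python
def max_unfragmented_dns_payload(mtu: int = 1500, ipv6: bool = False) -> int:
    """Largest UDP payload that fits in one packet of the given MTU."""
    ip_header = 40 if ipv6 else 20  # fixed IPv6 header vs. minimum IPv4 header
    udp_header = 8
    return mtu - ip_header - udp_header


ipv4_limit = max_unfragmented_dns_payload()           # 1500 - 20 - 8
ipv6_limit = max_unfragmented_dns_payload(ipv6=True)  # 1500 - 40 - 8
```

Under those assumptions the limits are 1472 bytes over IPv4 and 1452 bytes over IPv6, so a response of about 1100 bytes would fit in a single packet either way.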

I cleaned up my DNS records as much as I could, and it’s still not working for me. I’m guessing it’s because it’s still too big.

So, now what should I do? :frowning_face:

You're right. Before it had 31 NS records, and responses were 1116 bytes or so. Now it has 29 NS records and about 1048 byte responses.

They all only have 9 different IP addresses, though. Couldn't you remove 20 of the NS records?

Edit: And some of them don't work.
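The two measurements in this post are enough for a back-of-the-envelope estimate of how much each NS record costs. The sketch below assumes the response size grows linearly with the number of NS records (each NS record plus its glue A record) and treats the approximate sizes quoted above as exact, so the results are estimates only. It also assumes the 512-byte limit that applies to UDP responses when the query carries no OPT RR.

```python
def fit_linear(count_a: int, size_a: int, count_b: int, size_b: int):
    """Fit size = overhead + per_record * count through two observations."""
    per_record = (size_a - size_b) / (count_a - count_b)
    overhead = size_a - per_record * count_a
    return per_record, overhead


def max_records(limit: int, per_record: float, overhead: float) -> int:
    """How many records fit under a byte limit, by the linear estimate."""
    return int((limit - overhead) // per_record)


# Observations from the thread: 31 NS -> ~1116 bytes, 29 NS -> ~1048 bytes.
per_record, overhead = fit_linear(31, 1116, 29, 1048)
fits_in_512 = max_records(512, per_record, overhead)
estimate_for_9 = overhead + 9 * per_record
```

This works out to roughly 34 bytes per NS record and about 62 bytes of fixed overhead. By this estimate around 13 NS records would fit in 512 bytes, and a zone with 9 would produce a response of about 368 bytes.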

Which one doesn’t work? There might be some I haven’t yet registered the name server for at the TLD, but they all point to different servers for the most part with different IP addresses. I just have that many servers, and each technically is able to host domains and stand-alone as its own name server.

Each of those NS entries should resolve to an IP via an A record…

I'm not sure. At least 2 of them return REFUSED to queries. I didn't check them all and make a list.

DNSViz or other DNS checking sites can help.

http://dnsviz.net/d/webehostin.com/dnssec/

But there are 29 hostnames, with only 9 different IP addresses among them, only 5 of which are running DNS servers authoritative for the zone (according to DNSViz).

You could get the same results, more efficiently, with 5 hostnames.
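One way to find the redundant hostnames is to group the NS names by the address they resolve to and keep one name per address. The Python sketch below uses a small, entirely hypothetical mapping (names under `example.` and addresses from the 192.0.2.0/24 documentation range), not the real webehostin.com data; `one_name_per_ip` is a helper invented for this illustration.

```python
def one_name_per_ip(ns_to_ip: dict) -> dict:
    """Keep the alphabetically first NS hostname for each distinct IP address."""
    chosen = {}
    for host in sorted(ns_to_ip):
        # setdefault only stores the first hostname seen for each IP
        chosen.setdefault(ns_to_ip[host], host)
    return {host: ip for ip, host in chosen.items()}


# Hypothetical example data: four hostnames, two distinct servers.
example = {
    "ns3.example.": "192.0.2.1",
    "ns4.example.": "192.0.2.1",
    "test1.example.": "192.0.2.2",
    "test2.example.": "192.0.2.2",
}
reduced = one_name_per_ip(example)
```

Here `reduced` keeps only `ns3.example.` and `test1.example.`, one name per server. Whether each remaining server actually answers authoritatively for the zone still has to be checked separately, for example with DNSViz.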

Normally you want two name servers per address(es), and in some cases, I wanted 2-4. So, that’s why the high count.

Regardless, it’s rather frustrating that I’m having to remove entries when I have my own “cloud” if you will…

https://intodns.com/webehostin.com

Looks fine to me… the only one it’s complaining about is ns5… I’ll add that one back.

I don't understand. The nameservers have 9 unique IPv4 addresses. It would be typical to have 9 NS records with 9 hostnames, one for each IP. (Ignoring the issue that some of them don't work.)

Yeah. :slightly_frowning_face: But if you can improve the DNS configuration and avoid the Let's Encrypt issue, that will resolve the situation faster than getting Boulder changed, and be beneficial itself.

4 of the IPs still have issues.

http://dnsviz.net/d/webehostin.com/dnssec/

Those IP addresses do NOT have issues. The tool (DNSViz) has issues connecting to various servers… most likely because the IP the service is using has been banned. I do run strict firewalls.

If you try the IPs it complains about yourself, you’ll see they work just fine. You can ping them assuming your IP isn’t banned (doubtful that it would be) or in a regularly updated blocklist. :slight_smile:

Anyways, I cannot really clean up the records any further than I have… so I guess I’m SOL until this issue is addressed by Let’s Encrypt developers, if it ever will be.

Domains usually require at least two unique entries for name servers, so duplicates are required even if they point to the same IP so that domain registrars are happy.

I guarantee you that I will NOT be the only person to have this issue. This is not a very complicated setup… at all.

The advice @mnordhoff has been giving you is quite accurate and useful. I’m afraid we’re not going to modify Boulder to address this use case. I think you would definitely benefit from cleaning up your NS records.

If I may guess at your mental model: It sounds like you want to provide a variety of nameservers for your customers to choose from when configuring their domain. Is that right? If so, that’s a very reasonable and laudable goal. However, you do not need to list every one of those nameservers as NS records on webehostin.com. Instead, when giving your customers instructions on how to set up their domain, you can present them with the full list. For webehostin.com, you would set ~4 NS records pointing to a subset of your nameservers, ideally ones with different IP addresses.

If I’m wrong about the above guess, can you explain what expected benefit you get out of having 29 NS records rather than 4?

It’s more of the fact that each NS record is a separate server altogether (minus the duplicates). Each is completely standalone and separate from the others. Thus, I have 9+ different servers to host different things on.

I believe Let’s Encrypt should fix this issue because other certificate issuers (granted you have to pay) don’t have an issue with this setup. Let’s Encrypt exists to offer free SSL (unless I missed something), but you need to support the way the internet works. I was able to obtain certificates for this domain previously without issue (no changes to DNS), but now for some odd reason, Let’s Encrypt expects us to have a limited number of NS records, which is a self-imposed Let’s Encrypt requirement and NOT required by the domain name system.

Everyone is forcing us to use SSL (which is fine because “secured” and “encrypted” communications are good), but they’re still not making it easy or free to do so, and now the only organization that does offer them for free is inventing requirements that make no sense.

This thread has nothing to do with customers. It is about me trying to obtain a certificate for my own domain, which happens to be a web hosting service.

Right, but why do you need to list all 9 of these servers in your NS records for webehostin.com? The usual reason to list multiple NS records is so recursive resolvers can try one, then fall back on another if it's down. Do you expect that 8 of your nameservers will go down, and you will rely on the 9th one being up? If not, you are getting no benefit from this arrangement, and some harm to performance.

I actually only brought it up after noticing issues myself; I used DNSViz to corroborate them and easily check every nameserver all at once.

You're likely right, but you do seem to be the first person to report an issue that is about 3 months old.

I don't have actual numbers, but it seems to be very rare for a zone to have so many NS records.

9 NS records, with 9 IPs, probably wouldn't pose an issue.

I still don't understand why the zone has 29.


@own3mall, even if the Let's Encrypt team agree to change this, and even if they move on it quickly, it will still take time to test and deploy the change. Regardless of how this situation happened, you can make changes to your domain's DNS settings immediately.

@mnordhoff What would you say if I had 29 different servers then? It’s entirely possible with the way virtual machines are easily created and deployed using KVM in Linux. My server count keeps increasing due to all of the projects and hobbies I work on, and I like to keep them under one umbrella (different DNS records for different servers).

I know DNS changes can be made immediately, as I already deleted one set of name servers that weren’t in use anymore.

Again, my point is that it doesn’t matter how many NS records you have. You may not understand and agree with the way I did it, but I have my reasons as I stated previously. It should still work with Let’s Encrypt since NS DNS records have nothing to do with SSL certificates.

I’m not asking for the change to be made as quickly as possible. I’m just asking that it will be done at some point. It seemed like @jsha said they have little interest in fixing this issue though.

@jsha, I created NS records so that other domains can use them. I’m pretty sure an NS record and an A record must both be present in order for a domain to use them.

Example:

domain.com uses ns3.mydomain.com and ns4.mydomain.com as nameservers
domain2.com uses test1.mydomain.com and test2.mydomain.com as nameservers

ns3 and ns4 = a server
test1 and test2 = completely different server

mydomain.com needs to have (not official DNS syntax, but you get the idea):

NS ns3
NS ns4
NS test1
NS test2
ns3.mydomain.com. A IP1
ns4.mydomain.com. A IP1
test1.mydomain.com. A IP2
test2.mydomain.com. A IP2

Unless it doesn’t. I’m not a DNS expert, but I still don’t really understand why any DNS records other than the A record are examined for SSL certificates. SSL certificates should have nothing to do with DNS beyond determining the IP address where the requested host lives when attempting to verify the domain…

You don't.

Well, that's a complicated matter.

That's not correct. A (or AAAA) records must exist in your zone, and the TLD will likely choose to require that glue be registered for every hostname and IP address (at your registrar), but the other nameservers are not required to be in the delegation or authoritative NS record sets for your domain.

As an example, jquery.com. uses these two nameservers:

jquery.com.      (unsigned)  84934  NS  george.ns.cloudflare.com.
jquery.com.      (unsigned)  84934  NS  lara.ns.cloudflare.com.

But cloudflare.com. itself uses different ones:

cloudflare.com.  (signed)    86400  NS  ns3.cloudflare.com.
cloudflare.com.  (signed)    86400  NS  ns4.cloudflare.com.
cloudflare.com.  (signed)    86400  NS  ns5.cloudflare.com.
cloudflare.com.  (signed)    86400  NS  ns6.cloudflare.com.
cloudflare.com.  (signed)    86400  NS  ns7.cloudflare.com.

cloudflare.com. itself -- and your domain -- do not need to have tons of NS records, and your domain doesn't significantly benefit from it.

Ignoring other matters (like CAA and DNSSEC), Let's Encrypt doesn't really examine the NS records much.

The authoritative DNS servers don't have to include an authority section, or additional section, in response to most queries.

But when they choose to, the recursive DNS server implementation, by default, passes the information along to clients, and Let's Encrypt's validation software logs it.

If the authoritative nameservers gave more minimal responses, I think this issue would be avoided.

That's one of the two most critical things a CA does. :stuck_out_tongue:


Incidentally, the delegation for your domain only includes 3 nameserver names, with only 1 IP address. That's a dangerous single point of failure for resolving the zone.

I’m still not following. My setup is similar to the way the jquery website works in your example. I use a set of name servers for my domain (ns3, ns4), and other domains use different ones (like phpdev and phpdev2).

It sounds like if I had several additional A records this would also fail because it would be too big? Again, it’s a limit that Let’s Encrypt has applied that isn’t needed.

Yes, I use the same IP address, but that’s what a data center with redundant connections and redundant hardware is for.