One unavailable nameserver (of two) broke cert allocation

Hello,

We run a service (exe.dev) where we request certificates for users' domains. Today we were having trouble getting certs. We discovered (after much debugging) that ns2.exe.dev was unavailable. Users had no trouble getting to their sites because ns1.exe.dev was functioning normally.

However, the second name server being out of action meant we could not get certs from LE. This lasted for hours, until we figured out what was broken and fixed ns2.

This is, clearly, our fault. We should have fixed our name server within minutes of it breaking, and we are busy adding monitoring now. But it is also odd behavior for LE. I would have expected that your servers, on failing to connect to ns2, would try ns1. It is unfortunate that our failure cascaded into your system.

If there is any more information I can get you, happy to try.

Best,
David

Pretty sure it'll allow one of many nameservers to fail, but if you only had 2 nameservers and only one was working it wouldn't have enough to form a multi-perspective opinion about validation.

You currently seem to have a standard AWS Route53 setup with multiple nameservers, not sure if you've just moved to that or not.


Oh that's really interesting! Is there some critical information LE gets out of using multiple root NSs?

We use Route53 for exe.dev, but our users host on exe.xyz, using the nameservers ns1.exe.dev / ns2.exe.dev. We could deploy more, but I would love to first understand what LE is getting from that, given we are using TLS-ALPN-01 validation.

LE (and most other ACME CAs) use Multi-Perspective validation of DNS resolution and challenge responses:

Multi-Perspective Validation Improves Domain Validation Security - Let's Encrypt

The idea being in the event you can spoof one perspective you (hopefully) can't spoof them all without having genuine control over the domain. Nowadays I think it's a requirement for all CAs.


Think about what happens if someone can take over one of the nameservers but not all of them, so the answers are inconsistent. In that case the only safe thing to do is not issue any new certificate at all.


My understanding of the "multi-perspective" part is that it's about multiple paths from the Internet to guard against BGP-based attacks (and the web page you posted agrees with that interpretation). I don't see how the number of working nameservers you have affects that at all.

If there's an additional requirement that you need to have a minimum number of nameservers working to issue a LE certificate, that's news to me.

Each perspective selects an authoritative NS at random. So a primary and some of the secondary perspectives may select working NSes and other perspectives may select the broken ones.


Right, we don't target many different authoritative nameservers on purpose, but since we're making requests from 5ish different perspectives, statistically at least one perspective is going to hit each of the two NSes.
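
For a rough feel of the numbers, here is a back-of-the-envelope sketch in Python. It is only an illustration under simplifying assumptions that are not from this thread: each perspective picks exactly one authoritative nameserver uniformly at random, picks are independent, and there is no retry against another nameserver. It also ignores whatever failure tolerance the real validation pipeline has. The function name and parameters are made up for this example.

```python
from fractions import Fraction


def p_all_perspectives_ok(total_ns: int, broken_ns: int, perspectives: int) -> Fraction:
    """Chance that every perspective happens to pick a working nameserver.

    Assumption (hypothetical model): each perspective picks one nameserver
    uniformly at random, independently, and never retries another one.
    """
    if total_ns <= 0 or not 0 <= broken_ns <= total_ns:
        raise ValueError("need total_ns > 0 and 0 <= broken_ns <= total_ns")
    # Probability that a single perspective lands on a working nameserver.
    p_one_ok = Fraction(total_ns - broken_ns, total_ns)
    # Independent picks, so multiply across all perspectives.
    return p_one_ok ** perspectives


if __name__ == "__main__":
    for total, broken in [(2, 1), (4, 1)]:
        p = p_all_perspectives_ok(total, broken, perspectives=5)
        print(f"{broken} of {total} nameservers down, 5 perspectives: "
              f"P(no perspective hits a dead NS) = {p}")
```

Under this model, one dead nameserver out of two with 5 perspectives leaves a 1/32 chance that no perspective touches the dead one; with one dead out of four it is 243/1024. Adding nameservers lowers the odds of hitting the broken one, but keeping every listed nameserver healthy is what actually removes the problem.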
