My Letsencrypt certificate fails to renew randomly

What if the resolver concludes that 3 lost packets in a row means that the authoritative server is dead, though? It might give up and return SERVFAIL instead of sending a 4th query.

I’m not sure what Unbound’s dead server logic is – and I don’t think it’s that simple – but it’s plausible.

I have another idea: What if the resolver is a pre-Flag Day version of Unbound? It might conclude that the authoritative servers don’t support EDNS. But DNSSEC requires EDNS, so the zone would be impossible to resolve.

The case randomization code also has dropped-packet logic – if the authoritative server drops capitalized queries, it might engage a fallback mode and send multiple lowercase queries, which would, in this case, exacerbate the rate limiting problem.

Yeah, we do send a lot of queries:

1 TXT
1 CAA for each label in the name (in this case ~3)

All that multiplied by the number of names being validated (8 in this case) and the number of VAs validating (currently 4).

So we’re looking at ~120+ DNS queries for this one certificate.
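
The arithmetic behind that estimate, as a small sketch. The figures are the ones quoted above (the "~3" CAA lookups per name is approximate); the function name is only illustrative.

```python
# Rough upper-bound estimate of DNS queries for one certificate order,
# using the figures quoted in this post. Caching is ignored here.
def estimate_queries(txt_per_name, caa_per_name, names, vas):
    """(TXT + CAA lookups per name) x names x validation authorities."""
    return (txt_per_name + caa_per_name) * names * vas

total = estimate_queries(
    txt_per_name=1,  # one TXT lookup per name
    caa_per_name=3,  # roughly one CAA lookup per label (~3 here)
    names=8,         # names being validated
    vas=4,           # VAs validating
)
print(total)  # 128
```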

I’d recommend upping your rate limit further, or tweaking your renewal script so that it only validates one domain at a time.

I wonder, why not cache all replies belonging to one certificate? For example, if you validate printer.sebbe.eu and then look up CAA for sebbe.eu, you could save that CAA answer in a per-certificate cache (same with CAA for .eu), so when you later validate dns1.sebbe.eu and want to check CAA for sebbe.eu and .eu again, you already have those CAA answers saved.

After the certificate is issued or validation is definitively rejected, you throw away that per-certificate cache. That should at least save something like 20-40 queries.
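
To make the idea concrete, here is a rough sketch of such a per-certificate cache. It is not how Let's Encrypt's software works; `PerCertificateCache`, `caa_lookup_names` and `fake_resolve` are hypothetical names, and the resolver is a stand-in that returns an empty answer. It assumes one CAA lookup per label, as described above.

```python
def caa_lookup_names(fqdn):
    """Names checked for CAA: the name itself, then each parent domain."""
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

class PerCertificateCache:
    """Remembers CAA answers for the lifetime of one certificate order."""

    def __init__(self, resolve):
        self._resolve = resolve      # function that performs the real lookup
        self._answers = {}           # name -> CAA answer
        self.upstream_queries = 0    # lookups that actually hit the DNS

    def caa(self, name):
        if name not in self._answers:
            self.upstream_queries += 1
            self._answers[name] = self._resolve(name)
        return self._answers[name]

def fake_resolve(name):
    # Stand-in for a real CAA query; always returns "no CAA records".
    return []

names = ["printer.sebbe.eu", "dns1.sebbe.eu"]
cache = PerCertificateCache(fake_resolve)
requested = 0
for fqdn in names:
    for name in caa_lookup_names(fqdn):
        requested += 1
        cache.caa(name)

print(requested, cache.upstream_queries)  # 6 4
# The cache object is simply discarded once the order is finished.
```

With these two names, six CAA lookups are requested but only four reach the DNS, since sebbe.eu and eu are shared. A real implementation would still need to respect record TTLs.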

Also, is there an IP list somewhere of all Let’s Encrypt resolvers, so I can set these resolvers as trusted (with a higher rate limit)?
Also, if I do that, what rate limits would be good for you, @jsha? I don’t want to unintentionally make it possible for my DNS server to be used in a DDoS attack against Let’s Encrypt, so I need to be careful about allowing too high a rate in case someone spoofs the IPs of the Let’s Encrypt resolvers against my DNS server.

There isn't one -- Let's Encrypt avoids publicizing any of their IP addresses, and they may change at any time.

That's a good point, my estimate above is an overestimate. Our Unbound instances do cache things for up to 60 seconds, so we are probably getting a lot of cached CAA answers.

It sounds like there may be some confusion here. The type of DNS servers used in DDoS attacks are recursive servers - that is, servers that an end-user would query, like 1.1.1.1 or 8.8.8.8. The type of server that Let's Encrypt hits during validation is an authoritative server. These are very different roles, but for historic reasons they are somewhat conflated, and often a single piece of software implements both.

It sounds like your authoritative server may accidentally be enabled as a recursive resolver. If so, that's what you should fix - and after that you can disable rate limiting. It's also possible that the security scanner was just wrong.

Here's how you can find out if your authoritative server is enabled as a recursive resolver: run dig example.com @<resolver IP> for a name the server is not authoritative for. If you get a successful result, recursion is enabled.
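
If you'd rather script that check, here is a minimal sketch using only the Python standard library. It sends a plain A query over UDP (IPv4 only) and treats the server as recursive when the reply has the RA (recursion available) flag set, a NOERROR rcode and at least one answer. That is my interpretation of "a successful result"; the function names are illustrative, not part of any tool.

```python
import socket
import struct

RA_FLAG = 0x0080     # "recursion available" bit in the DNS header flags
RCODE_MASK = 0x000F  # low four bits of the flags hold the response code

def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record with RD (recursion desired) set."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def looks_recursive(response):
    """True if the reply advertises recursion and actually carries an answer."""
    _qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return bool(flags & RA_FLAG) and (flags & RCODE_MASK) == 0 and ancount > 0

def check_server(server_ip, name="example.com", timeout=3.0):
    """Query server_ip for a name it should not be authoritative for."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        response, _addr = sock.recvfrom(4096)
    return looks_recursive(response)
```

Calling check_server with your server's address returns True if it answered the query recursively; a timeout raises an exception rather than returning a result.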

By the way, I just noticed this:

To be clear, neither I nor any Let's Encrypt staffer will ever ask you for any private keys, whether via PGP or not.

My understanding was that authoritative servers can be used in DDoS attacks too, by sending them a query for a domain, often for a record with a fairly large response, and spoofing the source IP. The authoritative server would then think it’s just a query from a recursive resolver and respond to it, effectively hiding the attacker’s source and also amplifying the attack.

So this understanding is wrong, then? NSD as server software doesn’t support recursion at all.

Yes, you're right.

It's a problem that can't entirely be solved by rate limiting -- it's always possible to spoof a low level of traffic to each of a large number of nameservers.

Good point, you're totally right and I misremembered the attack. Thanks for the correction!

That said, your settings for this rate limit don't affect the risk of DDoS against Let's Encrypt any more than they do any other service on the Internet. I would say: if your current rate limit is too low to support issuance, go ahead and increase it. I don't think you will meaningfully increase the risk that your server poses in DDoS abuse.
