Where does LetsEncrypt resolve DNS from?

Noting up front:

  • I understand you won’t guarantee this information since it SHOULD be opaque from a design standpoint - and that we shouldn’t ever rely or need to rely on it.
  • I know that you don’t use intermediate dns resolvers and that you do your own authoritative resolving for the domains

Here’s why I’m asking - I’ve had an issue a few times where (from my perspective) the authoritative servers for my dns provider all return proper results for dns validation, but from your servers’ perspective, you’re seeing an NXDOMAIN or other failure. This is difficult to diagnose, and having some rough idea of where you are validating from would help with the diagnostics when it happens.

The underlying issue is that my provider has numerous anycast or similar replicas, and my current code checking for whether they are in sync is only hitting a subset of them based on my querying region, so when I look for “is it synced and up to date” before I send out the dns01 challenge request - it’s not actually current or working in all cases.
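A minimal sketch of the kind of pre-check described above, assuming the lookup itself is supplied by the caller. The `query_txt` callable and the nameserver names in the usage example are hypothetical; in practice it would wrap whatever DNS library or `dig` call the script already uses. Note that querying each listed nameserver from one location still only reaches the anycast replica nearest to that location, which is exactly the limitation described above.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: query_txt(nameserver, record_name) returns the TXT
# strings that one specific authoritative server answers with (empty if none).
QueryFn = Callable[[str, str], List[str]]


def out_of_sync_servers(
    nameservers: Iterable[str],
    record_name: str,
    expected_value: str,
    query_txt: QueryFn,
) -> Dict[str, str]:
    """Return {nameserver: reason} for every server not yet serving the value."""
    problems: Dict[str, str] = {}
    for ns in nameservers:
        try:
            answers = query_txt(ns, record_name)
        except Exception as exc:  # timeouts, SERVFAIL, etc. count as "not ready"
            problems[ns] = f"query failed: {exc}"
            continue
        if expected_value not in answers:
            problems[ns] = "expected TXT value missing"
    return problems


if __name__ == "__main__":
    # Stand-in lookup so the example runs without network access.
    canned = {"ns1.example.net": ["token-value"], "ns2.example.net": []}
    report = out_of_sync_servers(
        canned, "_acme-challenge.www.example.com", "token-value",
        lambda ns, name: canned[ns],
    )
    print(report)
```

An empty result means every server that was asked returned the expected value; it does not prove that every replica behind an anycast address has it.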

In this particular case, the symptom was occurring for about 1-1.5 days, but has since stopped, so my current issue is no longer a problem. I would just like to be able to have a bit more detail in future diagnostics if this happens again.

hi @nneul

there are two key components to the DNS

a) your authoritative servers are queried directly, not 3rd party resolvers such as Google's
b) a DNS server is chosen at random

So if you have 4 DNS servers Let’s Encrypt will randomly select one
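As a rough illustration of why one lagging server matters, here is a small calculation. It assumes the selection is uniformly random and that each lookup is independent, which is a simplification of the description above and not a statement about how the resolver actually behaves; `failure_chance` is a made-up helper name.

```python
def failure_chance(total_servers: int, stale_servers: int, lookups: int = 1) -> float:
    """Chance that at least one of `lookups` independent random picks hits a stale server."""
    if total_servers <= 0 or not 0 <= stale_servers <= total_servers:
        raise ValueError("need total_servers > 0 and 0 <= stale_servers <= total_servers")
    p_ok = 1 - stale_servers / total_servers  # one pick lands on a current server
    return 1 - p_ok ** lookups


if __name__ == "__main__":
    # One stale server out of four, a single lookup.
    print(failure_chance(4, 1))
    # Same zone, but two independent lookups during one validation.
    print(failure_chance(4, 1, lookups=2))
```

Under those assumptions, a single stale server out of four fails a single lookup a quarter of the time, which is consistent with intermittent failures that are hard to reproduce.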

Andrei

Hi @nneul,

That sort of geographical load balancing does sound particularly troublesome to work around!

We're typically fairly reserved in what we say about our operations & source network since (as you noted) they're likely to change and we lean towards saying less as good security posture.

I'll ask around to see what level of detail we're comfortable providing, but for now I can say that we always resolve from North America, typically from closer to the Midwest and the West Coast. There are some plans underway to diversify this further, which may exacerbate your problem :-(

[quote="nneul, post:1, topic:37607"]
The underlying issue is that my provider has numerous anycast or similar replicas, and my current code checking for whether they are in sync is only hitting a subset of them based on my querying region, so when I look for "is it synced and up to date" before I send out the dns01 challenge request - it's not actually current or working in all cases.
[/quote]

Is this use-case and the resulting trouble something you can raise with your provider? It seems like they're in the best position to know when the records you have updated have been delivered to all of the replicas and should be able to provide you that information somehow without relying on externally querying from diverse vantage points and hoping you see all replicas.

Hope this helps, apologies on the vague answers.

That’s helpful though, and understandable. I’ve been trying to work with the provider, and the symptom in this particular case has stopped presenting itself.

Even knowing that it’s mostly from the west side of US is helpful (as compared to “could be from anywhere in world”).

Part of me wonders if moving in the reverse direction (go to extreme widespread resolution) - and do a quorum of response - might be good from an implementation standpoint, but that’s a whole separate topic. (i.e. do lookups from N distinct places around globe and require that greater than half of them validate the dns01 challenge).
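The "greater than half" rule could be sketched roughly like this. The vantage point names are invented for illustration, and this is only the idea as stated above, not a description of how any CA actually decides.

```python
from typing import Mapping


def quorum_passes(results: Mapping[str, bool]) -> bool:
    """True when strictly more than half of the vantage points validated."""
    if not results:
        return False  # no observations means nothing was demonstrated
    successes = sum(1 for ok in results.values() if ok)
    return successes * 2 > len(results)


if __name__ == "__main__":
    # Hypothetical vantage points; True means the dns01 lookup validated there.
    observed = {"us-west": True, "us-central": True, "eu": False, "asia": True}
    print(quorum_passes(observed))
```

A strict majority means an even split does not pass, so two of four successes would be rejected.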

Either way, thank you for the additional information.

Funny you say that! When I mentioned that we are planning to diversify our lookup locations I was alluding to something much like this. We're in the process of working towards deploying what we call multi-va, where we will have several instances of the Boulder VA performing challenges from distinct network perspectives.

There will be an API announcement thread before any changes are made in staging/production but in the future we will likely be including some non-US network perspectives as part of the multi-va work.

BTW, another approach: Can you ask your DNS provider to offer some way to officially find out when your records have been synced to all available nodes? They are the only ones really well positioned to find that out. For instance, Route53 offers this with the “INSYNC” status.
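For the Route53 case, a polling loop along these lines might work. It is a sketch that assumes a boto3 Route 53 client, whose `get_change` call reports a status of `PENDING` or `INSYNC`; the helper name, attempt count, and delay are arbitrary choices made here. The change ID is the one returned when the record change is submitted.

```python
import time
from typing import Any, Callable


def wait_for_insync(
    route53_client: Any,
    change_id: str,
    max_attempts: int = 30,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_change until the change reports INSYNC or attempts run out."""
    for attempt in range(max_attempts):
        info = route53_client.get_change(Id=change_id)["ChangeInfo"]
        if info["Status"] == "INSYNC":
            return True
        if attempt < max_attempts - 1:
            sleep(delay_seconds)  # still PENDING; wait before asking again
    return False
```

The client is passed in so the loop can be exercised with a stand-in object. boto3 also ships a built-in waiter for record set changes, which may be simpler than a hand-written loop.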

Here’s a quote from them when I asked that originally:

We also do not have something like a sync api call but we would love to have something that returns true when we can ensure that your record has hit every server. Normally we sync our records very rapidly but when you have to push records to 40 different servers across the world, latency can happen.

Unfortunately, that doesn’t currently exist. I will try to work within the options available. (I may need to consider moving back to Route53 though as it has a much better API for dynamic updates than it used to.)

It’s also possible to CNAME the _acme-challenge record to another zone.

_acme-challenge.www.example.com CNAME xyz.acme.example.com.

Only the zone acme.example.com needs to offer API updates for Let’s Encrypt, and it can be hosted by a different DNS provider.
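With many names to delegate, the CNAME lines could be generated instead of written by hand. A small sketch follows; the helper name and the `xyz` label are placeholders, and the output is just zone-file style text to paste into whatever the primary provider accepts.

```python
def challenge_cname(domain: str, label: str, acme_zone: str) -> str:
    """Build a zone-file style CNAME line delegating the challenge name."""
    base = domain.rstrip(".")
    if base.startswith("*."):
        # A wildcard name is validated at the name without the "*." prefix.
        base = base[2:]
    owner = f"_acme-challenge.{base}."
    target = f"{label}.{acme_zone.rstrip('.')}."
    return f"{owner} CNAME {target}"


if __name__ == "__main__":
    print(challenge_cname("www.example.com", "xyz", "acme.example.com"))
```

The TXT record for the challenge is then created at the CNAME target in the delegated zone, and the original zone never needs to change again for that name.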


Unfortunately, in my case, it’s

   <lots-of-different-domains-by-different-people-in-org>.<parentdomain>

I’d love it if the ACME dns validation would allow for “demonstrate control of X or any domain higher up than X”.

i.e. For “A.B.C.D” allow validation via:

   _acme-challenge.a.b.c.d
   a._acme-challenge.b.c.d
   a.b._acme-challenge.c.d
   a.b.c._acme-challenge.d

I don’t see that this would reduce the knowledge/control validation aspect any, but would allow - for example - to completely delegate ACME domain control to a different set of servers since just the _acme-challenge subdomain could be delegated.
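To make the proposal concrete, the candidate names can be enumerated mechanically. This only illustrates the idea suggested above; ACME as specified looks only at `_acme-challenge.<domain>`, and `proposed_challenge_names` is a made-up helper.

```python
from typing import List


def proposed_challenge_names(domain: str) -> List[str]:
    """List every name where the proposal would accept the challenge record."""
    labels = domain.rstrip(".").split(".")
    return [
        ".".join(labels[:i] + ["_acme-challenge"] + labels[i:])
        for i in range(len(labels))
    ]


if __name__ == "__main__":
    for name in proposed_challenge_names("a.b.c.d"):
        print(name)
```

For `a.b.c.d` this yields the four names listed above, with only the first one being what a CA checks today.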

I know this doesn’t have any likelihood of being implemented though, especially now with it being close to an official standard.

That's not how DNS works.

a.b.c.d is not the same as a.b.c._x.d

Therefore, letting a validation of a.b.c.d obtain a certificate for a.b.c._x.d would negate the whole validation process.

Well, it is if you have the constraint that the parent is not a separate registered domain. (My example was not precise enough.)

My point was that if you had provable control of “c.d” - it would be nice if you could use _acme-challenge.c.d to validate for “a.b.c.d”, since by definition if you can control arbitrary dns registrations in c.d - you can definitely control dns for a subset of that space.

Inherently, if you have control over the upper level domain, you can control what’s underneath.

The exposure with that approach is if you have a registrar that somehow allowed you to register “_acme-challenge.com” as a domain, but not sure if that would even be possible.

hi @nneul

There are 2 things that may help your case

A) Wildcard certificates coming in the future (see the announcement “Wildcard Certificates Coming January 2018”)
B) Running a pre-flight check

I believe what you want is a universal challenge i.e. a single record you can add to your DNS that will validate all upcoming requests?

If that is the case this is probably not feasible due to the way the ACME spec is written

Although it may seem like a good idea, it would be a disaster: with a universal challenge, anyone with an ACME client would be able to issue valid certificates for your domain (as the challenges would pass automatically).

Though this may be prevented by private account keys it still doesn’t make sense to me from a security point of view.

I use CloudFlare, and Route53 has also been suggested. My feeling is that moving to a DNS provider that propagates the record in a reasonable time frame is the only feasible solution.

And by reasonable I mean modern web reasonable (30 seconds to 5 minutes).

Andrei

Yeah, I know wildcards were coming, though likely will not make use of them for this particular case.

Definitely wouldn’t want it pre-approved for other than the specific key that issued the challenge request. It’d more be for the purpose of being able to split the path - I’d still want to validate for the full a.b.c.d name - but the idea being that the validation domain hierarchy could be distinct from the actual service domain hierarchy.

In any case, in the circumstance that led to this thread - it was solely a short term diagnostic problem - the vast majority of the time it does work smoothly and my client validation requests only take 30-40 seconds to stabilize before I submit the go-ahead to validate. (It works well enough that I’ve hit up against the issuance limits on a number of occasions.)
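A wait-until-stable step like the one described could look roughly like this. The timeout, interval, and the "two consecutive good checks" rule are assumptions chosen for illustration, and `is_ready` stands for whatever propagation check the client script already performs.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_seconds: float = 120.0,
    interval_seconds: float = 5.0,
    required_consecutive: int = 2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready until it succeeds several times in a row or time runs out."""
    deadline = clock() + timeout_seconds
    streak = 0
    while True:
        # A single failed check resets the streak, so flapping answers do not count.
        streak = streak + 1 if is_ready() else 0
        if streak >= required_consecutive:
            return True
        if clock() + interval_seconds > deadline:
            return False  # the next check would land past the deadline
        sleep(interval_seconds)
```

Requiring consecutive successes guards against a record that appears on one query and vanishes on the next, which is what an out-of-sync replica set tends to look like from the outside.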

In case you’re interested, I’m using this for an internal lab environment for a large group of developers/testers/etc. where they need to be able to test with a variety of clients (such that using an internal CA is mostly impractical), but the devices/servers themselves are not available on the internet. I have them submit the signing requests through an internal management portal that validates ownership/control of the name by internal rules, and then submits the LE signing request on their behalf using DNS validation. Only real downside is the nature of the usage (lots of individual devices/VMs) - it doesn’t lend itself to batching up the certs/using SANs, so I tend to hit up against the issuance limits if we’ve cycled through a bunch of test machine names too quickly. (Names tend to include product version numbers.)

I’m not familiar with option B you mention on ‘pre flights checks’. Got a reference? Or are you just referring to using the staging servers?

hi @nneul

that makes much more sense

What client are you using?

I can share some of the work (alpha) I am doing around other projects (pre-flights)

Andrei

My scripts are currently built around dehydrated.
