Understanding SMTP DANE implementation options

Hello everybody, I recently implemented DNSSEC on my personal domain (yay!) and so now of course I'm exploring this newfangled DANE business :wink: and trying to understand what would be involved with implementing DANE for my mail server.

From my exploration, there seem to be two main options that people use:

  1. Use a "3" type record, with the leaf certificate public key being in the TLSA record. And when one's leaf key changes (either with every renewal, or perhaps less often if one keeps the same key around for a few certificate renewal cycles), ensuring that the new key is published in DNS for some length of time (at least a couple record TTLs worth) before actually having the mail server use it.

  2. Use a "2" type record, with the CA intermediate public key (or full cert) being in the TLSA record. And then one needs to keep an eye on, like, the API Announcements section of this forum to stay apprised of when those are likely to change and ensure that updates get published, or perhaps ideally automate it in some fashion similar to the "3" type record based on what intermediate signed each certificate.

But, I haven't seen any discussion about what I think is a third way, which seems the "easy" one to me, at least when first starting out with a DANE implementation:

  3. Use a "2" type record, with the CA root certificate being in the TLSA record. As I understand it, this would require the mail server to be configured to send the self-signed root as part of the chain it sends, but setting that up (by appending it to the chain file I already get from my ACME client, in a post-hook type setup for the file that my mail server uses) seems like a one-and-done thing that only needs a DNS update when the root changes (which is not very often).

In particular, for Let's Encrypt, my mail server currently gets two certificates: an ECDSA one signed by the EC chain (E1/E2 intermediates), as well as the "classic" RSA one signed by the RSA chain (R3/R4 intermediates). In practice, while I suspect that any connecting mail server that has DANE support would likely support ECDSA, to do this most correctly, as I understand it, with option 1 I'd need to include both certificates each time in the DNS records, and with option 2 I'd need to include all the intermediates, but with my option 3 I'd just need to include the one ISRG Root X1 record.
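
All three options end up as TLSA records that differ only in the usage field and in which DER blob gets hashed. A minimal sketch that formats a zone-file line from DER bytes you have already extracted (for the SPKI, e.g. `openssl x509 -in cert.pem -noout -pubkey | openssl pkey -pubin -outform DER`). The host name `mx.example.com` and the helper names are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import hashlib

# Field values from RFC 6698, with the RFC 7218 acronyms.
DANE_TA, DANE_EE = 2, 3                  # certificate usage
SEL_CERT, SEL_SPKI = 0, 1                # selector: whole cert vs. SubjectPublicKeyInfo
MT_FULL, MT_SHA256, MT_SHA512 = 0, 1, 2  # matching type


def tlsa_data(selected_der: bytes, mtype: int) -> str:
    """Hex association data for DER bytes already chosen by the selector."""
    if mtype == MT_FULL:
        return selected_der.hex()
    if mtype == MT_SHA256:
        return hashlib.sha256(selected_der).hexdigest()
    if mtype == MT_SHA512:
        return hashlib.sha512(selected_der).hexdigest()
    raise ValueError(f"unsupported matching type: {mtype}")


def tlsa_record(mx_host: str, usage: int, selector: int, mtype: int,
                selected_der: bytes, port: int = 25) -> str:
    """One zone-file line, e.g. option 1 ("3 1 1") or options 2/3 ("2 1 1")."""
    owner = f"_{port}._tcp.{mx_host.rstrip('.')}."
    return f"{owner} IN TLSA {usage} {selector} {mtype} {tlsa_data(selected_der, mtype)}"


# Option 1 feeds in the leaf's SPKI; options 2 and 3 the intermediate's or root's.
# leaf_spki = open("leaf-spki.der", "rb").read()   # hypothetical file
# print(tlsa_record("mx.example.com", DANE_EE, SEL_SPKI, MT_SHA256, leaf_spki))
```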

So, all I'd need to do is add a step that appends the root in my existing script (since Postfix prefers the key and chain in one file anyway, I'd just be adding another file to the cat command in my existing post-hook script), then add one DNS record, and then not need to worry about updating anything until ISRG Root X1 is getting replaced.
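
The post-hook step amounts to one extra file in the concatenation. A minimal Python sketch, assuming certbot-style `privkey.pem`/`fullchain.pem` files and a locally saved copy of the ISRG Root X1 PEM (all paths are hypothetical). The key-then-chain order follows Postfix's `smtpd_chain_files` convention (private key, then leaf, then issuers); writing to a temporary file and renaming it avoids the MTA ever reading a half-written file.

```python
import os
import tempfile
from pathlib import Path


def build_chain_file(key: Path, fullchain: Path, root: Path, out: Path) -> None:
    """Concatenate key + leaf/intermediates + self-signed root into one PEM file."""
    parts = []
    for p in (key, fullchain, root):
        text = p.read_text()
        # Guard against a missing trailing newline gluing two PEM blocks together.
        parts.append(text if text.endswith("\n") else text + "\n")
    # Temp file in the same directory, then an atomic rename over the target.
    fd, tmp = tempfile.mkstemp(dir=out.parent, prefix=".chain-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("".join(parts))
        os.chmod(tmp, 0o600)  # the file contains the private key
        os.replace(tmp, out)
    except BaseException:
        os.unlink(tmp)
        raise


# Hypothetical post-hook usage:
# build_chain_file(Path("privkey.pem"), Path("fullchain.pem"),
#                  Path("/etc/ssl/isrg-root-x1.pem"), Path("/etc/postfix/chain.pem"))
```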

So, is there something I'm missing? Does adding the self-signed root to the chain cause confusion for non-DANE-checking SMTP servers, or anything like that? I mean, I could certainly go the route of publishing each key and building more automation into updating my TLSA records and such, and I might end up doing that just for fun at some point anyway, but in terms of just "getting started" it seems like just publishing the root would be the easiest way to go? But I haven't seen any reference to doing things that way in my research, so I'm assuming I'm probably missing something.

Thanks for your thoughts!

I don't have any useful information to contribute. But I wanted to thank you for bringing the discussion here so we can all learn more.

Let's ping @ietf-dane here, who is an expert in this field.

Some terminology first: The different TLSA certificate usage values were given names (which weren't in RFC 6698; they were introduced in RFC 7218 and are used in RFC 7671). The "type 2" is called "DANE-TA(2)", or just DANE-TA for short. I find this naming slightly easier to remember than numbering (particularly regarding how they work, TA = trust anchor), so I'm going to use that terminology.

The third way you propose was briefly mentioned in an old DANE post: "Please avoid "3 0 1" and "3 0 2" DANE TLSA records with LE certificates" (#6 by ietf-dane).

Yes, it is technically valid. I do recommend a full read of RFC 7671, in particular sections 5.2 and 5.3 mention some DANE-TA operational notes. RFC 7671: The DNS-Based Authentication of Named Entities (DANE) Protocol: Updates and Operational Guidance. Basically, you have to send the root cert if you're using it in a DANE-TA type manner, unless you're including the entire cert in DANE via a 2-0-0 type record.

On the other hand, the RFC apparently does not recommend relying on 2-0-0 (as it's difficult for implementations), but it does recommend including the DANE-TA relevant certs in DNS (in combination with another DANE-TA record, i.e. an SPKI hash) plus sending them over TLS.

In any case, you would likely want to primarily pin ISRG Root X1's SPKI hash in DANE, not the entire cert - there are already two versions of ISRG Root X1, and more may be created in the future. The SPKI hash is far more robust, also smaller.
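
To see why the SPKI hash ("2 1 1") is more robust than a whole-certificate hash ("2 0 1"): both versions of ISRG Root X1 carry the same public key, so they share one SPKI digest while their certificate digests differ. The sketch below pulls the SubjectPublicKeyInfo out of a DER certificate with a deliberately minimal DER walker (single-byte tags only, no validation). In real tooling an X.509 library, or the `openssl ... | openssl pkey -pubin -outform DER` pipeline, would be the safer choice; the function names are illustrative.

```python
import hashlib


def _tlv(buf: bytes, pos: int):
    """Parse the DER TLV at pos; return (tag, value_start, end).

    Minimal: assumes single-byte tags, which holds for the X.509 fields used here.
    """
    tag = buf[pos]
    length = buf[pos + 1]
    header = 2
    if length & 0x80:                      # long form: next n bytes hold the length
        n = length & 0x7F
        length = int.from_bytes(buf[pos + 2:pos + 2 + n], "big")
        header += n
    start = pos + header
    return tag, start, start + length


def spki_der(cert_der: bytes) -> bytes:
    """Return the DER-encoded SubjectPublicKeyInfo of a DER X.509 certificate."""
    _, cert_start, _ = _tlv(cert_der, 0)            # Certificate ::= SEQUENCE
    _, pos, _ = _tlv(cert_der, cert_start)          # tbsCertificate ::= SEQUENCE
    tag, _, end = _tlv(cert_der, pos)
    if tag == 0xA0:                                 # optional [0] EXPLICIT version
        pos = end
    # Skip serialNumber, signature, issuer, validity, subject.
    for _ in range(5):
        _, _, pos = _tlv(cert_der, pos)
    _, _, end = _tlv(cert_der, pos)                 # subjectPublicKeyInfo TLV
    return cert_der[pos:end]


def cert_sha256(cert_der: bytes) -> str:   # association data for "2 0 1"
    return hashlib.sha256(cert_der).hexdigest()


def spki_sha256(cert_der: bytes) -> str:   # association data for "2 1 1"
    return hashlib.sha256(spki_der(cert_der)).hexdigest()
```

Fed the DER of the self-signed and the cross-signed ISRG Root X1 (e.g. from `openssl x509 -in root.pem -outform DER`), `spki_sha256` should agree while `cert_sha256` differs.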

Compatibility-wise, I don't know whether you'd throw off some implementations. Generally when you talk to mailservers, there are all kinds of weird TLS implementations that may not do the right thing and won't like your self-signed certs in the chain. Remember that not all mailservers are going to use DANE validation. Those that don't use DANE usually don't enforce any PKIX validation, but in case they do, it always pays off to send a valid chain that can be PKIX validated with well-trusted root certs. The last thing you want is an unnecessary fallback to plain connections.

I personally like the DANE-EE concept most. It eliminates the PKIX validation worries altogether by pinning the leaf's public key instead. It might involve some effort to have good rollover schemes (a current + next strategy is recommended), but the PKIX approach brings with it all the drawbacks of the current PKI: key usage incompatibilities, path validation issues, extension problems... I love DANE for the fact that it eliminates these things with a DANE-EE setup. Browsers nowadays are very good at path validation, but the email world is very different; you never know what works.

Some "further reading" links that relate to this topic somewhat, just going to throw them into here for reference:

I recommend against attempting to use the root CA public key hash as a stable fire and forget TLSA record. Even the root CA used by Let's Encrypt will eventually change, and you yourself might stop using Let's Encrypt, ...

The less often you practice something that eventually needs to be done, and the longer you put off planning for it, the more poorly you end up doing it.

The folks who use 1-year certificates tend to have more issues with their TLSA records than those who roll them often but implement solid automation (e.g. https://mailinabox.org/, which for the vast majority of users just works). The one thing that some users who rely on automation miss is the need for monitoring. Unmonitored security is IMNSHO an oxymoron.
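
Monitoring can start as simply as checking, on a schedule, that at least one published TLSA record matches the chain the server actually presents, and flagging records that match nothing (like the stale X3 "2 1 1" records mentioned later in this thread). A simplified sketch, assuming you have already fetched the TLSA RRset and extracted each chain certificate's DER and SPKI DER (e.g. via `openssl s_client -starttls smtp -showcerts`). It only compares digests: it does not verify signatures or the DANE-TA chain path, and it treats PKIX-TA/PKIX-EE usages as unusable, as RFC 7672 tells SMTP clients to do.

```python
import hashlib
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Tlsa(NamedTuple):
    usage: int      # 2 = DANE-TA, 3 = DANE-EE
    selector: int   # 0 = full cert, 1 = SPKI
    mtype: int      # 0 = full, 1 = SHA-256, 2 = SHA-512
    data: str       # hex association data as published


# Each chain entry is (cert_der, spki_der); the leaf comes first.
ChainEntry = Tuple[bytes, bytes]


def _assoc(entry: ChainEntry, selector: int, mtype: int) -> Optional[str]:
    if selector not in (0, 1):
        return None
    blob = entry[selector]   # index 0 = cert DER, index 1 = SPKI DER
    if mtype == 0:
        return blob.hex()
    if mtype == 1:
        return hashlib.sha256(blob).hexdigest()
    if mtype == 2:
        return hashlib.sha512(blob).hexdigest()
    return None


def audit(records: Sequence[Tlsa],
          chain: Sequence[ChainEntry]) -> Tuple[List[Tlsa], List[Tlsa]]:
    """Split records into (matching, unmatched) against the presented chain."""
    matched, unmatched = [], []
    for rec in records:
        if rec.usage == 3:        # DANE-EE: only the leaf
            candidates = chain[:1]
        elif rec.usage == 2:      # DANE-TA: an issuer cert sent in the chain
            candidates = chain[1:]
        else:                     # PKIX-TA/PKIX-EE: unusable for SMTP
            candidates = []
        want = rec.data.lower()
        if any(_assoc(e, rec.selector, rec.mtype) == want for e in candidates):
            matched.append(rec)
        else:
            unmatched.append(rec)
    return matched, unmatched
```

An empty `matched` list is the alarm (DANE-validating senders will defer delivery); a non-empty `unmatched` list is a cleanup hint.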

So, to the question of what is best practice: it is indeed "3 1 1 + 3 1 1", where you configure Let's Encrypt to not change your key during regular certificate renewals (set reuse_key = true in the renewal .conf file), and arrange to inject a new key for certbot to use only after that key's hash has been published in DNS for at least a few TTLs.
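
The "current + next" idea reduces to one decision at each renewal: keep reusing the current key (what `reuse_key = true` gives you) until the next key's "3 1 1" record has been visible in DNS for a few TTLs, then switch. A small sketch of that decision plus the two-record RRset. The function names, the `ttl_multiple=3` default (standing in for "a few TTLs"), and tracking when the next key's record first became visible are all assumptions, not certbot or danebot behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def tlsa_rrset(owner: str, current_spki_sha256: str,
               next_spki_sha256: Optional[str]) -> List[str]:
    """The "3 1 1 + 3 1 1" RRset: current key plus the pre-published next key."""
    lines = [f"{owner} IN TLSA 3 1 1 {current_spki_sha256}"]
    if next_spki_sha256:
        lines.append(f"{owner} IN TLSA 3 1 1 {next_spki_sha256}")
    return lines


def key_for_renewal(current_key: str, next_key: Optional[str],
                    next_visible_since: Optional[datetime], ttl_seconds: int,
                    now: datetime, ttl_multiple: int = 3) -> str:
    """Pick the key (e.g. a key file path) to hand the ACME client for this renewal.

    next_visible_since: when the next key's TLSA record was first seen on
    all authoritative servers (how you observe that is up to your tooling).
    """
    if next_key is None or next_visible_since is None:
        return current_key                      # nothing staged: reuse
    if now - next_visible_since >= timedelta(seconds=ttl_seconds * ttl_multiple):
        return next_key                         # old cached RRsets have expired
    return current_key                          # too early: keep the working key
```

After the switch, the old key's record can be dropped and a freshly generated key published as the new "next", so the RRset stays at two records.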

That way, if you forget to update the TLSA records, the certificate renewal process just keeps using the same key that already works, you keep obtaining new certs (for all those non-DANE clients to verify), and everything just works...

The tricky part is convincing your ACME client to switch to a new key when that key is ready. With certbot, this is not easy, and I expect to release danebot one of these days (I hope before long), that addresses the gap for certbot. With other ACME clients, that could be easier to accomplish.

It would sure be great if the certbot maintainers reached out and made available a command-line option to specify an override next key while renewing with reuse-key = true. With that, danebot wouldn't have to jump through hoops wrapping certbot renew. Even better would be the ability to configure a one-time override key, just for the next successful renewal. Please ask the maintainers to step forward.

The other way to use "3 1 1 + 3 1 1" is to let certbot keep replacing the key on you, but not use /etc/letsencrypt/live/.../ directly. Rather, copy the chain and key from there to an application-specific area, after ensuring that they are ready to use (e.g. match published TLSA records). Here you have some risk that if the DNS update fails to go through for long enough, your current cert will expire and some non-DANE clients could be less happy. Again, monitoring is always important.
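
The copy-when-ready variant can be sketched as a gate in front of the copy: the files move to the application's own directory only when the new leaf key's "3 1 1" digest is among the published records. This assumes the digest and the published set were computed elsewhere (e.g. with the `openssl` SPKI pipeline and a DNS query). The directory layout and function name are hypothetical; only the `privkey.pem`/`fullchain.pem` names follow certbot's live-directory convention.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable


def deploy_if_published(live_dir: Path, app_dir: Path,
                        leaf_spki_sha256: str, published_311: Iterable[str]) -> bool:
    """Copy key and chain into app_dir only if the new key's 3 1 1 hash is in DNS."""
    if leaf_spki_sha256.lower() not in {h.lower() for h in published_311}:
        return False            # keep serving the old key, which still matches
    app_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in ("privkey.pem", "fullchain.pem"):
        tmp = app_dir / (name + ".new")
        shutil.copyfile(live_dir / name, tmp)   # follows certbot's symlinks
        os.chmod(tmp, 0o600)                    # the key must not be world-readable
        staged.append((tmp, app_dir / name))
    # Stage both files first, then rename, to keep the key/chain mismatch
    # window short (two renames are still not one atomic step).
    for tmp, final in staged:
        os.replace(tmp, final)
    return True
```

Run it from cron after each renewal attempt; a `False` result that persists is worth alerting on, since the cert still in use will eventually expire.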

If you're willing to stay on top of LE's issuer CA change announcements, and accept the risk of LE issuing certs for your domain to someone else (because of a MiTM between the CA and your domain, weak ACME proofs, ...), then you can use "2 1 1" records, as some do (and some do poorly). See, for example, my notes. Some 82 MX hosts still list the retired X3 CA as a DANE-TA(2) trust anchor (RFC 7218 terminology).

--
Viktor.

Oh, sure. I keep counting down the years, and know that starting in 10 years or so when everybody really needs to start moving off of ISRG Root X1 it's going to be a bit chaotic around here. For my use case, where it's just my personal domain I think I can track things enough to know that I'll need to update things when the root changes, but I can see how it might not be the thing that one wants to recommend to people in general.

This domain doesn't seem to exist, and I'm not familiar with it. Is there a typo there or something?

If you have automation for updating the DNS records, what's the advantage to keeping the same key for some renewals?

I'm not using Certbot, but a cobbled-together custom solution (because I like making things harder on myself, I guess), but I think that if I were to implement 3 1 1 I'd just automate putting the new key in once I get my certificate and wait a day before having the mail server actually use it. It'd be a fun project and probably give me a chance to play around with AWS Step Functions, which I've been meaning to do. So I'll probably go that route eventually. It just seemed like it'd be a lot easier (and would help with adoption of SMTP DANE in general) to try to reuse the public web PKI that's already in place as much as possible. Like, there should just be an easy way to say "Only connect to my mail server via TLS" using the regular trust mechanisms that everything else does. MTA-STS (which I'd already implemented, even before DNSSEC) of course does just that, but it seems overly complicated (setting up an HTTPS server just to say that your SMTP should be secure) for what could just be a simple flag in DNS saying "security is supported". Honestly, a domain using DNSSEC in and of itself should probably be enough of a hint that the domain is maintained well enough that one shouldn't send it anything unencrypted. :slight_smile:

Well, if they haven't noticed by now then they must not actually care about getting mail. :slight_smile:

Are you, like, crawling the entire DNS tree regularly to scan for MX records and measuring how many have valid DANE setups?

Definitely agreed there. I think that this is something that the ease-of-use of certbot may be hiding from people, where it seems a lot of people have systems that they aren't keeping an eye on.

Thank you very much, both @ietf-dane & @Nummer378, for your helpful thoughts!

Maybe ISRG should start publicizing this issue and encouraging people to audit for it around 6 years from now (rather than 9 or 10 years from now). :slight_smile:

Especially with browsers talking about limiting the lifetime of CAs down further, I expect we'll start talking about the "end of life" plans for X1 fairly soon.

LE certs have a lifetime of 90 days. It's recommended to renew after 60 days, leaving 30 days as "margin for error". Within these 30 days you could easily use e.g. 15 days to update DANE with another 15 days as margin for DANE error. I suspect 15 days is enough to update any DNS RR globally.

Of course Certbot isn't programmed to do so, but you might be able to script this easily. Just have a regularly running script (cron job) check whether the cert used by the mail server is the same as the cert in /live/. If not, it has been updated and the script will start updating the DANE RR. If successful, it'll enter the 15-day "grace" period, checking for DANE DNS propagation, for example. When the 15-day grace period has ended, it'll update the cert used by the mail server and reload the mail daemon. When the old cert has expired 15 days later, it can remove the old DANE RR.
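
The arithmetic above: renewing at day 60 of 90 leaves 30 days, split into 15 days of TLSA pre-publication before the switch and 15 more until the old cert expires and its record can go. The cron job's decision can be sketched as a small state machine. The `State` fields, the fingerprint comparison, and the fixed 15-day `GRACE` are assumptions taken from the post; the code neither checks DNS propagation nor performs any of the actions itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

GRACE = timedelta(days=15)   # assumed pre-publication window from the post


class Action(Enum):
    NOTHING = "nothing to do"
    PUBLISH_NEW_TLSA = "publish TLSA record for the new cert"
    WAIT = "wait out the grace period"
    DEPLOY_NEW_CERT = "switch the mail server to the new cert and reload"
    REMOVE_OLD_TLSA = "remove the old TLSA record"


@dataclass
class State:
    served_fp: str                      # fingerprint of the cert the MTA uses
    live_fp: str                        # fingerprint of the cert in /live/
    new_tlsa_published_at: Optional[datetime] = None
    old_tlsa_present: bool = False
    old_not_after: Optional[datetime] = None   # expiry of the previous cert


def next_action(s: State, now: datetime) -> Action:
    """Decide what one cron run should do next (it performs nothing itself)."""
    if s.served_fp != s.live_fp:                    # a renewal happened
        if s.new_tlsa_published_at is None:
            return Action.PUBLISH_NEW_TLSA
        if now - s.new_tlsa_published_at < GRACE:
            return Action.WAIT
        return Action.DEPLOY_NEW_CERT
    if s.old_tlsa_present and s.old_not_after is not None and now >= s.old_not_after:
        return Action.REMOVE_OLD_TLSA
    return Action.NOTHING
```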

Sorry for misquoting the mailinabox URL, it is https://mailinabox.email.

Anything that requires people to do something 6–9 years from now won't be planned, or automated. Long-term planning is often ignored, instead what happens is something along the lines of the Surfside building collapse. If you want a reliable process, it has to be planned, automated, practiced often and monitored.

There could be a fourth way: something like _25._tcp.my.mailserver IN CNAME _dane.letsencrypt.org., if LE's zone were DNSSEC-signed (it isn't yet) and LE were willing to maintain a TLSA "2 1 1" RRset with all their active, standby and upcoming intermediates, as they are the obvious authoritative source for this information. This would free users from having to duplicate and maintain this information in their own zones (and would be more DNS-cache-friendly, too).

One drawback is that with CNAME, you can only delegate to a single authoritative record, not multiple ones if you use multiple CA's.

That's a nifty idea. I suppose there could be some trusted community member maintaining something like this in the meantime, if Let's Encrypt doesn't want to maintain it themselves, but I don't know how much use it would be or how much trust other people would have in it.

:exploding_head:

Why in the world not? I mean, I don't think intercepting the DNS and trying to send false responses for the ACME API or OCSP or the like would be that advantageous to an attacker, but it seems like the kind of thing that a security-focused organization would have, just to help close any attack opportunities that being unsigned might provide.

Having someone trusted I could just CNAME to would also help me, it turns out, for another reason: AWS Route 53 doesn't support TLSA records at all.

*sigh*
