Maximum (and minimum) certificate lifetimes?

kenny · November 13, 2015, 10:33pm

As mentioned in another thread, I doubt that this short time span will increase security. If you get attacked in a manner that you lose your key pair, chances are that you will also lose your renewed key pair 60 days later. If someone is aware of a problem/attack, then they should also be able to revoke the corresponding certificate.

IMHO it's even worse - the automation that takes place has to provide an externally available entry point (challenge-response). I'm certain that the day will come when hosts get hacked because they employed a vulnerable Let's Encrypt automation script.

pfg · November 14, 2015, 4:42am

Can you elaborate on those bad habits?

The argument for shorter certificate lifespans (or, well, one of them) is basically that you don't have to rely on the flaky revocation process actually working when your key gets compromised. Asking for longer certificate lifetimes and arguing against reusing keys at the same time doesn't make sense from a security POV (or any other POV I can think of).

I don't see how this is any different from e.g. your webserver having access to your private keys. The client will eventually be delivered through the same channels you probably get your webserver from (i.e. your distribution's repository, using signed packages), which probably also has an update mechanism where you won't manually review every single update down to the source code. It's not as if the ACME spec requires a client to provide remote code execution for the server, so that they could simply update something on their end and suddenly get all your keys. Additionally, there are plenty of verification options within the existing ACME spec (with more to come), some of them make it fairly easy to run your certificate provisioning from a completely separated system.

There's no technical reason why a service that uses privilege dropping somehow requires a full restart to reload certificates. Graceful reloading has been a thing in many webservers for a long time. If your threat model includes malicious updates to your ACME client, simply switch to one that doesn't require root (e.g. letsencrypt-nosudo) and automate the certificate reloading in some other way. It's entirely possible without downtime. Additionally, if you're worried about going down for about a minute every 60 days, you should have multiple loadbalancers in place either way and operate with rolling upgrades.

It is not necessary for hardware loadbalancers to include an ACME client directly to be able to automate renewal with letsencrypt. All you need is an API allowing you to provision new certificates, I would hope those are available on the better products out there. If that's not possible, I guess that's one of the rare cases where sticking with your current CA offering longer lifetimes is a good idea. I don't think supporting every corner case should be letsencrypt's mission.

My answer would be the same as for the previous point, but I would add that something like letsencrypt actually would vastly improve the situation for IoT. I think there are two possible outcomes there: Either TLS will never become a thing for the majority of IoT endpoints and their security remains terrible, or devices will start adopting some kind of automation, and CAs supporting ACME are probably the best option out there right now. Or do you really think someone's going to manually renew the certificate on their IoT toaster (forking over money!) when it expires 3 or 5 years after they bought it? That doesn't sound realistic to me at all.

NOYB · November 14, 2015, 5:15am

Ask this of Let's Encrypt. They are the ones trying to drive behavior through their idiotic policy.

kenny · November 14, 2015, 9:34am

I already did.

I didn't know that revocation processes tend to be flaky. CRLs and OCSP are well-established standards.

That may be because you only think like a technician and completely ignore that there are also organizational processes to follow in typical business environments - like, say, security measures to get ISO 27001 certified.

You didn't get the point. What if the update mechanism fails? What if after the automatic update my existing certificates are gone for good but I haven't received new ones? Who will be liable for the outage produced by a tool that's provided by Let's Encrypt? Lost revenue? Providing a CA (in the case of Let's Encrypt I also mean RA, because it does both jobs through ACME) is primarily an organizational task - identify legal and natural entities and certify the successful identification. Providing a certificate is only the last bit of the whole process (whereby issuing cards is another possible outcome).

Once the service has dropped its privileges to a non-root user, it's not able to read certificates that are owned by root with mode 600. That's (one of the) reason(s) for dropping privileges. I know that there exist services that don't drop privileges, but spawn unprivileged child processes. This and only this behaviour allows for graceful restart. But this is not how all software components work (just a tiny fraction of them do it this way, actually).

My threat model should be clear from my post: unprivileged access to private keys. You haven't addressed this even once. Switching to a non-root client does not help a bit, because it would be unprivileged and thus not allowed to touch private keys. Having automatic processes run as root is not an option in secure environments either. What if they wreak havoc? As I said, the automatic process that Let's Encrypt tries to induce is a bad habit from a security standpoint. To sum these bad habits up again for you: processes running as root (these should be reduced to a minimum in secure environments - there are other ways to retrieve certificates without a root process, therefore Let's Encrypt imposes an additional, unnecessary security threat) and unprivileged processes that are allowed to access private keys (but as the alternative to automated root processes is unprivileged private key access, Let's Encrypt also imposes an additional, unnecessary security threat in this case).

So you would allow automatic processes to modify the configuration of your core network components? Regardless of the problem that in secure environments you have a separate management LAN that connects to the management ports of these devices and that the management LAN should be strictly separated from the internet (for obvious reasons).

Well, most IoT users will just ignore Let's Encrypt due to its impracticality and either disable TLS or use StartSSL that provides 1y certificates for free.

pfg · November 14, 2015, 10:37am

Being an established standard alone doesn't mean anything. OCSP queries (or any online verification check) can be discarded by a MITM, and most browsers will silently ignore failures in most situations. Chrome has implemented CRLSets because CRLs weren't working quickly and reliably enough.

I'm specifically talking about the fact that you can't simultaneously argue "longer certificate lifetime = better" and "I can't reuse the same key every 2 months". Can you quote an example of some (publicly available) policy that forbids rotating certificates every 2 months, while rotating the keys only once a year or something similar?

Have monitoring in place. Use blue/green deployments. If you are running a service with that kind of uptime requirement, you should already be doing that anyway, or how do you update other parts of your infrastructure?

Sorry, but you lost me there.

Is there an inherent security risk to having a master process that's not accessible over the network? I'm all for reducing your attack surface and using as few privileges as possible, but if you manage to find a vulnerability that affects the master process, you would probably also find one where you can exploit the startup process before privileges are dropped, no?

Additionally, it seems to me that if you have a service that requires downtime for e.g. configuration changes, you need to have redundancy in place anyway if downtime is not acceptable to you (well, obviously not just for that reason.)

Nothing in ACME is forcing you to give the client access to your private keys. If you want to put a system in place where the CSR automatically gets generated by one part (running off some kind of HSM, for example), and your ACME client only has access to the CSR, that's perfectly fine. I would personally argue that using a client not requiring root and manually testing updates to this client before deployment is sufficient, but it's your threat model, so that option is available to you.

Once again, if downtime is a concern, you will need redundancy and blue/green deployments anyway, since you can't guarantee there will be no human error during manual renewal or other configuration changes. Yes, with reasonable monitoring and blue/green deployment, I would have no issue with automating something like this. I can't think of an automated solution for an environment where the management LAN has to be air-gapped from the internet, I'll give you that.

Let's be realistic, I think it's more likely that they will generally just ignore any certificate errors they're seeing. How many people do you know that replace their invalid home router certificates with one from a valid CA, for example? My argument here is that Let's Encrypt (or rather ACME) would at least be an option that could be implemented by manufacturers to take care of provisioning certificates without needing any kind of human interaction.

kelunik · November 14, 2015, 10:57am

Please note that StartSSL isn't completely free. They charge for revoking certificates, which is pretty bad IMO. Revocation is clearly the part that costs them money (they're using CRLs), and it's the right people who pay for it, but offering a free certificate and then charging to revoke it reduces the likelihood that the certificate gets revoked at all, which reduces security.

gdude2002 · November 14, 2015, 1:12pm

I can vouch for WoSign, but they have changed a lot recently. Their old cert system allowed up to 100 domains per cert, which I understand can be bad practise in most cases. However, it takes them at least a day to actually issue your certs, so it can be a pain to get a lot of certs at once.

My1 · November 14, 2015, 1:15pm

and now they only do single domains, at best with www certs for free…

well except for the revocation cost, it’s now similar to startssl

kenny · November 14, 2015, 1:25pm

I know, I don't like their practices either - especially the way they handled Heartbleed. However, they are quite common for spare time project servers, as they're well-known and (for most of what people are interested in) free.

kenny · November 14, 2015, 1:51pm

Again I'd request you to actually read my initial post. I clearly stated that:

As you can see I explicitly mentioned that this is the way we do it - as an ISO 27001 certified company.

So your advice is to replace a proactive approach (know when to change the certs and then do it manually once a year) with a reactive approach (monitor that the cert renewal every two/three months works and test the automatic approach). Which still does not take other requirements into consideration.

That fully depends on the implementation of the actual software. If there's a bi-directional communication channel between the privileged process and the unprivileged child process then yes, this poses a security risk. When communication with the software is not possible before it drops its privileges, an attack on the startup process is much less likely (e.g. the software opens a privileged port but only starts accepting connections after the privileges have been dropped).

We have redundancy in place. But what does that help if an automated process (that will be similar on all machines) kills them one after another? That's exactly the problem with automated processes that are able to kill your service. You won't have one automated and one manual node.

With the major difference that a manual process consists of several steps:

take node out of cluster
stop service
update service (software update, config change, cert update)
start service
check availability of service
add node to cluster

"You just have to automate the certificate retrieval." is a very naïve point of view on this topic. Solving the additional problems by just adding monitoring (which we have, even with additional STONITH procedures in place) is - in my opinion - not possible.

To end this discussion from my side: In my opinion, 90 day validity is too short for serious business usage. Automatic renewal may be a nice feature for simple setups, but is impractical, insecure or even impossible in other scenarios.

And finally: I'd have loved to use certificates of a CA that's endorsed by Mozilla and even the EFF. But currently this is not a viable option.

pfg · November 14, 2015, 2:08pm

If I had to venture a guess, I would say this policy is in place to avoid reusing the same key over the lifetime of multiple traditional-lifetime-length certificates, which absolutely makes sense. It doesn't make as much sense when you're talking about reusing the same key for, say, a year in six or so short-lifetime certs. In which case it would be better to make that explicit, and put a policy like "The same key may not be used for more than n months" in place. If that's not possible in your case, I'd say that's more of an organizational issue, and not something Let's Encrypt should put too much focus on.
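As a rough illustration of what such an explicit policy could look like, here is a minimal Python sketch. The function name `must_rotate_key`, the day-based limit (instead of months) and the 90-day / 60-day / 365-day figures are assumptions made up for the example, not something Let's Encrypt or any particular standard prescribes.

```python
from datetime import date, timedelta


def must_rotate_key(key_created: date, renewal_day: date,
                    cert_days: int, max_key_age_days: int) -> bool:
    """Return True if reusing the key for a certificate issued on
    renewal_day would keep the key in use beyond max_key_age_days."""
    new_cert_expiry = renewal_day + timedelta(days=cert_days)
    return (new_cert_expiry - key_created).days > max_key_age_days


if __name__ == "__main__":
    key_created = date(2015, 1, 1)
    # Hypothetical schedule: 90-day certificates, renewed every 60 days,
    # with a policy limit of 365 days per key.
    for i in range(1, 7):
        renewal = key_created + timedelta(days=60 * i)
        rotate = must_rotate_key(key_created, renewal, 90, 365)
        print(renewal, "generate new key" if rotate else "reuse key")
```

The check looks at the expiry of the certificate about to be issued, so the key is replaced before it could exceed the limit rather than after.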

Just to clarify what I meant with blue/green deployments in this context, it's basically the procedure you described here, but automated. Take the node out of the cluster, update certs, run a health check, rejoin the cluster, continue to the next node (possibly with a delay to avoid running into some error that's not immediately detectable - however unlikely that is with TLS certs). In case of failure either do a rollback (if feasible) or leave the node out of the cluster, and then alert someone for human intervention.
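A minimal Python sketch of that automated procedure might look like the following. Everything in it (the `Node` class, `rolling_cert_update`, the `install` and `health_check` callbacks) is hypothetical and stands in for whatever your load balancer and service tooling actually provide. This variant takes the "leave the node out of the cluster" branch and does not attempt a rollback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    name: str
    in_cluster: bool = True
    cert: str = "old"


def rolling_cert_update(
    nodes: List[Node],
    new_cert: str,
    install: Callable[[Node, str], None],
    health_check: Callable[[Node], bool],
) -> Tuple[List[str], Optional[Node]]:
    """Update one node at a time and stop at the first failed health check.

    Returns the names of the nodes that were updated and rejoined, plus the
    node that failed (or None). A failed node stays out of the cluster so a
    human can look at it; the remaining nodes are left untouched.
    """
    updated: List[str] = []
    for node in nodes:
        node.in_cluster = False        # take node out of cluster
        install(node, new_cert)        # update cert, reload/restart service
        if not health_check(node):     # check availability of service
            return updated, node       # halt the rollout, alert someone
        node.in_cluster = True         # add node back to cluster
        updated.append(node.name)
    return updated, None
```

Because the loop returns on the first failure, a broken update takes out at most one node instead of working its way through the whole cluster.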

pfg · November 14, 2015, 2:46pm

In general, I would argue that automation is good for security. Manual processes are error-prone, will get executed far less often, and their ad-hoc nature usually means they're either not tested at all or only poorly tested.

This is in some way a good point, but there are a few flaws.

First of all, certificates from traditional CAs might have way longer lifetimes than a year - so you might only get this reminder once every 5 years instead of every 11 months. 5 years of not reviewing your setup is a long time.

Additionally, not every CA sends out reminders with such information, and quite often you don't even buy your certificates directly from a CA, but through some reseller like a hoster, who's even less likely to include that.

Lastly, I would say that the Let's Encrypt client is in a unique situation where it can actually help keep your server configuration up-to-date when best practices change, if you choose to let it manage your TLS configuration. I'm not sure if this is something that's currently being done or planned, but it's certainly a possibility.

kelunik · November 14, 2015, 2:56pm

No, the maximum allowed time is 3 years currently.

pfg · November 14, 2015, 3:00pm

Oh, didn’t know that was changed, thanks! I remembered getting some 5 year ones about a year ago, that’s where I got the number from.

kelunik · November 14, 2015, 3:03pm

Except as provided for below, Certificates issued after 1 April 2015 MUST have a Validity Period no greater than 39 months.

eva2000 · November 14, 2015, 3:20pm

not agreeing or disagreeing, just linking to an article i read the other day: "The sorry state of certificate revocation" (CSO Online)

The current standard revocation processes (involving CRLs and the Online Certificate Status Protocol, or OCSP) have way too much latency built in. On top of that, most clients cache any previous revocation checks for the lifetime of the CRL, which means that, practically, when an organization revokes a cert, it can be up to a day or longer before the relevant software notices (assuming the software even looks, which it often doesn't).

PKI consumers want real-time revocation. They want a PKI admin to revoke a cert -- and to know immediately that the cert is bad and can't be relied upon. This doesn't happen much in the real world, at least not in a timely manner. Unfortunately, many private PKI admins don't know that. They think once they revoke a cert, that cert can't be used any longer. But who can blame them? They're doing what they've long been told works. It doesn't.
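To put rough numbers on the points about latency and about software that never looks, here is a deliberately simplified Python model. The function `exposure_days` and every figure used with it are assumptions for illustration only; real clients differ in how they cache and whether they fail open.

```python
def exposure_days(days_until_expiry: int,
                  revocation_delay_days: int,
                  client_checks_revocation: bool) -> int:
    """Days a compromised key remains usable against one client after the
    owner requests revocation, in this simplified model.

    If the client checks revocation, the window is bounded by how long the
    revocation takes to reach it. If it does not check (or the check is
    suppressed), only certificate expiry ends the window.
    """
    if client_checks_revocation:
        return min(days_until_expiry, revocation_delay_days)
    return days_until_expiry


if __name__ == "__main__":
    # Hypothetical cases: a cert with 1000 days left vs. a fresh 90-day cert.
    for remaining in (1000, 90):
        print(remaining,
              exposure_days(remaining, 1, True),
              exposure_days(remaining, 1, False))
```

In this model the certificate lifetime only matters for clients that do not check revocation, which is the case the shorter-lifetime argument is aimed at.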

My1 · November 14, 2015, 3:31pm

well, even with 90-day certs they won't get real-time revocation checks

gdude2002 · November 14, 2015, 10:05pm

It was probably flagged as unconstructive - don't take it personally, it's important for staff to keep a thread this large clean and constructive.

gdude2002 · November 15, 2015, 1:32am

Many forums don't allow this as it would result in a loss of post data, which means that this topic would be missing the context from your posts.

Having run a Discourse forum before, I can tell you that it doesn't allow you to delete users once they have over 10 posts, too.

Anyway, this is getting off-topic. Do we think they're ever going to read this thread fully?

kenny · November 15, 2015, 9:59pm

Which would be against European data protection laws (the so-called "right to be forgotten").

Btw.: This is what's written in the privacy policy that's linked to on https://letsencrypt.org/. So go ahead.

https://www.linuxfoundation.org/privacy:

Should a user find inaccuracies in such user’s information, or desire to close an account or view the personally identifiable information Linux Foundation may have regarding the user, the user may contact Linux Foundation through the communication methods described below, or when technically feasible, directly on a Site. Linux Foundation will make commercially reasonable efforts to respond to requests for access within thirty (30) days of receiving requests. Linux Foundation may decline to process users’ access or update requests to their personally identifiable information if the requests require disproportionate technical effort, jeopardize the privacy of other users, or are impractical (for instance, requests concerning information residing on backup tapes).