[Update 2018-01-18: The most up-to-date summary is at IMPORTANT: What you need to know about TLS-SNI validation issues]
At approximately 5 p.m. Pacific time on January 9, 2018, we received a report from Frans Rosén of Detectify outlining a method of exploiting some shared hosting infrastructures to obtain certificates for domains he did not control, by making use of the ACME TLS-SNI-01 challenge type. We quickly confirmed the issue and mitigated it by entirely disabling TLS-SNI-01 validation in Let’s Encrypt. We’re grateful to Frans for finding this issue and reporting it to us.
We’d like to describe the issue and our plans for possibly re-enabling TLS-SNI-01 support.
Problem Summary
In the ACME protocol’s TLS-SNI-01 challenge, the ACME server (the CA) validates a domain name by generating a random token and communicating it to the ACME client. The ACME client uses that token to create a self-signed certificate with a specific, invalid hostname (for example, 773c7d.13445a.acme.invalid), and configures the web server on the domain name being validated to serve that certificate. The ACME server then looks up the domain name’s IP address, initiates a TLS connection, and sends the specific .acme.invalid hostname in the SNI extension. If the response is a self-signed certificate containing that hostname, the ACME client is considered to be in control of the domain name, and will be allowed to issue certificates for it.
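As a rough illustration of the check described above, the following Python sketch derives a validation hostname and tests whether a served certificate contains it. It assumes the derivation used in the ACME drafts of the time (the hex SHA-256 digest of the key authorization, split into two 32-character labels); the function names and the sample key authorization are hypothetical and are not taken from Let’s Encrypt’s implementation.

```python
import hashlib


def tls_sni_01_hostname(key_authorization: str) -> str:
    """Derive the .acme.invalid name from a key authorization string.

    Assumption: the name is built from the lowercase hex SHA-256 digest,
    split into two 32-character DNS labels.
    """
    z = hashlib.sha256(key_authorization.encode("utf-8")).hexdigest()
    return f"{z[:32]}.{z[32:64]}.acme.invalid"


def challenge_satisfied(expected_name: str, served_dns_names: list) -> bool:
    """CA-side check: does the served certificate list the expected name?"""
    return expected_name in served_dns_names


# Hypothetical key authorization (token + "." + account key thumbprint).
name = tls_sni_01_hostname("example-token.example-thumbprint")
print(name)
print(challenge_satisfied(name, [name]))
```

Note that nothing in this check ties the certificate to the domain being validated other than the IP address the CA connected to, which is the assumption the rest of this post is about.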
However, Frans noticed that at least two large hosting providers combine two properties that together violate the assumptions behind TLS-SNI:
- Many users are hosted on the same IP address, and
- Users have the ability to upload certificates for arbitrary names without proving domain control.
When both are true of a hosting provider, an attack is possible. Suppose example.com’s DNS is pointed at the same shared hosting IP address as a site controlled by the attacker. The attacker can run an ACME client to get a TLS-SNI-01 challenge, then install their .acme.invalid certificate on the hosting provider. When the ACME server looks up example.com, it will connect to the hosting provider’s IP address and use SNI to request the .acme.invalid hostname. The hosting provider will serve the certificate uploaded by the attacker. The ACME server will then consider the attacker’s ACME client authorized to issue certificates for example.com, and be willing to issue a certificate for example.com even though the attacker doesn’t actually control it.
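The attack can be sketched with a toy model. This is only a simulation under stated assumptions: `SharedHost`, `ca_validates`, the customer names, and the documentation IP address 203.0.113.5 are all hypothetical, and no real TLS handshake takes place.

```python
class SharedHost:
    """Toy shared hosting front end: many customers behind one IP address."""

    def __init__(self, ip):
        self.ip = ip
        self.certs_by_sni = {}  # SNI hostname -> certificate record

    def upload_certificate(self, customer, hostname):
        # Property 2: any customer may upload a certificate for any name,
        # with no proof of domain control.
        self.certs_by_sni[hostname] = {
            "uploaded_by": customer,
            "dns_names": [hostname],
        }

    def handshake(self, sni):
        # The host picks the certificate purely by the SNI value it receives.
        return self.certs_by_sni.get(sni)


def ca_validates(dns, hosts_by_ip, domain, challenge_name):
    """CA side: resolve the domain, connect, send the challenge name in SNI."""
    host = hosts_by_ip[dns[domain]]
    cert = host.handshake(challenge_name)
    return cert is not None and challenge_name in cert["dns_names"]


host = SharedHost("203.0.113.5")
# Property 1: the victim and the attacker share the same IP address.
dns = {"example.com": host.ip, "attacker.example": host.ip}
hosts_by_ip = {host.ip: host}

challenge_name = "773c7d.13445a.acme.invalid"
host.upload_certificate("attacker", challenge_name)

# The CA validates example.com although only the attacker acted.
attack_succeeds = ca_validates(dns, hosts_by_ip, "example.com", challenge_name)
print(attack_succeeds)  # prints True
```

In this model the owner of example.com never takes part; the validation succeeds because the host selects the certificate by SNI alone.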
This issue only affects domain names that use hosting providers with the above combination of properties. It is independent of whether the hosting provider itself acts as an ACME client. It applies equally to TLS-SNI-02.
Our Plans
Shortly after the issue was reported, we disabled TLS-SNI-01 in Let’s Encrypt. However, a large number of people and organizations use the TLS-SNI-01 challenge type to get certificates. It’s important that we restore service if possible, though we will only do so if we’re confident that the TLS-SNI-01 challenge type is sufficiently secure.
At this time, we believe that the issue can be addressed by having certain service providers implement stronger controls for domains hosted on their infrastructure. We have been in touch with the providers we know to be affected, and mitigations will start being deployed for their systems shortly.
Over the next 48 hours we will be building a list of vulnerable providers and their associated IP addresses. Our tentative plan, once the list is completed, is to re-enable the TLS-SNI-01 challenge type with vulnerable providers blocked from using it.
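One way such a block list could be applied is sketched below, using Python’s standard `ipaddress` module. This is an assumption about the general shape of the check, not a description of how Let’s Encrypt implements it; `BLOCKED_NETWORKS` and `tls_sni_allowed` are hypothetical names, and the networks shown are reserved documentation ranges rather than real provider addresses.

```python
import ipaddress

# Hypothetical block list: networks belonging to vulnerable providers.
# These are reserved documentation ranges, not real provider addresses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def tls_sni_allowed(resolved_ip: str) -> bool:
    """Return False if the validated domain resolves into a blocked network."""
    addr = ipaddress.ip_address(resolved_ip)
    # An address of one IP version is never "in" a network of the other,
    # so IPv4 and IPv6 entries can share one list.
    return not any(addr in net for net in BLOCKED_NETWORKS)


print(tls_sni_allowed("192.0.2.10"))    # inside a blocked range
print(tls_sni_allowed("198.51.100.7"))  # outside every blocked range
```

A check like this would run after the CA resolves the domain name and before it attempts the TLS-SNI-01 connection, so a domain on a blocked address would need to use a different challenge type.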
We’re also going to be soliciting feedback on our plans from our community, partners and other PKI stakeholders prior to re-enabling the TLS-SNI-01 challenge. There is a lot to consider here and we’re looking forward to feedback.
We will post more information and details as our plans progress.
Update #1: We have decided to re-enable the TLS-SNI-01 challenge for certain major providers who are known not to have issues while we investigate re-enabling TLS-SNI-01 in general. We’re doing this as a safe way to restore service faster for a large number of sites.