Availability data for the Let's Encrypt API

Hello

We are building a hosted service. We are considering using Let’s Encrypt to create TLS certificates, which would be created for new servers, in response to customer requests.

We need to be careful introducing an external dependency which could stop us from processing customer requests if it is unavailable.

Is there any detailed availability data for the Let’s Encrypt API that would allow us to assess the risk here? I have seen the blog post “Let’s Encrypt uptime is 99.9% — or 98.8% without defects in 2017”, but it’s not easy for me to use the data in there to assess the likely impact of Let’s Encrypt outages on our service.

In particular I’d like to know:

  • During partial outages, what proportion of certificate issuance requests fail?
  • What is the “texture” of the outages? (If there are relatively many short outages then maybe simply retrying is sufficient; if the API tends to be down for tens of minutes or hours at a time then there is necessarily a transitive outage in our service.)
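As a rough illustration of why the first question matters: if each attempt during a partial outage fails independently with probability p, a request that is tried n times fails overall with probability p^n. The independence assumption is mine and is probably optimistic, since real failures tend to be correlated. The function name is hypothetical.

```python
def failure_after_retries(p_fail: float, attempts: int) -> float:
    """Probability that all `attempts` tries fail.

    Assumes each try fails independently with probability `p_fail`,
    which is a simplification: real outages produce correlated failures.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return p_fail ** attempts
```

So if half of all requests failed during a partial outage, three independent attempts would all fail one time in eight. If the outage is total (p = 1), no number of retries helps, which is why the duration of outages matters as well.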

Thank you for any information you’re able to provide.
-Ben

Hi Ben,

You should build your process so that the certification of the domain(s) is asynchronous to the rest of your client site provisioning.

The Let’s Encrypt API will infrequently produce an error 500 result during API calls, so your process needs to know it didn’t complete and re-attempt the request later.
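A minimal sketch of that retry behaviour, not tied to any particular ACME client. `TransientIssuanceError` and `retry_with_backoff` are hypothetical names; `operation` stands for whatever call your client makes to request the certificate, and it is assumed to raise `TransientIssuanceError` on a retryable failure such as an HTTP 500. The delays and the 120-second budget are placeholder values.

```python
import time


class TransientIssuanceError(Exception):
    """Hypothetical error for a retryable failure, e.g. an HTTP 500 from the CA."""


def retry_with_backoff(operation, max_elapsed=120.0, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or the time budget is used up.

    The delay doubles after each failure, capped at max_delay seconds.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    start = clock()
    delay = base_delay
    while True:
        try:
            return operation()
        except TransientIssuanceError:
            # Give up if another wait would take us past the budget.
            if clock() - start + delay > max_elapsed:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

When the budget runs out the last error is re-raised, so the caller can mark the request as pending and try again later instead of treating it as done.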

In addition, the Let’s Encrypt service may become partially unavailable (for example, they just switched off tls-sni-01 challenges, which was news to me because I don’t check these forums often).

So craft your SLA to leave open the possibility that you may sometimes be unable to issue a cert for a customer for an indeterminate time, and that in some cases the purchase of a paid certificate may be required to expedite certificate availability.

Hope that helps!

Thanks for your reply, Christopher.

We encrypt all traffic to the provisioned systems, so we can’t usefully defer the certificate creation (from our customers’ point of view the request wouldn’t be fulfilled until the certificate is available).

Unless we can reassure ourselves about the Let’s Encrypt API’s availability we are going to have to stick with the wildcard certificates that we use at the moment. (Which, for security reasons, we would rather avoid.)

-Ben

No problem Ben. I don’t really understand enough about your system to suggest more but it sounds like you have it all in hand.

I have yet to meet a 3rd party API with 100% reliability, so I would just suggest trialling LE and seeing how it goes, unless you can tolerate zero failures, in which case I don’t know of any other options. I’m not actually aware of any commercial certificate providers who provide an API similar to LE’s, and I’ve found most other cert provisioning processes to be quite antiquated.

@benbc For what it’s worth, Let’s Encrypt will start offering (production) wildcard certificates February 27. So you might want to replace your existing wildcard with Let’s Encrypt next time it comes up for renewal. :slightly_smiling_face:

There are at least 2 or 3 CAs with APIs, though I don’t think any have deployed ACME yet. I’m afraid to imagine how much they charge. And they don’t have 100% uptime either, of course.

I’m certainly not so wildly unrealistic as to hope for 100% availability. If there was a published historical availability of 99.9%, for example, I’d probably be happy.

My concerns are more to do with lack of information than anything else. For the reported downtime data, it’s not clear how many events stopped the ability to request new certificates – and what proportion of requests would be affected.

And I’m very happy to retry failures and accept a delay.

So I suppose ideally I would find an answer to the question “for a client which retries failures for two minutes, what proportion of certification creation API requests would have failed in the last year?”.
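Given a list of outage durations, that question can be answered approximately. The sketch below is my own framing and makes three assumptions: requests arrive uniformly over time, every request fails during an outage (no partial outages), and outages do not overlap. Under those assumptions a request arriving during an outage of length d only ends up failing if it arrives in the first d minus the retry window seconds. The function name is hypothetical.

```python
def fraction_failed(outage_durations, period_seconds, retry_window_seconds):
    """Fraction of requests that still fail after retrying for the whole window.

    outage_durations: lengths in seconds of non-overlapping total outages
    that fall inside the period. Assumes uniformly distributed arrivals.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if retry_window_seconds < 0:
        raise ValueError("retry_window_seconds must not be negative")
    # Only arrivals in the first (duration - window) seconds of an outage
    # are still inside it when their retry window expires.
    unrecoverable = sum(max(0.0, d - retry_window_seconds) for d in outage_durations)
    return unrecoverable / period_seconds
```

With made-up numbers: one 60-second and one 600-second outage in a 10,000-second period, and a two-minute retry window, gives 480 / 10,000, so 4.8% of requests fail. The short outage is absorbed completely by retrying and the long one is not.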

Your concern is not so unreasonable.

Throughout 2017 I spent a not insignificant amount of time helping people with what turned out to be an inability to perform OCSP queries for some or all of the Let’s Encrypt certificates issued via a client of ours.

The impact was not pretty: browsers would just not load sites without manual intervention (disabling OCSP stapling in Apache or re-issuing the cert, making network routing changes).

I’m confident that the OCSP operational issues have now been worked out by Let’s Encrypt (great work :slight_smile: ), but your caution is very admirable. Identify the risks (issuance, renewal, OCSP, what else?) and have an easy-to-execute escape hatch if things really go downhill.
