July 17, 2017: Partial OCSP and Complete Issuance Outage Postmortem

josh · July 19, 2017, 4:13pm

We’ve completed a full postmortem for our outage on July 17 and we’d like to provide some details to our community.

From 2017-07-17 20:43 UTC to 2017-07-17 21:54 UTC (71 minutes), Let’s Encrypt had an OCSP outage for non-cached responses. Concurrently, from 2017-07-17 20:43 UTC to 2017-07-17 23:24 UTC (2 hours 41 minutes), Let’s Encrypt had an ACME API services outage.

The onset was gradual: an edge firewall became overloaded and progressively failed local and remote traffic. Let’s Encrypt staff were alerted to the problem by internal monitoring of the staging environment at 20:48 and began to investigate. Because a database repair was in progress at our secondary datacenter, we were unable to simply fail over to it. In the end, it was necessary to engage staff in the datacenter to reboot one of the redundant firewalls, which allowed services to be restored.

The load on the firewall was a known problem and remediation was already underway, including a plan to replace the current hardware. In this situation, the firewalls’ High Availability (HA) failover did not behave as testing and documentation had led us to expect, which made physical intervention necessary and extended the outage. A new HA arrangement for the firewalls is part of our remediation plan.

Let’s Encrypt will be taking steps to reduce the load on the current firewalls until the new hardware and configuration can be put in place. Additionally, the response plan for this particular type of failure has been improved with lessons learned from the outage. We will be improving our internal documentation to reduce time to resolution in the future.

We apologize to our community for the downtime and, as always, we will strive to do better in the future.
