From 2020-09-07 at 05:44:35 UTC to 2020-09-08 at 17:48:28 UTC, we served OCSP responses older than 3.5 days for 268 certificate serial numbers. From 2020-09-12 at 09:40:31 UTC to 2020-09-13 at 07:22:13 UTC, we served OCSP responses older than 3.5 days for an additional 34 certificate serial numbers. None of the OCSP responses were served beyond their validity period (nextUpdate). The maximum age any OCSP response reached was 5 days. For OCSP responses with a 7-day validity period, the Microsoft Root Program specifies that updated responses be available within 3.5 days, and the CA/B Forum Baseline Requirements specify 4 days.
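To make the thresholds concrete, here is a minimal sketch of how a response's age could be compared against the two limits quoted above. The helper name `check_ocsp_age` and the example timestamps are hypothetical and chosen only for illustration; this is not how Boulder performs the check.

```python
from datetime import datetime, timedelta, timezone

# Limits as stated in this report for responses with a 7-day validity period.
MICROSOFT_MAX_AGE = timedelta(days=3, hours=12)   # 3.5 days
BASELINE_REQUIREMENTS_MAX_AGE = timedelta(days=4)


def check_ocsp_age(this_update, next_update, now):
    """Hypothetical helper: classify an OCSP response by its age at time `now`."""
    age = now - this_update
    return {
        "age": age,
        "expired": now > next_update,  # served beyond nextUpdate?
        "exceeds_microsoft_limit": age > MICROSOFT_MAX_AGE,
        "exceeds_br_limit": age > BASELINE_REQUIREMENTS_MAX_AGE,
    }


# Illustrative timestamps (assumed, not taken from the incident data):
# a response with a 7-day validity period that is checked when it is 5 days old,
# the maximum age reported above.
this_update = datetime(2020, 9, 3, 0, 0, tzinfo=timezone.utc)
next_update = this_update + timedelta(days=7)
now = this_update + timedelta(days=5)

result = check_ocsp_age(this_update, next_update, now)
print(result)
```

Under these assumptions, a 5-day-old response is still inside its validity period but is older than both limits, which matches the situation described in this report.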
We were notified of the problem by an alert on elevated error-level logs. We found that the errors were caused by a recent change to our RPC system that, in a certain error case, caused a particular column in our certificate status table to have a value of "0" for a specific empty field rather than either the expected value or NULL. We collected serials and last-update timestamp information for affected entries, and enacted a manual plan for continued remediation of these entries.
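As an illustration of the kind of query used to collect affected entries, the sketch below uses an in-memory SQLite database. The table name `status_example`, the column `example_field`, and all rows are hypothetical; the real schema, column, and database engine are not described in this report, and the erroneous value is assumed here to be the integer 0.

```python
import sqlite3

# Hypothetical schema for illustration only; not the production table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_example ("
    " serial TEXT PRIMARY KEY,"
    " last_updated TEXT NOT NULL,"
    " example_field INTEGER)"  # NULL represents an empty field
)
conn.executemany(
    "INSERT INTO status_example VALUES (?, ?, ?)",
    [
        ("serial-a", "2020-09-03T00:00:00Z", None),  # empty, stored as NULL
        ("serial-b", "2020-09-03T01:00:00Z", 0),     # erroneous zero value
        ("serial-c", "2020-09-03T02:00:00Z", 7),     # ordinary value
    ],
)


def find_affected(connection):
    """Return (serial, last_updated) for rows holding the erroneous zero."""
    cursor = connection.execute(
        "SELECT serial, last_updated FROM status_example"
        " WHERE example_field = 0 ORDER BY serial"
    )
    return cursor.fetchall()


affected = find_affected(conn)
print(affected)
```

Because a comparison with NULL never evaluates to true in SQL, the `example_field = 0` predicate selects only the rows that hold the erroneous zero and leaves properly empty rows alone.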
A Boulder CA software release ( https://github.com/letsencrypt/boulder/releases/tag/release-2020-09-09 ) was deployed to production on 2020-09-10, concluding at 17:59 UTC. This ensured that no further erroneous values would be added to the database, but remediation queries at regular intervals were still required for existing entries.
On 2020-09-12, the manual plan for recurring remediation steps was not executed in time, causing OCSP responses for the aforementioned additional 34 certificate serial numbers to age beyond the limits set by the Microsoft Root Program and the CA/B Forum Baseline Requirements.
On 2020-09-13 at 17:22 UTC, the final manual remediation query was executed on the database and we verified that all potentially affected Certificate Status entries had been remediated.
We have filed the following bug regarding this issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1666047