2020.02.29 CAA Rechecking Bug

On 2020-02-29 UTC, Let’s Encrypt found a bug in our CAA code. Our CA software, Boulder, checks for CAA records at the same time it validates a subscriber’s control of a domain name. Most subscribers issue a certificate immediately after domain control validation, but we consider a validation good for 30 days. That means in some cases we need to check CAA records a second time, just before issuance. Specifically, we have to check CAA within 8 hours prior to issuance (per BRs §3.2.2.8), so any domain name that was validated more than 8 hours ago requires rechecking.
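The timing rule above can be sketched in a few lines. This is only an illustration of the two windows described in this post (a 30-day validation lifetime and an 8-hour CAA freshness window); the function name and return values are made up for the example and are not Boulder's API.

```python
from datetime import datetime, timedelta, timezone

VALIDATION_LIFETIME = timedelta(days=30)  # how long a validation is considered good
CAA_MAX_AGE = timedelta(hours=8)          # CAA must be checked this recently before issuance


def issuance_decision(validated_at: datetime, issuing_at: datetime) -> str:
    """Classify one domain name (hypothetical helper, not Boulder code).

    Assumes CAA was last checked at validation time, as described above.
    """
    age = issuing_at - validated_at
    if age > VALIDATION_LIFETIME:
        return "revalidate"   # validation too old to reuse
    if age > CAA_MAX_AGE:
        return "recheck_caa"  # validation still good, CAA check is stale
    return "issue"            # CAA check from validation time is fresh enough


validated = datetime(2020, 2, 1, 12, 0, tzinfo=timezone.utc)
print(issuance_decision(validated, validated + timedelta(hours=9)))
```

The case the bug affected is the middle one: a validation that is still reusable but whose CAA check is more than 8 hours old.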

The bug: when a certificate request contained N domain names that needed CAA rechecking, Boulder would pick one domain name and check it N times. What this means in practice is that if a subscriber validated a domain name at time X, and the CAA records for that domain at time X allowed Let’s Encrypt issuance, that subscriber would be able to issue a certificate containing that domain name until X+30 days, even if someone later installed CAA records on that domain name that prohibit issuance by Let’s Encrypt.
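To make the symptom concrete, here is a small Python sketch of one class of mistake that produces "pick one name and check it N times". It is not Boulder's code (Boulder is written in Go), and this post does not say what the underlying coding error was; the domain names, the `CAA_ALLOWS` table, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical CAA state at recheck time: True means issuance is permitted.
CAA_ALLOWS: Dict[str, bool] = {
    "a.example": True,
    "b.example": False,  # CAA records now forbid issuance
    "c.example": True,
}


def check_caa(name: str, log: List[str]) -> bool:
    """Record which name was checked and return whether issuance is allowed."""
    log.append(name)
    return CAA_ALLOWS[name]


def recheck_buggy(names: List[str]) -> Tuple[bool, List[str]]:
    log: List[str] = []
    tasks: List[Callable[[], bool]] = []
    for name in names:
        # Bug: the lambda captures the variable `name`, not its current value,
        # so every task sees whatever `name` holds when the loop has finished.
        tasks.append(lambda: check_caa(name, log))
    results = [task() for task in tasks]
    return all(results), log


def recheck_fixed(names: List[str]) -> Tuple[bool, List[str]]:
    log: List[str] = []
    tasks: List[Callable[[], bool]] = []
    for name in names:
        # Fix: bind the current value as a default argument.
        tasks.append(lambda name=name: check_caa(name, log))
    results = [task() for task in tasks]
    return all(results), log


names = ["a.example", "b.example", "c.example"]
print(recheck_buggy(names))  # (True, ['c.example', 'c.example', 'c.example'])
print(recheck_fixed(names))  # (False, ['a.example', 'b.example', 'c.example'])
```

In the buggy version N checks are performed, so the count looks right, but the name whose CAA records forbid issuance is never examined.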

We confirmed the bug at 2020-02-29 03:08 UTC, and halted issuance at 03:10. We deployed a fix at 05:22 UTC and then re-enabled issuance.

Our preliminary investigation suggests the bug was introduced on 2019-07-25. We will conduct a more detailed investigation and provide a postmortem when it is complete.

Per the FAQ at “Revoking certain certificates on March 4”:

In order to complete revocations before the deadline of 2020-03-05 03:00 UTC, we are planning to begin revoking affected certificates at 2020-03-04 20:00 UTC (3:00pm US EST). Please continue to renew and replace affected certificates in the meantime. If there are any changes to this start time, updates will be provided in this thread. Thank you all very much for your patience, understanding, and help as we work through this issue.

After learning about and remediating a bug in our CAA checking code [1] on 2020-02-29 UTC (the evening of Friday February 28, U.S. Eastern time), we announced that we would be revoking approximately 2.6% of our active certificates that were potentially affected by the bug, totalling approximately 3 million certificates [2].

We announced the plan to revoke because even though the vast majority of the certificates in question do not pose a security risk, industry rules require that we revoke certificates not issued in full compliance with specific standards. These rules exist for good reasons. We work hard to comply with them and have an excellent track record for doing so.

Since that announcement we have worked with subscribers around the world to replace affected certificates as quickly as possible. More than 1.7 million affected certificates have been replaced in less than 48 hours. We’d like to thank everyone who helped with the effort. Our focus on automation has allowed us, and our subscribers, to make great progress in a short amount of time. We’ve also learned a lot about how we can do even better in the future.

Unfortunately, we believe it’s likely that more than 1 million certificates will not be replaced before the compliance deadline for revocation at 2020-03-05 03:00 UTC (10pm U.S. ET tonight). Rather than potentially break so many sites and cause concern for their visitors, we have determined that it is in the best interest of the health of the Internet for us to not revoke those certificates by the deadline.

Let’s Encrypt only offers certificates with 90-day lifetimes, so potentially affected certificates that we may not revoke will leave the ecosystem relatively quickly.

The following certificates have been, or will be, revoked prior to the compliance deadline at 2020-03-05 03:00 UTC (10pm U.S. ET tonight):

  • 1,706,505 certificates that we are confident were replaced during the incident period.

  • 445 certificates that we treated as highest priority for revocation because, at the time we found the bug, they had CAA records that forbade issuance by Let’s Encrypt.
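As a rough cross-check of the figures in this post, the two groups above can be added up and compared with the approximate total. This assumes the two groups do not overlap and uses the rounded "approximately 3 million" figure, so the remainder is only an estimate.

```python
APPROX_AFFECTED = 3_000_000    # "approximately 3 million" (rounded figure from this post)
REVOKED_REPLACED = 1_706_505   # replaced during the incident period
REVOKED_PRIORITY = 445         # CAA records forbade issuance when the bug was found

# Assumption: the two groups are disjoint.
revoked_by_deadline = REVOKED_REPLACED + REVOKED_PRIORITY
not_yet_revoked = APPROX_AFFECTED - revoked_by_deadline

print(revoked_by_deadline)  # 1706950
print(not_yet_revoked)      # 1293050
```

The remainder of roughly 1.3 million is consistent with the "more than 1 million" certificates mentioned above.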

We plan to revoke more certificates as we become confident that doing so will not be needlessly disruptive to Web users.

I would like to thank the Let’s Encrypt team for tirelessly working to resolve this situation in the best way possible. It involved incredible effort and I couldn’t be more proud of what we have been able to get done in such a short amount of time.

[1] 2020.02.29 CAA Rechecking Bug

[2] Revoking certain certificates on March 4

We’ve posted another response at https://bugzilla.mozilla.org/show_bug.cgi?id=1619179#c7 with more detail as to the number of currently-revoked certificates and the timeline for expirations of affected certificates.