Probably! Though rather than deploying temporary code in Boulder, we most likely would write a script that can run in a non-trusted zone, pulling the logged precerts from CT logs, fetching the final certificates by serial number from the ACME API, and logging those final certificates to CT. If that’s something you’d like to help out with writing, that would be helpful!
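For anyone picking this up, here is a minimal sketch of the request-building half of such a script. It assumes the logs expose the standard RFC 6962 HTTP endpoints (`get-entries` and `add-chain`). The helper names and the example log URL are hypothetical, and fetching the final certificate by serial number from the ACME API is deliberately left out because the exact endpoint is not given here.

```python
import base64
import json
import ssl
from typing import List
from urllib.parse import urlencode


def get_entries_url(log_url: str, start: int, end: int) -> str:
    """URL for an RFC 6962 get-entries request (start and end are inclusive, 0-based)."""
    query = urlencode({"start": start, "end": end})
    return log_url.rstrip("/") + "/ct/v1/get-entries?" + query


def add_chain_url(log_url: str) -> str:
    """URL for an RFC 6962 add-chain request, used for final certificates."""
    return log_url.rstrip("/") + "/ct/v1/add-chain"


def add_chain_body(pem_chain: List[str]) -> str:
    """JSON body for add-chain: base64 DER certificates, end-entity first."""
    der_chain = [ssl.PEM_cert_to_DER_cert(pem) for pem in pem_chain]
    encoded = [base64.b64encode(der).decode("ascii") for der in der_chain]
    return json.dumps({"chain": encoded})
```

A real script would POST `add_chain_body(...)` to `add_chain_url(...)` for each final certificate whose precertificate turned up in the `get-entries` results, with its own pacing so it does not add to the load problem.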
Update on this: We enabled logging of final certificates, but the load seemed to be causing availability issues for some of our logs, so we disabled it again. We’ll consider on Monday what our strategy will be for possibly turning it back on again.
I’m not sure there’s a logical strategy save for migration to logs that can handle the load.
The protocol doesn’t presently provide for declining to accept a new certificate entry which chains to an included root. For obvious reasons.
If you’re not logging the final certificates, it must be presumed that the final certificates will be accumulated and logged in bulk by a bad actor in an attempt to break various logs’ maximum-merge-delay promises.
Are there enough sufficiently performant logs running that accept LE certificates that will not buckle under the load? If not, this surfaces significant questions about the feasibility of Certificate Transparency moving forward.
I think the possibility of a coordinated slam of previously unlogged final certificates should definitely be part of the threat model for CT logs.
If a significant corpus of unlogged certificates is accumulating, one must consider that a bad actor might be collecting them to slam a log all at once. If the log has no protection against this kind of flood, the MMD could conceivably be violated. Alternatively, if logs do have rate limiting to protect against this sort of attack, an attacker could presumably still cause a recurring block on certificate issuance of some duration by slamming the logs until the rate limit is hit, since CAs generally block until they have enough SCTs to embed in the final certificate.
Ultimately, if a log cannot currently sustain at least a little more than 2x the volume of precertificates being submitted (one entry for each precertificate plus one for its final certificate), it must be presumed that the log will either end up rate limiting or eventually fail to meet the required guarantees. Either outcome essentially makes a given log unfit for use by the higher-volume CAs.
In any event, the clear anti-DoS mechanism for bulk overload is to go ahead and get the certificates logged. If that is unsustainable for a given log, that log’s infrastructure must be augmented or the log replaced.
Is your policy on what CT logs you log to (both for precerts and final certs) available anywhere? E.g. “Google Icarus and Cloudflare by default; if Google Icarus is down, switch to Google Skydiver; if Cloudflare is down switch to DigiCert; if those are down too, decline to issue” or whatever?
The configuration isn’t public (yet), as far as I know, but the code is.
It tries to get SCTs from every configured log. They’re divided into groups. The configuration seems to define two groups, “Google” and “not Google”. It keeps the fastest responses and cancels the other requests. If it’s unable to get an SCT from at least one log in each group, it fails. If it gets more SCTs than necessary, it seems to discard the others.
It also tries to log precertificates and (when enabled) final certificates to every log flagged “informational”, but it doesn’t throw an error if they don’t work.
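To make the grouping behaviour concrete, here is a rough sketch of the logic as I read it. This is not Boulder's actual code (Boulder is written in Go); it is an illustration in Python, and the `Submitter` type and the function names are hypothetical. It keeps only the first SCT per group and leaves out the best-effort informational submissions.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical type: submits a precert to one log and returns that log's SCT.
Submitter = Callable[[bytes], Awaitable[bytes]]


async def first_sct(precert: bytes, logs: Dict[str, Submitter]) -> bytes:
    """Race every log in one group; keep the fastest SCT and cancel the rest."""
    tasks = [asyncio.ensure_future(submit(precert)) for submit in logs.values()]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # this log failed; wait for the next fastest one
        raise RuntimeError("no log in group returned an SCT")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished


async def get_scts(
    precert: bytes, groups: Dict[str, Dict[str, Submitter]]
) -> Dict[str, bytes]:
    """Require one SCT from every group; raise if any group yields none."""
    names = list(groups)
    scts = await asyncio.gather(*(first_sct(precert, groups[n]) for n in names))
    return dict(zip(names, scts))
```

With two groups configured, issuance proceeds only if `get_scts` returns an SCT for both; a failure in either group surfaces as an exception.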
That’s fascinating, thank you. But I’m curious to know which logs are in each group. The Google list will be fairly boring, although I assume LE took guidance from them on which to use. But due to volume they would need to have made an arrangement with the owners of the non-Google logs, so I’m curious to know who has told them they can cope with the volume.
We don’t currently have an always-up-to-date page for our log submission policy, though that’s a good idea in the medium term. As of right now: For Google, we submit to Icarus and Argon. For non-Google, we submit to Nimbus, Sabre, and Mammoth. Our informational logs are Nessie and our own log instance that we are load testing. We additionally submit final certs to Argon.
That’s correct. We submit both precerts and final certs to Argon. To the other logs, we submit only precerts. As of a couple weeks ago we were briefly submitting both types of certs to all logs, but we’ve changed to just the one. We’re aware of concerns about third parties copying final certificates between logs and creating the same load issues we were hoping to alleviate, but this seems to provide a bit of temporary relief. I’ve been meaning to open a thread on ct-policy about the possibility of logs implementing prioritization or separate rate limits between precerts and final certs.
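As a strawman for that ct-policy discussion, separate rate limits could look something like the sketch below: one token bucket per entry type, so a flood of final certificates cannot use up the budget reserved for precertificates. This is purely illustrative. No existing log is claimed to work this way, and the class names and rates are made up.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float           # tokens added per second
    capacity: float       # maximum burst size
    tokens: float = 0.0   # tokens currently available
    updated: float = 0.0  # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SubmissionLimiter:
    """Independent limits for precert and final-cert submissions."""

    def __init__(self, precert: TokenBucket, final: TokenBucket):
        self.buckets = {"precert": precert, "final": final}

    def allow(self, kind: str, now: float) -> bool:
        return self.buckets[kind].allow(now)
```

Because the buckets are independent, exhausting the `final` bucket leaves `precert` submissions unaffected, which is the property that matters for not blocking issuance.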