Let's Encrypt stats, breakdown by signature algorithm

Just curious if Let's Encrypt has data that can show any shifts in adoption between RSA and ECDSA as well as key length. If so, could you add it to the stats page, or post it on your blog?

I found radar.cloudflare.com's Data Explorer but it only allows access to the last 12 months of data.

I'd be rather interested in that sort of data as well. But I suspect that the Let's Encrypt staff have higher-urgency things to do with their time. All the data should in theory be retrievable via the Certificate Transparency logs, though I do understand that it's not a trivial undertaking to do so.

Yikes. Just saw the "It’s been a while since we’ve seen jason_s — their last post was 10 years ago" banner above your post. Welcome back to the community forum!

Where are the Certificate Transparency logs? I'm pretty good at parsing files and data processing. Not sure if I want to parse hundreds of millions of certificates, but a statistical sample shouldn't be hard.

nm, found info: Certificate Transparency (CT) Logs - Let's Encrypt

The primary purpose of the CT Logs is to ensure that CAs are following the rules that they're supposed to, so they're structured mostly around their immutable properties and less around making them easy to query. The Cloudflare statistics that you found are based on them. There are some tools to search through the CT logs that I've collected in this thread:

I'm guessing the easiest way to run a traditional data processing search is to use the crt.sh public Postgres database and run some queries, but it's usually pretty overloaded. Maybe there's a better way out there, too.

You may have better luck using something like censys.io

However, the number of queries is very limited for free plans. I also find their API takes some practice to get right (but maybe that is just me).

See: Credits for Free and Starter Users

Are you sure you want the signing algo rather than the pubKey algo? Not that long ago Let's Encrypt signed EC leafs with RSA intermediates.

Oh, right, I'd better be careful with the data I care about. Thanks.

OK, so I worked on a Python script last night to query crt.sh for certificates based on a randomly generated ID, as a statistical sample, and stick them into a local sqlite database so I can analyze the sample. It's been chugging away for almost 24 hours and I have about 6400 certificates so far (plus another 400 or so of another category which I will mention below).

Some interesting tidbits:

Rate and date of certificates in the log

Here is a plot of the Not Valid Before date (linear y-axis labeled by year) vs. crt.sh ID (log x-axis). I knew that the number of certificates keeps accelerating, but I wanted to see more precisely how this changed, so that I could sample the certificates more evenly over time.

You can see a couple of things here. (And apologies if most of this stuff is well-known already, it's interesting to me.)

The logs seem to have two types of certificates:

  • contemporary certificates, placed in the log about the same time as their issuance
  • older certificates that were issued much earlier and, for some reason, were collected into the crt.sh logs after the fact

The way to distinguish this on the graph is that there is a "wavefront" or "vanguard" above which there are no certificates, except for a few anomalous ones logged in mid-2023 which appear to have a Not Valid Before date that is 10-12 months later than the date they were added to the log. On the right side of the graph (ID ≥ 10^6 or so) this "vanguard" is a more-or-less solid curve. On the left side of the graph, the "vanguard" is basically a horizontal line in early 2013. (February?)

Certificate Transparency efforts appear to be in production starting in early 2013, and the first 1-2 million log entries on crt.sh appear to be collecting copies of mainly already existing certificates.

The orange curve was my attempt to approximate the vanguard curve with a function ID = f(u) so that I could generate random IDs that would be relatively equally-distributed in Not Valid Before time. The way this works is fairly simple:

  • generate a uniformly-distributed random number u between 0 and 1 (or some sub-interval), where u=0 represents Jan 1 2008, and u=1 represents mid-2025 (approximately the present), so essentially u = (t - Jan 1 2008) / 17.5 years
  • compute ID = f(u), rounded to the nearest integer.
  • query the crt.sh database for this ID, and cache the resulting certificate, indexed by ID
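
A minimal sketch of the first two steps, for anyone who wants to try the same thing. The anchor points below are made-up illustrative values, not the curve I actually fitted, and the piecewise-linear interpolation in log10(ID) is just one assumed way to build f(u); the crt.sh query itself is left out.

```python
import math
import random

# Hypothetical (u, crt.sh ID) anchor points, illustrative values only.
# u = 0 is Jan 1 2008 and u = 1 is mid-2025, as in the list above.
ANCHORS = [(0.30, 1e6), (0.60, 1e8), (0.80, 3e9), (1.00, 2e10)]


def f(u):
    """Approximate crt.sh ID for time fraction u (assumed curve shape).

    Interpolates linearly in log10(ID) between neighbouring anchors and
    clamps u to the range covered by ANCHORS.
    """
    u = min(max(u, ANCHORS[0][0]), ANCHORS[-1][0])
    for (u0, id0), (u1, id1) in zip(ANCHORS, ANCHORS[1:]):
        if u0 <= u <= u1:
            frac = (u - u0) / (u1 - u0)
            log_id = math.log10(id0) + frac * (math.log10(id1) - math.log10(id0))
            return round(10 ** log_id)
    raise ValueError("u outside anchor range")


def sample_ids(n, u_lo, u_hi, seed=None):
    """Draw n IDs whose corresponding times are roughly uniform in [u_lo, u_hi]."""
    rng = random.Random(seed)
    return [f(rng.uniform(u_lo, u_hi)) for _ in range(n)]
```

Calling `sample_ids(100, 0.3, 1.0)` would give 100 candidate IDs spread roughly evenly in time from early 2013 to the present, under the assumed curve.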

I'm not sure what to do about the non-contemporary certificates collected after the fact, so I filtered them out. Basically I start at 2013 and ID = 10^6 and run through my statistical sample of certificates, tracing the "vanguard" forward in time with a slew-rate limit of +3 days per sample (this rejects sudden glitches where there's suddenly an old cert from 2009, or one that jumps forward), and keep only the certificates that are within ± 90 days of the vanguard.
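
Roughly, the filter looks like the sketch below. This is a simplified reconstruction rather than my exact script: the start date of Jan 1 2013, the ID cutoff of one million, and the rule that the vanguard only ever moves forward are assumptions made for the example.

```python
from datetime import datetime, timedelta


def filter_by_vanguard(samples,
                       start=datetime(2013, 1, 1),   # assumed starting point
                       min_id=10**6,                 # skip the low-ID batch
                       max_step=timedelta(days=3),   # slew-rate limit per sample
                       window=timedelta(days=90)):   # acceptance band
    """Keep (cert_id, not_before) pairs that lie close to the vanguard.

    Samples are walked in increasing ID order. The vanguard moves forward
    toward each newer Not Valid Before date, but by at most max_step per
    sample, so a single outlier cannot drag it far.
    """
    vanguard = start
    kept = []
    for cert_id, not_before in sorted(samples):
        if cert_id < min_id:
            continue
        if not_before > vanguard:
            vanguard = min(not_before, vanguard + max_step)
        if abs(not_before - vanguard) <= window:
            kept.append((cert_id, not_before))
    return kept
```

An old cert from 2009 falls far below the vanguard and is dropped; a cert dated a year ahead nudges the vanguard by only 3 days and is dropped as well.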

That gives us the data plotted below; 1374 certs out of my sample of 6403 were rejected either because the ID is less than 10^6 or the Not Valid Before date is more than 90 days from the vanguard curve.

Here I've plotted the leaf certificates (blue dots) as well as the precertificates (red x's), which appear to start in the spring of 2018, and make up about 40% of my sample set. (All the root certificates I downloaded are in the low-number batch; I only ran across one intermediate certificate for some reason.)

For the remaining 3000 or so data points, which are leaf certificates, I can do a histogram of public key algorithm by calendar quarter of Not Valid Before date:

(upper subplot = fraction of certificates with each public key type and size; lower subplot = number of samples obtained for each quarter)
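
The binning behind that kind of plot can be sketched as follows. It assumes each leaf certificate has already been reduced to a (Not Valid Before date, key label) pair; the label strings such as "RSA-2048" are just a naming choice for the example.

```python
from collections import Counter, defaultdict
from datetime import date


def quarter_label(d):
    """Return a label like '2013Q1' for a date."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def key_type_fractions(records):
    """Group (not_before, key_label) pairs by calendar quarter.

    Returns {quarter: (sample_count, {key_label: fraction})}, i.e. the
    data for the lower and upper subplots respectively.
    """
    counts = defaultdict(Counter)
    for not_before, key_label in records:
        counts[quarter_label(not_before)][key_label] += 1
    result = {}
    for quarter, counter in sorted(counts.items()):
        total = sum(counter.values())
        result[quarter] = (total, {k: n / total for k, n in counter.items()})
    return result
```

Swapping `quarter_label` for a function that returns only the year gives the less noisy per-year version.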

This data set is a bit sparse (need more samples!) but you can see that:

  • the fraction of leaf certificates with RSA 2048-bit keys was nearly 100% in 2013, but it has decreased somewhat over time
  • there was one certificate in my sample with a 1024-bit RSA key in 2013 (a few more in the batch of IDs below one million)
  • ECDSA made an early entrance in late 2014 and 2015, but then faded away for a couple of years at a low level, and then started a resurgence around 2023, now making up almost half of the certificates in early 2025
  • 4096-bit RSA keys have been fluctuating over time. There are a few 3072-bit RSA keys. (And I found one strange certificate with a 2432-bit RSA key.)

If I aggregate the data by year rather than by quarter, it's a little less noisy (but less time resolution) since there are more samples per histogram bin:

I'd be interested in querying the crt.sh database for statistics, rather than grabbing samples one by one, but I don't know how I would do the kind of signal processing that I have done on my statistical sample. SQL queries wouldn't be able to reject these outliers... should I have included all the leaf certificates, even the ones collected in the logs far after the fact?

Also, I ran into a problem where about 400 certificates from 2013 and earlier were technically malformed and the Python cryptography library cannot read certain fields. 99% of these were issued by GoDaddy or its spinoff Starfield Technologies. (The other few were issued by companies in Spain.) I downloaded the certs but excluded them from my dataset.

Updated statistics after getting at least 500 sample leaf certs per quarter that are within 120 days of the vanguard (see my earlier comment for definition):

I've added a breakdown of certificates by issuer, as well as the estimated number of certificates by quarter.

My biggest doubts are the GoDaddy certificates and how many they've posted to Certificate Transparency, but whatever.

There were a few certificates with oddball key lengths (2432, 3096, 8096) that I eliminated. The vast majority have historically been RSA 2048-bit, but that's come down in recent years with increases in RSA-4096 or ECDSA-256.

Since ISRG (Let's Encrypt) and Comodo have dominated in recent years, here's their breakdown:

Looks like Comodo made a big push to ECDSA in 2014, then backed off, then pushed again to ECDSA more recently. ISRG has had a smaller push.

How long does it take new Let's Encrypt certificates to show up in the CT logs? Looks like there might be a lag of several quarters before the log catches up with new certificates.

Let's Encrypt supports SCTs, so all certificates are submitted to logs prior to issuance. Traditional CT logs have a maximum merge delay, after which the certificate must be present in the log (merged into the tree). This delay is 24 hours for almost all older logs. The newer static CT logs do not have any delay at all.

If you're querying a log aggregator (such as crt.sh) instead of the logs directly, then this aggregator may have a backlog of its own. crt.sh, for instance, is often struggling. The current backlog for each log monitored by crt.sh is public: crt.sh | monitored-logs. Other aggregators will have different backlogs, or no backlogs at all if they're fast.

How can I interpret the backlog at crt.sh?

Let's Encrypt is the dominant certificate authority right now, but if I look at "Tree Size" in the crt.sh monitored logs, the Let's Encrypt CT logs are fairly low down in the list, with no backlog... which doesn't jibe with my recent queries. Do the majority of Let's Encrypt certificates go through other log servers like Cloudflare or Google or Sectigo?

Let's Encrypt is required by Chrome CT Policy (Chrome Certificate Transparency Policy | CertificateTransparency) to log to two distinct logs. There is no requirement for Let's Encrypt to submit certificates to their own logs. The actual logs to which a certificate is submitted can vary from certificate to certificate. The logic behind this is complicated and changes over time.

You can look at any given certificate with SCTs to see the logs to which it was initially submitted (it may have been submitted to more, though). For example, here's one of my recent LE certs:

As seen here, it was submitted (at least) to DigiCert Wyvern and IPng Network Gouda. No mention of a LE-run log.

Okay, got it. In any event, it looks like there's a total backlog (when I queried just now) of 1146819743 (1.15 billion) entries, which corresponds to... how many leaf certificates? If each leaf certificate requires two SCTs (three entries in the crt.sh database) and each certificate shows up as two entries in the logs, then if I divide by 6 I get about 200 million leaf certificates in the backlogs, which seems to be around 20-25% of the run rate of leaf certificates based on my statistical samples.

crt.sh only stores a certificate once: if it sees the same certificate in multiple logs, it will still only store it once in its database. Many certificates are included in half a dozen logs (anyone can submit to any log, since these are public; e.g. Google runs a tool that just re-submits all certs they see to their logs).

crt.sh contains two versions of every certificate though, because every SCT-based issuance produces two certificates: A pre-certificate which is the initial certificate without SCTs, since the CA at first does not have them, but needs to submit a cert to get them. The pre-cert is needed to avoid a catch-22. Then once the SCTs have been returned to the CA, the final certificate is issued, which is usually also submitted to CT logs (this is not required, but common practice). Since this is technically a distinct certificate, crt.sh shows both by default (unless you select the de-duplication option).

OK, I'm just trying to map the number of backlog entries to the number of certificates. I know crt.sh only stores a certificate once, but if it is in two logs, then the number of distinct certificates in the backlog of 1.15 billion log entries should be about half of the 1.15 billion = 575 million. Then you have to account for pre-certificates (and there's one of them, not two?) which also would show up in two logs, so 575 million distinct certificates should be around 287 million leaf certificates (plus a very small number of others if there are new intermediate certificates).

Does that math sound right?
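
Here is the same arithmetic spelled out. The two divisors are the assumptions stated above (every certificate appears in exactly two logs, and every issuance produces one precertificate plus one final certificate), so the result is only as good as those assumptions.

```python
def estimate_leaf_certs(backlog_entries, logs_per_cert=2, certs_per_issuance=2):
    """Rough leaf-certificate count implied by a number of backlog entries.

    logs_per_cert: assumed number of logs each certificate appears in.
    certs_per_issuance: precertificate + final certificate = 2 (assumed).
    Returns (distinct_certificates, leaf_certificates).
    """
    distinct = backlog_entries / logs_per_cert
    leaf = distinct / certs_per_issuance
    return distinct, leaf


distinct, leaf = estimate_leaf_certs(1_146_819_743)
print(f"distinct certificates: {distinct / 1e6:.0f} million")
print(f"leaf certificates:     {leaf / 1e6:.0f} million")
```

Changing `logs_per_cert` to 6 (certificates often sit in half a dozen logs) shows how sensitive the estimate is to that one assumption.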

The problem is that we don't know if there are biases, e.g. it's possible that 100% of the backlog consists of "not-yet-seen" certificates, or maybe 0% does (if all of them were submitted to another log without a backlog). You can try to estimate a lower and upper bound for the backlog, but there's no hard data without checking for any bias first. Also, the backlog is highly variable and may have been higher in the past.

If you want per-day averages, I would go by the cert issuance date and ignore certs that have been issued more recently than the highest backlog. Then the data should be accurate, since no not-ingested cert for that date can exist.

My methodology in this graph was to generate random integers for crt.sh ID which are prewarped by a function that approximates the rate of increase for issued certificates, download those certificates, and adaptively add samples in different ranges of ID until I had 500 samples of leaf certificates per quarter.

The estimated number of leaf certificates per quarter is based on the numerical range of samples, as well as the fraction of certificates that are leaf certificates, ignoring the small fraction of certificates that showed up in the log more than around 90 days "late" compared to other certificates with similar crt.sh ID.

I don't have a rigorous error estimate (I'm not that good at statistics); I would guess that I might be off by 10% or so, unless I really screwed up something obvious. But I'm wondering about the apparent downturn in the last two quarters.

It seems more likely that crt.sh just hasn't caught up yet.

Perhaps that is just whatever is causing this: Let's Encrypt Stats - Let's Encrypt

See the Certs Per Day chart especially

I asked about that some time back but never got a definitive answer. I don't have a particularly good guess myself. The daily issuance drop with a consistently increasing number of active registered domains is probably a clue but one I haven't resolved :)

IDs are assigned by crt.sh in ingestion order, and so can be heavily biased depending on which log was ingested at what time and which CAs logged there. I don't know what the error is, but relying on them for anything date-based seems unreliable.

There are APIs available (e.g. censys) where you don't have to do this guesswork. For crt.sh, you can probably use the postgres API directly and filter by exact dates, but you need to be smart about how you structure your query to be re-entrant as the postgres API doesn't support long-running queries.
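
One way to make such a job re-entrant is to split the date range into small windows and checkpoint after each one, as in the sketch below. `run_window` is a hypothetical callable that you would write yourself to run one short query for the half-open range [lo, hi) and return something JSON-serializable; the checkpoint file layout is likewise just an assumption for the example, and no actual crt.sh query is shown.

```python
import json
from datetime import date, timedelta
from pathlib import Path


def date_windows(start, end, step_days=1):
    """Yield consecutive half-open (lo, hi) date windows covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=step_days), end)
        yield cur, nxt
        cur = nxt


def run_resumable(start, end, run_window, checkpoint_path, step_days=1):
    """Run run_window(lo, hi) once per window, skipping finished windows.

    Results are saved to checkpoint_path after every window, so a dropped
    connection or timeout only costs the window that was in flight.
    """
    path = Path(checkpoint_path)
    done = json.loads(path.read_text()) if path.exists() else {}
    for lo, hi in date_windows(start, end, step_days):
        key = lo.isoformat()
        if key in done:
            continue  # finished in an earlier run
        done[key] = run_window(lo, hi)
        path.write_text(json.dumps(done))
    return done
```

Keeping each window small enough to finish well inside the server's query time limit is the part that needs tuning; one day per window is only a starting guess.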

If all you want are stats about Let's Encrypt certs, and don't care about other CAs, we've recently started trying to always get one SCT from a static log. So if you only query static logs (which should be very very fast), you'll catch >99% of all LE precerts.