Monitoring of Let's Encrypt CA

DanCvrcek · June 25, 2020, 10:25pm

Disclosure - we built this.

We are developing some ACME tools, and since we are curious about Let's Encrypt performance, we built online monitoring as a validation tool.

If you find it useful - good. If you think it’s lacking something - even better, especially if you let us know.

We issue and measure 8 certs/minute across 4 locations (prod) and 20 certs/minute across 5 locations (staging). Timing is measured separately for each API call. The library has an OCSP client, but we do not collect OCSP data yet.
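To illustrate what "timing for each API call separately" can look like, here is a minimal sketch. It is not the KeyChest code: `CallTimer` and `measure_issuance` are hypothetical names, and the step callables are placeholders for whatever ACME client is in use. The step names follow the ACME resources defined in RFC 8555.

```python
import time
from contextlib import contextmanager


class CallTimer:
    """Collects wall-clock latency (ms) for each named step of one issuance."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # injectable so the timer can be tested
        self.timings_ms = {}

    @contextmanager
    def step(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the latency even if the call raised, so failed
            # requests still show up in the data.
            self.timings_ms[name] = (self._clock() - start) * 1000.0


def measure_issuance(steps, clock=time.perf_counter):
    """Run (name, callable) pairs in order and time each one separately."""
    timer = CallTimer(clock)
    for name, call in steps:
        with timer.step(name):
            call()
    return timer.timings_ms


if __name__ == "__main__":
    # Placeholder callables; a real monitor would perform the HTTP requests here.
    placeholder = lambda: time.sleep(0.01)
    steps = [(name, placeholder)
             for name in ("directory", "newNonce", "newOrder", "finalize")]
    for name, ms in measure_issuance(steps).items():
        print(f"{name}: {ms:.1f} ms")
```

Keeping one record per call, rather than one total per certificate, makes it possible to see which endpoint slows down when overall latency rises.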

You can also get weekly email reports.

https://keychest.net/letsencrypt

Update 1: June 27, 16:15 UTC - upgraded monitoring servers, improved caching, optimized access to speed up data collection, and corrected the visualization of downtimes (we initially assumed that downtime = no API response … how silly of us).

Update 2: We detected the first downtime on June 25, 11 minutes before the official detection.

_az · June 25, 2020, 10:34pm

That’s extremely cool, nice job!

Osiris · June 26, 2020, 6:38am

Is hammering the ACME server like that allowed according to the user agreement?

DanCvrcek · June 26, 2020, 8:08am

As far as I can see, the user agreement doesn’t cover technical details. Those are enforced through rate limits, with which we comply.

Interestingly, we looked at testing the utilisation of the LE API, but it sounded too harsh even if it were just a few bursts a day. And you can only do it against a couple of API endpoints. It may also be that this information is better kept “less public”.

We assumed that variations in the load would be visible in the overall latency. The first weekly report suggests this may be possible, but only time will tell. The latency went up by 1000ms on all monitoring stations on Thursday morning. That may simply coincide with maintenance, although the increased latency lasted much longer.

DanCvrcek · June 26, 2020, 8:17am

It looks like the downtime table needs improvement - the DB shows downtime:
| 2020-06-25 07:17:07 | |
| 2020-06-25 07:17:01 | |
| 2020-06-25 06:53:10 | |
| 2020-06-25 06:52:40 | |
| 2020-06-25 06:52:11 | |

but the rendered table is empty. The official LE record is:
Identified 7:04am, resolved 7:18am
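As a worked check of the gap between the first failed request in the DB rows above and the official "Identified" time, here is a small sketch. It assumes both sources use the same timezone and that "7:04am" means 07:04:00; neither is stated in the thread. `detection_lead` is a hypothetical helper name.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Downtime rows copied from the DB output above.
failures = [
    "2020-06-25 07:17:07",
    "2020-06-25 07:17:01",
    "2020-06-25 06:53:10",
    "2020-06-25 06:52:40",
    "2020-06-25 06:52:11",
]

# Official status page entry; seconds and timezone are assumptions.
official_identified = "2020-06-25 07:04:00"


def detection_lead(failure_rows, official):
    """Time between the earliest recorded failure and the official timestamp."""
    first_failure = min(datetime.strptime(row, FMT) for row in failure_rows)
    return datetime.strptime(official, FMT) - first_failure


lead = detection_lead(failures, official_identified)
print(lead)  # 0:11:49
```

Under those assumptions the first failed request precedes the status page entry by a little under 12 minutes.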

Note: yesterday we thought we had a bug in our client code, as LE started returning “malformed” / “badPublicKey”.

jillian · July 1, 2020, 4:49pm

Cool tool! Just to be clear, the timestamp on letsencrypt.status.io doesn't reflect the time our alerts go off, it's the time that we posted the update. You can see in some of our public post-mortems that the incident timeline includes the internal alert notification time which is different than the public status page.

DanCvrcek · July 1, 2020, 5:01pm

Thanks @jillian, I was not aware of that! I have included a bit more detail from our logs in a blog post at https://blog.keychest.net/keep-an-eye-on-lets-encrypt-performance

DanCvrcek · July 1, 2020, 8:25pm

We have also started producing weekly reports - here’s an extracted chart of the latency over the last week. Each data point represents 100-120 transactions. This particular chart doesn’t show downtimes; it interpolates missing data.
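For readers wondering what "interpolates missing data" means for a chart like this, here is a minimal sketch of one common approach: linear interpolation between the nearest known points. This is an assumption for illustration; the thread does not say which method the report uses. `interpolate_gaps` is a hypothetical helper, and the sample values are made up.

```python
def interpolate_gaps(points):
    """Fill None entries that lie between two known values by linear interpolation.

    Leading and trailing gaps have no neighbour on one side and stay None.
    """
    filled = list(points)
    known = [i for i, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            fraction = (i - left) / span
            filled[i] = filled[left] + fraction * (filled[right] - filled[left])
    return filled


# Made-up latency series (ms per data point); None marks points with no data.
series = [100.0, None, None, None, 500.0, None]
print(interpolate_gaps(series))
```

The trade-off is that a gap caused by downtime is drawn as a smooth line, which is why such a chart cannot be used on its own to spot outages.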

system · July 31, 2020, 8:26pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
