Certbot etc dashboarding and failure reporting?

webprofusion · June 30, 2022, 3:43am

I'm exploring the idea of extending the https://certifytheweb.com dashboard and renewal-failure notifications to support Certbot (and possibly acme.sh). It's clear that some users don't always have visibility into which renewals they have and which (if any) are failing.

So a few general questions for people who have a need for this sort of thing:

Does failure reporting/notification sound useful to you?
What post-request reporting/notifications/webhooks etc. do you currently use (if any), and why?
Is there an existing open source or commercial solution you already use, and which of its features are important to you?
Do you require an API to query this information?

Thanks for any input!

_az · June 30, 2022, 4:56am

Would endpoint monitoring be in scope for this?

For a while I had an acme.sh installation where the post-hook meant to reload my ZNC server's certificate wasn't working, and I only noticed because I was monitoring the endpoint directly with Uptime Robot.

Detecting that kind of mismatch (renewed on disk, but not what the service is actually serving) would be useful to me, because it gives a reliable assurance that the certificate is really OK.
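A rough self-hosted version of that kind of check, as a Python sketch using only the standard library: connect to the endpoint, let the TLS handshake validate the certificate that is actually being served, and look at its expiry. The host, port and 14-day threshold are placeholders, not anything Uptime Robot or Certify The Web actually uses.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a notAfter timestamp such as 'Jan  5 09:34:43 2018 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def served_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Handshake with the live endpoint and return notAfter of the cert it serves."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def endpoint_ok(host: str, port: int = 443, min_days: float = 14.0) -> bool:
    """True if the served certificate validates and has at least min_days left."""
    try:
        return days_until_expiry(served_not_after(host, port)) >= min_days
    except ssl.SSLError:
        # Expired, untrusted or wrong-name certificates fail the handshake itself.
        return False
```

Something like `endpoint_ok("irc.example.net", 6697)` (placeholder host and port) would have flagged the stale ZNC certificate even though the renewal itself succeeded, because it checks what the service presents rather than what is on disk. Connection errors such as timeouts are deliberately left to propagate so they aren't confused with certificate problems.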

webprofusion · June 30, 2022, 5:08am

Yep, could be. Depends a little on whether the endpoint is internal or public and what type of service it is. General purpose TLS service monitoring/consistency reporting could be interesting.

jvanasco · June 30, 2022, 1:04pm

We played with failure reporting, but haven't released it yet. We still log the errors but don't generate the reports. We set things up so failures are kept centrally in the logs, which are SQL-based, so they are queryable. I think there is a Boolean flag on the final failure records.
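To illustrate how a SQL-based log with a "final failure" flag becomes a report, here is a small Python/SQLite sketch. The table name, columns and `is_final_failure` flag are hypothetical stand-ins for whatever schema a real client uses; the point is that a failing-renewals report is a single query.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per ACME attempt; is_final_failure marks the
# attempt after which the client gave up on that renewal.
SCHEMA = """
CREATE TABLE IF NOT EXISTS acme_event (
    id INTEGER PRIMARY KEY,
    domain TEXT NOT NULL,
    ca TEXT NOT NULL,
    occurred_at TEXT NOT NULL,   -- ISO-8601 UTC, so text order matches time order
    error_type TEXT,             -- NULL when the attempt succeeded
    is_final_failure INTEGER NOT NULL DEFAULT 0
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_event(conn, domain, ca, occurred_at, error_type=None, is_final_failure=False):
    conn.execute(
        "INSERT INTO acme_event (domain, ca, occurred_at, error_type, is_final_failure)"
        " VALUES (?, ?, ?, ?, ?)",
        (domain, ca, occurred_at, error_type, int(is_final_failure)),
    )


def failing_domains(conn) -> List[Tuple[str, int, str]]:
    """(domain, number of final failures, most recent final failure), newest first."""
    return conn.execute(
        """
        SELECT domain, COUNT(*), MAX(occurred_at)
        FROM acme_event
        WHERE is_final_failure = 1
        GROUP BY domain
        ORDER BY MAX(occurred_at) DESC
        """
    ).fetchall()
```

Because every attempt is logged, the same table can also feed dashboards (success rate per CA, errors by type) without a separate reporting store.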

The interesting part was tracking rate-limit errors. The idea was to catch certain errors early (duplicate certificates, too many failures, too many pending authorizations) and either warn or pause operations. If you hit some of these, something is broken and needs to be fixed, and/or the account will be wedged for a while.

Most of the other errors would chain back to a connectivity issue or an ACME server outage. I thought the rate-limiting side had the most potential.
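One way to sort errors into "rate limit: warn or pause" versus "connectivity/outage: retry later" is to look at the ACME problem document (RFC 8555 error types such as `urn:ietf:params:acme:error:rateLimited`). RFC 8555 has a single rate-limit type, so telling duplicate certificates apart from failed validations has to come from the `detail` text. The substrings below are an assumption modelled on Let's Encrypt's wording and will differ for other CAs, so treat this as a sketch rather than a complete classifier.

```python
from typing import Tuple

RATE_LIMITED = "urn:ietf:params:acme:error:rateLimited"

# Error types that usually point at connectivity problems or a CA outage.
TRANSIENT_TYPES = {
    "urn:ietf:params:acme:error:connection",
    "urn:ietf:params:acme:error:dns",
    "urn:ietf:params:acme:error:serverInternal",
}

# Assumption: substrings based on Let's Encrypt's rate-limit messages; other CAs differ.
RATE_LIMIT_KINDS = {
    "too many certificates already issued for exact set of domains": "duplicate_certificate",
    "too many failed authorizations recently": "failed_validation",
    "too many currently pending authorizations": "pending_authorizations",
}


def classify(problem: dict) -> Tuple[str, str]:
    """Map an ACME problem document to (category, suggested action)."""
    error_type = problem.get("type", "")
    detail = problem.get("detail", "").lower()
    if error_type == RATE_LIMITED:
        for needle, kind in RATE_LIMIT_KINDS.items():
            if needle in detail:
                return kind, "pause"
        return "rate_limit_unknown", "pause"
    if error_type in TRANSIENT_TYPES:
        return "transient", "retry_later"
    return "other", "warn"
```

Pausing on every rate-limit hit is a deliberately conservative choice: retrying while wedged only burns more of the remaining allowance.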

webprofusion · July 1, 2022, 12:10am

Thanks, yes, that's a good point. I guess if you know the error (and the CA), you could pretty much determine when rate limits are likely to be hit.
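As a sketch of that idea: if you know which limit an error counts against and the CA's limit and window, you can compute when the next request would fit, assuming a sliding window. The `EXAMPLE_LIMITS` table is illustrative only; at the time of this thread Let's Encrypt documented 5 duplicate certificates per week and 5 failed validations per account, per hostname, per hour, but the CA's current documentation is the authority.

```python
from typing import Dict, Sequence, Tuple

# Illustrative (limit, window_seconds) pairs keyed by (CA, limit kind).
# Verify against the CA's current rate-limit documentation before relying on them.
EXAMPLE_LIMITS: Dict[Tuple[str, str], Tuple[int, float]] = {
    ("letsencrypt", "duplicate_certificate"): (5, 7 * 24 * 3600),
    ("letsencrypt", "failed_validation"): (5, 3600),
}


def next_allowed_at(event_times: Sequence[float], limit: int, window: float, now: float) -> float:
    """Earliest time a new counted event fits under a sliding-window limit."""
    # An event at time t still counts while t > now - window.
    recent = sorted(t for t in event_times if now - window < t <= now)
    if len(recent) < limit:
        return now
    # len(recent) - limit + 1 of the oldest events must age out first.
    return recent[len(recent) - limit] + window
```

For example, five failed validations at t = 0, 10, 20, 30 and 40 seconds under a 5-per-hour limit mean nothing new fits until t = 3600, when the first one ages out.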

jvanasco · July 5, 2022, 4:30pm

Yeah. My goal was to stay within 80% of rate limits by default, so there is always a bit of extra room to ensure specific operations can complete successfully. To do that, I tried to log every ACME request and error so that we can generate real-time stats and keep each account/IP in a healthy place.
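A minimal Python sketch of the "stay within 80%" idea, assuming one tracker per limit (per account or per IP) and that events are recorded in time order: routine work stops at 80% of the limit, while operations flagged as priority may use the reserved headroom. The class name and the `priority` flag are hypothetical, not taken from any existing client.

```python
import math
from collections import deque


class RateBudget:
    """Sliding-window counter that keeps routine requests under a headroom fraction."""

    def __init__(self, limit: int, window_seconds: float, headroom: float = 0.8):
        self.limit = limit
        self.window = window_seconds
        self.soft_limit = math.floor(limit * headroom)  # cap for routine work
        self.events = deque()  # timestamps, assumed to be appended in time order

    def _prune(self, now: float) -> None:
        # Drop events that have aged out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

    def used(self, now: float) -> int:
        self._prune(now)
        return len(self.events)

    def may_proceed(self, now: float, priority: bool = False) -> bool:
        # Priority operations may use the reserved headroom up to the hard limit.
        cap = self.limit if priority else self.soft_limit
        return self.used(now) < cap

    def record(self, now: float) -> None:
        self._prune(now)
        self.events.append(now)
```

With a limit of 5 per hour, routine renewals are held back after 4 counted events, leaving one slot for an operation that must complete.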

system · August 4, 2022, 4:30pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
