Best practices to monitor Certbot

toc-rox · September 12, 2018, 2:04pm

What are the best practices to monitor Certbot (as foreground and background process)?

E.g. how to collect monitoring information concerning

general activities
connectivity to ACME server
issuance of certificate
installation of certificate
…

stevenzhu · September 12, 2018, 2:06pm

Take a look plz.

cpu · September 12, 2018, 2:22pm

@stevenzhu I don’t think this is a question that merits a staff tag. It seems like something the community can provide guidance on. Please try to reserve the @lestaff mentions for issues that require privileged access or immediate attention. Thanks!

@toc-rox What context are you using Certbot in? For a small number of websites? As part of some kind of larger automation? I think the answer to that question will help inform the responses you get about best practices.

toc-rox · September 13, 2018, 9:12am

The plan is to use Certbot in a production environment with a lot of servers. It’s very important
that a certificate never becomes invalid. To assure this, it’s necessary to monitor each Certbot instance.
Currently it’s not obvious what the best practice is to achieve this:

evaluate the return code
evaluate the (newest) logfile
evaluate the terminal output
use a combination of the above
something else

_az · September 13, 2018, 9:24am

I’m of the view that it’s usually a mistake to micromanage ACME clients. They are not usually built to produce structured/parseable output or logs (Certbot certainly isn’t) and … well, who cares if one renewal attempt finishes with exit code 1 instead of 0? It could have been an intermittent issue (e.g. network or CA degraded) and succeed the next time.

The outcome is what matters (certificates not lapsing or getting so close to lapsing that you’re likely to have rate limiting problems).

Additionally, Certbot’s report of success can be a false positive. There is no guarantee that e.g. your webserver actually loaded and is actively serving the new certificate after it was renewed. Certbot doesn’t check that.

To that end, you have some tools at your disposal:

Rely on the CA-based email notifications (least accurate)
Rely on a service like Let’s Monitor or Uptime Robot to warn you when a particular live endpoint is observed to be serving a certificate that is close to lapsing
Leverage existing monitoring infrastructure (Nagios or a Prometheus exporter) to do the same thing

tdelmas · September 13, 2018, 9:45am

If you monitor the expiration date of the certificate actually served by your web server, then you cover all these issues:
If Certbot works, the certificate should never expire in less than X days (X=30 usually).
If any of these errors occurred:

Failed to get a new certificate
Failed to install the new certificate

Then your monitor will see it.
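
As a rough illustration of the check described above, here is a minimal Go sketch. It connects to an endpoint, reads the NotAfter date of the certificate the server actually presents, and compares the remaining lifetime with X. The names `classify` and `leafNotAfter`, the program name `expirycheck`, the 30-day threshold, the 10-second timeout and the exit codes are assumptions made for this example, not part of Certbot or any existing tool.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"os"
	"time"
)

// minDays is the alert threshold X (assumption: 30 days, as suggested above).
const minDays = 30

// classify maps the remaining lifetime (in hours) to a monitoring state.
func classify(remainingHours float64, thresholdDays int) string {
	switch {
	case remainingHours <= 0:
		return "EXPIRED"
	case remainingHours < float64(thresholdDays)*24:
		return "WARNING"
	default:
		return "OK"
	}
}

// leafNotAfter returns the expiry date of the certificate served at addr
// ("host:port"). The default TLS verification stays enabled, so a certificate
// that is already expired or otherwise invalid fails the handshake and is
// reported as an error instead.
func leafNotAfter(addr string) (time.Time, error) {
	dialer := &net.Dialer{Timeout: 10 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, nil)
	if err != nil {
		return time.Time{}, err
	}
	defer conn.Close()
	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return time.Time{}, fmt.Errorf("%s presented no certificate", addr)
	}
	return certs[0].NotAfter, nil // the first entry is the leaf certificate
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: expirycheck host:port")
		return
	}
	notAfter, err := leafNotAfter(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "CRITICAL:", err)
		os.Exit(2)
	}
	state := classify(time.Until(notAfter).Hours(), minDays)
	fmt.Printf("%s: certificate expires on %s\n", state, notAfter.Format(time.RFC3339))
	if state != "OK" {
		os.Exit(1)
	}
}
```

Run from cron or a monitoring agent against each endpoint, a non-zero exit code would then cover both "failed to get a new certificate" and "failed to install the new certificate" without looking at Certbot itself.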

JuergenAuer · September 13, 2018, 10:53am

Hi @toc-rox

then you may create your own client instead of installing certbot. There are a lot of libraries. And ACME isn't too complicated:

With your own client, you may

redirect all GET requests domainname/.well-known/acme-challenge/1234 to a special server
send notification mails if something doesn't work
split the certificate creation / certificate management and the installation / use of certificates

And you can split the creation of a new certificate into small steps with return codes. So if there is an error, you don't need to restart with a new order; instead you repeat the last step.
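
To make the idea of small, resumable steps a bit more concrete, here is a hypothetical Go sketch. It does not talk to any ACME server: the step names are placeholders for whatever a real client would do, and `step`, `runSteps` and `failTimes` are names invented for this example. It only shows how a run can resume at the failed step instead of starting over.

```go
package main

import (
	"errors"
	"fmt"
)

// step is one small unit of work with its own error result.
type step struct {
	name string
	run  func() error
}

// runSteps runs steps[from:] in order. It returns the index of the first
// failing step together with its error, or len(steps) and nil on success.
func runSteps(steps []step, from int) (int, error) {
	for i := from; i < len(steps); i++ {
		if err := steps[i].run(); err != nil {
			return i, fmt.Errorf("step %q failed: %w", steps[i].name, err)
		}
	}
	return len(steps), nil
}

// failTimes returns a placeholder step body that fails the first n calls,
// standing in for a transient problem such as a network error.
func failTimes(n int) func() error {
	return func() error {
		if n > 0 {
			n--
			return errors.New("transient error")
		}
		return nil
	}
}

func main() {
	ok := func() error { return nil }
	// Placeholder steps; a real client would do the actual work here.
	steps := []step{
		{name: "create order", run: ok},
		{name: "validate challenge", run: failTimes(1)},
		{name: "finalize order", run: ok},
		{name: "download certificate", run: ok},
	}
	next := 0
	for attempt := 1; attempt <= 3 && next < len(steps); attempt++ {
		var err error
		next, err = runSteps(steps, next) // resume at the step that failed
		if err != nil {
			fmt.Printf("attempt %d: %v\n", attempt, err)
		}
	}
	if next == len(steps) {
		fmt.Println("all steps completed")
	} else {
		// This is where a notification mail could be sent.
		fmt.Printf("giving up at step %q\n", steps[next].name)
	}
}
```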

toc-rox · September 13, 2018, 2:18pm

I agree that bottom-up monitoring isn’t sufficient. I have written a top-down data collector (prototype) which grabs the certificate offered by a service. This makes it possible to evaluate the certificate actually used by the certificate consumer (service). An alarm could be generated if the remaining lifetime falls below a defined threshold, which would indicate that something in the renewal chain hasn’t worked.

$ ./moncert www.google.com:443

Connecting to "www.google.com:443" ...

SerialNumber : 8030173536167869905
Subject      : CN=www.google.com,O=Google LLC,L=Mountain View,ST=California,C=US
Issuer       : CN=Google Internet Authority G3,O=Google Trust Services,C=US
NotBefore    : 2018-08-21 08:05:00 +0000 UTC
NotAfter     : 2018-11-13 08:05:00 +0000 UTC
IsCA         : false
DNSNames     : www.google.com

SerialNumber : 149685795415515161014990164765
Subject      : CN=Google Internet Authority G3,O=Google Trust Services,C=US
Issuer       : CN=GlobalSign,OU=GlobalSign Root CA - R2,O=GlobalSign
NotBefore    : 2017-06-15 00:00:42 +0000 UTC
NotAfter     : 2021-12-15 00:00:42 +0000 UTC
IsCA         : true

toc-rox · September 13, 2018, 2:41pm

Why write a new client? Certbot is an excellent one. Isn’t it better to file a feature request against Certbot? Something like “implement reliable and solid monitor messages”.

JuergenAuer · September 13, 2018, 10:07pm

Certbot isn't an API and doesn't want to be one, so a new process is always required. You said:

evaluate the return code
evaluate the (newest) logfile

These are additional steps. A library with some functions has direct return and error codes, so there is no need to parse logfiles. If Certbot changes something, such a validation may no longer work.

And there are other limitations. My own client uses dns-01 validation for *.example.com and http-01 validation for example.com, so I need only one _acme-challenge.example.com entry, not two. Such "mixed validations" aren't supported. And I can save the http-01 validation files in a special directory as

domainname.token.txt

If there is a GET http://domainname/.well-known/acme-challenge/token, the webserver code checks whether a file domainname.token.txt exists: if yes, it is sent; if no, a 404 is sent.
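
A webserver-side lookup of this kind could look roughly like the Go sketch below. This is not the code of the client described here, only an illustration of the file naming scheme; `challengeFile`, `safeName`, `serveChallenges` and the program name `challengeserver` are hypothetical, and the character allowlist is an added precaution against path traversal.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

const challengePrefix = "/.well-known/acme-challenge/"

// safeName accepts only letters, digits, '-' and '_' (plus '.' for host
// names), so neither host nor token can point outside the directory.
func safeName(s string, allowDot bool) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
		case r == '-' || r == '_':
		case r == '.' && allowDot:
		default:
			return false
		}
	}
	return true
}

// challengeFile maps a request (Host header and URL path) to the file
// "<domainname>.<token>.txt" inside dir. ok is false for any request that
// is not a well-formed challenge request.
func challengeFile(dir, host, urlPath string) (name string, ok bool) {
	if !strings.HasPrefix(urlPath, challengePrefix) {
		return "", false
	}
	token := strings.TrimPrefix(urlPath, challengePrefix)
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h // drop an explicit port such as ":80"
	}
	host = strings.ToLower(host)
	if !safeName(host, true) || !safeName(token, false) {
		return "", false
	}
	return filepath.Join(dir, host+"."+token+".txt"), true
}

// serveChallenges sends the file if it exists and a 404 otherwise.
func serveChallenges(dir string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		name, ok := challengeFile(dir, r.Host, r.URL.Path)
		if !ok {
			http.NotFound(w, r)
			return
		}
		body, err := os.ReadFile(name)
		if err != nil {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "text/plain")
		w.Write(body)
	}
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: challengeserver listen-addr directory")
		return
	}
	log.Fatal(http.ListenAndServe(os.Args[1], serveChallenges(os.Args[2])))
}
```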

Much of the work is organizing local information: account key, order URL, validation files, certificate keys, certificate requests and certificates. These can be saved in a database or as files; with your own functions you can do what you want. The communication with Let's Encrypt is only a small part of the job.

So it's easy to create additional mail notifications if something doesn't work.

mnordhoff · September 14, 2018, 5:52am

It may be feasible to run “certbot -q renew” and, if it outputs anything, generate an alert. You will waste some time slogging through emails about transient errors, but maybe not too much time.
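
A minimal Go sketch of that "any output means trouble" rule follows. It is a generic wrapper rather than anything Certbot-specific; the names `run`, `needsAlert` and `quietalert` are invented for this example, and treating a non-zero exit code as an alert as well is an extra assumption on top of what is described above.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// run executes a command and returns its combined stdout and stderr.
func run(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).CombinedOutput()
}

// needsAlert reports whether a quiet command needs attention: it either
// printed something other than whitespace, or it failed to start or
// exited with a non-zero status (err != nil).
func needsAlert(output []byte, err error) bool {
	return err != nil || len(bytes.TrimSpace(output)) > 0
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: quietalert command [args...]")
		return
	}
	out, err := run(os.Args[1], os.Args[2:]...)
	if needsAlert(out, err) {
		// Placeholder: a real setup would send a mail or page someone here.
		fmt.Fprintf(os.Stderr, "ALERT: %q needs attention (error: %v)\n%s",
			os.Args[1:], err, out)
		os.Exit(1)
	}
}
```

It would be invoked from cron as something like `quietalert certbot -q renew`, with the scheduler or a monitoring agent reacting to the exit code.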

You still have to monitor your web servers to make sure everything’s really working, though.

Also, there’s the issue of monitoring revocation status of your certificates. (Which could be tied into an OCSP stapling implementation.)

toc-rox · September 23, 2018, 1:38pm

I have written a helper tool for monitoring certificate validity. The utility (certstate), written in Go, can be found here:

Binaries for Linux, macOS and Windows are available. Feedback is welcome …
