Identify if there are any renewal failures in the daily cron job

My server: Ubuntu 18.04 LTS Apache/2.4.29 (Ubuntu)
Rackspace: shell access

I am following the trail of the certbot cron job
/etc/cron.d/certbot

....
# certbot.timer.
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

0 */12 * * * root test -x /usr/bin/certbot -a \! -d /run/systemd/system && perl -e 'sleep int(rand(43200))' && certbot -q renew

Saving debug log to /var/log/letsencrypt/letsencrypt.log

I am trying to find a way to identify if there are any renewal failures in the daily cron job

One thing I can do is check the log

/var/log/letsencrypt/letsencrypt.log

for the string 'certbot.renewal:no renewal failures'


But sometimes the letsencrypt log only has entries from the following command (and does not include the string 'no renewal failures').

I use this command in my daily maintenance routines
certbot certificates > Certs_txt

Since I have 500 domains, this gets big and the debug log gets big.
I changed the letsencrypt log rotation to be daily instead of weekly

I went to GitHub and found this in \certbot\renewal.py:

if renew_failures or parse_failures:
    raise errors.Error("{0} renew failure(s), {1} parse failure(s)".format(
        len(renew_failures), len(parse_failures)))
else:
    logger.debug("no renewal failures")

**If there are renewal failures, can I search for 'renew failure(s)' in the log to check for errors?**
Another way to ask the question: does raise errors.Error(...) write to the log file?

1 renew failure(s), 0 parse failure(s)

What about searching for the string 'All renewal attempts failed' in the log?

Here are some posts that show that string

All renewal attempts failed. The following certs could not be renewed:
/etc/letsencrypt/live/bibidiart.com/fullchain.pem (failure)
** DRY RUN: simulating 'certbot renew' close to cert expiry
** (The test certificates above have not been saved.)

If there are other ways I could check for renewal errors, please advise.


Just an observation and maybe a question:
Since the use of the -q option should already minimize the output to only errors…
Why/How are the logs "difficult" to handle?

  1. When you have 500 domains on 1 IP address, the logs get rather large. Sometimes I am renewing 90 or so certificates at a time. I am also adding and deleting certificates as needed. I did not say difficult.

  2. Stated another way, my question was: is there something I can search for in the log that will definitely tell me if I had renew failures? Since I have never had a renew failure, and there are no examples around, I looked at the code in \certbot\renewal.py

At different times of the day, the log has different entries. If I search for "no renewal failures" and the letsencrypt cron job that runs renewals has not run, I will not find that string. So I have to add logic to check whether there was a renewal run. I can search for 'Cert is due for renewal'. If I have that, then I can search for "no renewal failures".

If by some stroke of luck, there was an entry written to the log that said there were renewal failures, I could just search for that. That seems a little cleaner to me.

I am not a python guy. I was hoping someone could tell me if

if renew_failures or parse_failures:
    raise errors.Error("{0} renew failure(s), {1} parse failure(s)".format(
        len(renew_failures), len(parse_failures)))
else:
    logger.debug("no renewal failures")

wrote something to the log file, if there was a failure

  1. I did not want to change anything that was provided by letsencrypt. They provided the /etc/cron.d/certbot job.

  2. If it were me coding this, I would put a line in the log for both certbot.renewal:no renewal failures and something like "you had renewal failures". Never having had a renewal failure, I can't tell from the code above whether I can expect to see something in the log.

  3. I wish I had a test system. I don’t. However, I have had a few glitches along the way, and the folks in this community have helped me solve every problem.

  4. Would you agree that if you're going to write a message to the log that says there were no failures, you should also write a message to the log that says you have failures? I wrote all of this because maybe at some point this might help someone else who uses non-interactive jobs. I could have put in a fix that probably would handle my issue in less time than it took to write this. However, if I could rely on the log having a "there is a failure" message, it would be somewhat cleaner.


First stab in PHP at identifying whether a renew was attempted and whether the log has certbot.renewal:no renewal failures

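function gjcheckRenews($strin) {
	// $strin holds the contents of the letsencrypt log.
	// gjToFile2() is a logging helper defined elsewhere (not shown).
	global $gjFatalMsg;
	global $gjFatal;
	$gjmsgtmp = '';

	// Log lines of interest:
	//   Cert is due for renewal, auto-renewing...
	//   Cert not yet due for renewal
	//
	// Result is okay if no renewal was attempted, or if a renewal was
	// attempted and the log contains "no renewal failures".

	// 'renewing' matches the "auto-renewing..." line of an attempted renewal.
	$IsRenewRun = stripos($strin, 'renewing');
	if ($IsRenewRun === FALSE) {
		$gjmsgtmp .= "\nNot a Renewal run\n" . PHP_EOL;
		gjToFile2($gjmsgtmp);
		return TRUE;
	}

	// A renewal was attempted, so the success marker must be present.
	$IsRenewOK = stripos($strin, 'no renewal failures');
	if ($IsRenewOK === FALSE) {
		$gjmsgtmp .= "\nTHERE ARE Renew Errors\n" . PHP_EOL;
		gjToFile2($gjmsgtmp);
		return FALSE;
	}

	$gjmsgtmp .= "\nNo Renew Errors\n" . PHP_EOL;
	gjToFile2($gjmsgtmp);
	return TRUE;
}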


Yes it will. That log line will look like:

certbot.errors.Error: 1 renew failure(s), 2 parse failure(s)

As a word of caution though, while Certbot has had this exact output since 2016 and we have no plans to change it, I wouldn't consider individual log lines like this as something that is "stable" and will never change in the future.
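
With that caveat in mind, here is a minimal Python sketch of the log check being discussed. It assumes the log text contains either the "N renew failure(s), M parse failure(s)" line quoted above or the "no renewal failures" debug line; the function names and the 'failed' / 'ok' / 'no-run' labels are made up for illustration and are not part of Certbot.

```python
import re

# Matches the summary text quoted above. The wording is not a stable
# interface, so treat this pattern as an assumption that may need updating.
FAILURE_RE = re.compile(r"(\d+) renew failure\(s\), (\d+) parse failure\(s\)")


def find_renewal_failures(log_text):
    """Return a list of (renew_failures, parse_failures) pairs found in the log."""
    return [(int(r), int(p)) for r, p in FAILURE_RE.findall(log_text)]


def renewal_status(log_text):
    """Classify log text as 'failed', 'ok', or 'no-run' (hypothetical labels)."""
    if find_renewal_failures(log_text):
        return "failed"
    if "no renewal failures" in log_text:
        return "ok"
    # Neither marker present: most likely no renewal run was logged.
    return "no-run"
```

To use it, read /var/log/letsencrypt/letsencrypt.log into a string and pass it to renewal_status(). Checking for the failure line first means a log that contains both markers (for example, from two runs in one rotation period) is still reported as failed.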

I think better options for you here are:

  1. Assuming you gave your email address to Certbot, Let's Encrypt will email you weeks prior to your certificates expiring. You could rely on receiving those emails to know there was a problem renewing your certificate.
  2. If certbot renew fails for some reason, it will exit with a nonzero error code. Assuming you haven't ripped systemd out of Ubuntu 18.04 (you'd know it if you did), the cronjob you see has no effect and Certbot instead is run through a systemd timer and service. You can configure systemd to email you when this happens (without modifying the files provided by the Certbot package) or keep an eye on system logs for failing services. You can find guides on how to do this online.
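
As a rough illustration of the exit-code approach in option 2, here is a small Python sketch that runs a command and reports whether it exited with a nonzero status. It assumes you would call it from your own maintenance script rather than from the packaged cron job or systemd timer; run_and_check and check_renewal are hypothetical helper names, and the default command is simply the one from the cron file quoted earlier in this thread.

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd (a list of arguments) and return (ok, exit_code, stderr_text)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return result.returncode == 0, result.returncode, result.stderr


def check_renewal(cmd=("certbot", "-q", "renew")):
    """Run the renewal command and print a note to stderr if it failed."""
    ok, code, err = run_and_check(list(cmd))
    if not ok:
        print("renewal command exited with status {}".format(code), file=sys.stderr)
        if err:
            print(err, file=sys.stderr)
    return ok
```

Calling check_renewal() returns True when the command exits with status 0 and False otherwise, so the caller does not need to depend on any particular log wording.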

I thought that renewal failures weren't a kind of error that produces a nonzero exit. Am I mistaken about that?

I think you are mistaken. I’m not aware of any code in Certbot that would prevent this, and when I ran certbot renew locally and triggered that exception, Certbot exited with a nonzero status.

Thank you, to all who have replied. I understand the warning that these log lines may change in the future.

  1. Thanks for the heads up on the cron job.
    I would like to change and control the time that the renew job is run. Today it ran at around 8:00 Central time.

  2. I use the command certbot certificates > Certs_txt and then I look in the Certs.txt file to get the days left in the VALID: line. Example.

Found the following certs:
Certificate Name: adelphiachiropracticblog.com
Domains: adelphiachiropracticblog.com www.adelphiachiropracticblog.com
Expiry Date: 2019-11-10 14:45:55+00:00 (VALID: 85 days)
Certificate Path: /etc/letsencrypt/live/adelphiachiropracticblog.com/fullchain.pem
Private Key Path: /etc/letsencrypt/live/adelphiachiropracticblog.com/privkey.pem
Certificate Name: advancedchiropracticofphiladelphiablog.com
Domains: advancedchiropracticofphiladelphiablog.com

…
One of the things I do is look for certs that are about to renew (less than 31 days left) so I can see when they are renewing. I also track whether any certs have less than 28 days left. My assumption is that certbot tries to renew certs at 30 days, so if a cert still has not renewed and shows less than 30 days, it had a problem renewing, and that would show up with that command.
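
For anyone who wants to script that check, here is a minimal Python sketch. It assumes the saved output looks like the example above, with a "Certificate Name:" line followed by an "Expiry Date: ... (VALID: N days)" line; the function names and the 28-day default are my own choices for illustration. A certificate whose expiry line has no "(VALID: N days)" marker is flagged as well, on the assumption that it is better to look at it than to skip it silently.

```python
import re

NAME_RE = re.compile(r"^\s*Certificate Name:\s*(\S+)")
VALID_RE = re.compile(r"\(VALID:\s*(\d+)\s+days?\)")


def days_left(certs_text):
    """Map each certificate name to its days left, or None if not stated."""
    result = {}
    current = None
    for line in certs_text.splitlines():
        name_match = NAME_RE.match(line)
        if name_match:
            current = name_match.group(1)
            result[current] = None  # filled in when a VALID marker is seen
            continue
        valid_match = VALID_RE.search(line)
        if valid_match and current is not None:
            result[current] = int(valid_match.group(1))
    return result


def needs_attention(certs_text, threshold=28):
    """Return sorted names with fewer than `threshold` days left or no VALID marker."""
    return sorted(
        name
        for name, days in days_left(certs_text).items()
        if days is None or days < threshold
    )
```

Passing the contents of the saved certbot certificates output to needs_attention() gives the list of certificate names to look into; an empty list means every certificate reported at least 28 days left.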

  1. I hope I made the case that for jobs, especially jobs that can be run non-interactively, having clear and concise detail and summary entries in the log is really useful.

  2. Letsencrypt has made this process straightforward. Thanks. For the moment, I am going to abandon my attempts to read the log file to check on renewal failures, and instead rely on checking whether any cert has less than 28 days left, which I get from the certbot certificates command. However, I am always open to better ideas.

  3. Writing a hook for the renewal process is probably the way to go. However, setting up a test environment etc is a significant amount of work, especially when you need to test various DNS type problems.


I'm pretty sure they can show "29 days" depending on the exact time of day the certificate is going to expire and the cron job will run.

Your plan to check for 27-28 days sounds good to me.

