Deploy hook failure doesn't make certbot renew command fail

Am I correct that when certbot renew is run with the --deploy-hook option and the script in the hook fails (exit code != 0), certbot still exits successfully (with exit code 0)?

If that's correct, is there a reason for it?

I am currently parsing the output of certbot renew just to find out whether my certificate was successfully uploaded to Google Cloud, because if the upload script in the deploy hook fails, I can't tell from certbot's exit code.

Example:
certbot renew --cert-name my.domain.com --deploy-hook "false"

Expected exit code: 1
Got exit code: 0

certbot 1.9.0

Yes. The only thing Certbot will do in these cases is print a warning along with its error output.

A singular exit code can be a little complicated.

For example, if multiple certificates are renewing at the same time, but only one of their --deploy-hooks fails, it's impossible to convey that using only an exit code. One could make the argument that the exit code should be non-zero if there are any failures, but it has the same problem of being all-or-nothing. (And I think it probably should have been done that way to begin with.)
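Until that changes, one possible workaround for the per-certificate problem is to have the hook itself record which lineage failed. The following is only a sketch, not something Certbot provides: the function name `run_and_record` and the directory in `HOOK_FAILURE_DIR` are made up for illustration. The one thing taken from Certbot is `RENEWED_LINEAGE`, the environment variable it sets for deploy hooks.

```shell
#!/usr/bin/env bash
# Sketch of a deploy-hook wrapper. All names here are hypothetical except
# RENEWED_LINEAGE, which Certbot exports to deploy hooks.

# Directory where failures are recorded (assumed path, pick your own).
HOOK_FAILURE_DIR="${HOOK_FAILURE_DIR:-/var/lib/certbot-hook-failures}"

# Run the real deploy command given as arguments. If it fails, write its
# exit status to a file named after the certificate lineage.
run_and_record() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    mkdir -p "$HOOK_FAILURE_DIR"
    local name
    name=$(basename "${RENEWED_LINEAGE:-unknown}")
    printf '%s\n' "$status" > "$HOOK_FAILURE_DIR/$name"
  fi
  return "$status"
}

# When this file is used as the hook, pass the real command as arguments:
#   certbot renew --deploy-hook "/path/to/this-script /path/to/upload.sh"
if [ "$#" -gt 0 ]; then
  run_and_record "$@"
fi
```

With one file per failed lineage, whatever runs certbot renew can see afterwards which certificates were not deployed, which a single exit code cannot express.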

If anybody has good ideas on improving the handling of failed hooks, we'd be happy to hear them.

Also with --quiet enabled? Because I reckon you'd like to know when something goes wrong in a cronjob.

Does certbot also exit with a non-zero code when a hook fails?

Yes, it will appear in cron output even with --quiet.

No, which is what I think OP is asking about.

I think the definition of exit codes is pretty clear: 0 means success, and every other exit code means an error. If part of a certbot run failed (the hook), I would expect certbot not to exit with 0.

And cron is exactly the place where it is important. Nobody checks the output when everything goes well. You usually check it after you get notified about the failure (triggered by the exit code).

If it is not implemented the way I described here, users are forced to parse the certbot output to check whether certificates were actually deployed, which is pretty ugly, error-prone, etc.

Is this the case?

cron sends any output regardless of exit code:

When executing commands, any output is mailed to the owner of the crontab (or to the user named in the MAILTO environment variable in the crontab, if such exists).

So in the case of cron, --deploy-hook erroring would come to the user's attention. At least, it would if they used the cronjob that Certbot comes packaged with, or followed the official instructions (basically certbot renew -n -q). If Certbot exited with a non-zero code, I don't think it would make a material difference.

On the other hand, most modern systems are probably using systemd timers by now, such as that which ships with the Certbot snap or with the Debian package. However, it seems that systemd does not make any effort to notify the user whatsoever. While researching it I found a blog post complaining about it in the context of Certbot: https://utcc.utoronto.ca/~cks/space/blog/linux/SystemdTimersAndErrors. Maybe we should do something about that.

@scr4bble why would an exit code work better for you? Do you have a custom cronjob that is checking the exit code? I think your request is pretty reasonable (filed https://github.com/certbot/certbot/issues/8528) but it'd be good to understand the problem fully.

Yes, sorry, I could have explained it better. I am running the script as part of anacron (script located in /etc/cron.daily/) and it always appends its output to a specific log file 2>&1 >> ${CERTBOT_OUTPUT_LOG_FILE} and only notifies me when something went wrong (while pointing me to look into the log file).

if [ "$?" -ne 0 ]; then
    echo "Error while renewing certificate *.domain.xxx check ${CERTBOT_OUTPUT_LOG_FILE}" | mail -s "[cron] GCP SSL cert renewal error" nickname@email.xyz
    exit 1
fi
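Since certbot's own exit code does not reflect hook failures, a check like the one above could be combined with a second signal from the hook. This is a rough sketch under one assumption: the deploy hook leaves a file in a marker directory whenever it fails. The function name `renew_and_check` and the marker-directory convention are hypothetical, not part of certbot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a renewal command and also fail if the deploy
# hook left any failure marker behind.
# Usage: renew_and_check MARKER_DIR COMMAND [ARGS...]
renew_and_check() {
  local marker_dir=$1
  shift
  # Start from a clean slate so stale markers do not cause false alarms.
  mkdir -p "$marker_dir"
  rm -f "$marker_dir"/*
  local status=0
  "$@" || status=$?
  # A failure of the renewal command itself is passed through unchanged.
  if [ "$status" -ne 0 ]; then
    return "$status"
  fi
  # Any file left in the directory means at least one hook run failed.
  if [ -n "$(ls -A "$marker_dir")" ]; then
    return 1
  fi
  return 0
}

# Example call (paths are placeholders):
#   renew_and_check /var/lib/certbot-hook-failures \
#     certbot renew --cert-name my.domain.com --deploy-hook /path/to/hook.sh
```

The existing `if [ "$?" -ne 0 ]` check can then stay as it is, since the helper returns non-zero both when the renewal fails and when a hook reported a failure.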

You are right that the official recommended way would have brought the error to my attention (I am not sure whether I didn't know about it or decided to do it differently because of security, preventing timed attacks, more flexibility etc.), but that's probably not important now.

Just to mention an important thing: certain gcloud (Google Cloud CLI) commands write some of their output to stderr despite finishing successfully. So if certbot always displays the stderr of the deploy hook no matter its exit code, it would always email me unless I silence all the stderr-producing commands in the hook. That's also the reason why the exit code is always more reliable.
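One way to silence such commands inside the hook without losing real errors is to buffer their output and only show it when the command fails. A minimal sketch, assuming bash; `run_quietly` is a made-up helper name, and the command it wraps is whatever the hook needs to run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, hide its stdout and stderr when it
# succeeds, and replay everything on stderr when it fails.
run_quietly() {
  local log status=0
  log=$(mktemp)
  "$@" >"$log" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    cat "$log" >&2
  fi
  rm -f "$log"
  # Preserve the wrapped command's exit status for the caller.
  return "$status"
}

# Example call inside a deploy hook (the script path is a placeholder):
#   run_quietly /path/to/upload-script.sh
```

This keeps successful runs silent while still surfacing the full output of a failed command.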

Regarding systemd: I think that's like using a machine gun to kill some ants. A cronjob is probably more suitable for this task, but that's my personal opinion (mailing the errors is just one of the reasons).

Thanks for bringing the issue forward! :)
