How to troubleshoot auto-renew failure


#1

Domains: sambidb.com, kizunadb.com, l4jp.com (the first two are wildcard certs)
I’m on a Linode VPS with CentOS 7.4 and nginx 1.10.2, and this is my first experience using Let’s Encrypt. It’s now a couple of days into the period when the auto-renew should have done something - the expiry is Jan. 31st. I’m still a complete newbie, so I don’t know how to troubleshoot what’s going on. Are there logs or something? I’ve looked but don’t see anything relevant.

When I first set it up, verifying the TXT record was stubborn because Linode is very slow to propagate DNS changes (15-16 minutes, and even then it doesn’t work every time), but it worked on about the third or fourth try for each domain. See this thread for the info about my setup, and note my last comment there, where I realize that I should already have what’s needed for renewal: “No TXT record found (using Linode DNS plugin)”

When I installed stuff, something (the Linode plugin?) created the following files for renewals:

/lib/systemd/system/certbot-renew.service

[Unit]
Description=This service automatically renews any certbot certificates found

[Service]
EnvironmentFile=/etc/sysconfig/certbot
Type=oneshot
ExecStart=/usr/bin/certbot renew $PRE_HOOK $POST_HOOK $RENEW_HOOK $CERTBOT_ARGS

/lib/systemd/system/certbot-renew.timer

[Unit]
Description=This is the timer to set the schedule for automated renewals

[Timer]
OnCalendar=daily
RandomizedDelaySec=6hours
Persistent=true

[Install]
WantedBy=timers.target

Running certbot certificates, I get this:

Found the following certs:
  Certificate Name: kizunadb.com
    Domains: kizunadb.com *.kizunadb.com
    Expiry Date: 2019-01-31 07:55:06+00:00 (VALID: 28 days)
    Certificate Path: /etc/letsencrypt/live/kizunadb.com/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/kizunadb.com/privkey.pem
  Certificate Name: l4jp.com
    Domains: l4jp.com
    Expiry Date: 2019-01-31 08:43:44+00:00 (VALID: 28 days)
    Certificate Path: /etc/letsencrypt/live/l4jp.com/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/l4jp.com/privkey.pem
  Certificate Name: sambidb.com
    Domains: *.sambidb.com
    Expiry Date: 2019-01-31 06:31:47+00:00 (VALID: 28 days)
    Certificate Path: /etc/letsencrypt/live/sambidb.com/fullchain.pem
    Private Key Path: /etc/letsencrypt/live/sambidb.com/privkey.pem

I don’t know how to determine if the renew service has been firing daily like it’s supposed to, and if so, what happened the last two days when it should have actually attempted a renewal. Can you point me to the right tools to diagnose the situation? (I have not attempted a manual renewal, because if it succeeded, I would have to wait another 60 days before I could work on this. If the time gets close I’ll do a manual one, but first I want to try getting auto working.)


#2

Hi,

Have you tried to run certbot renew by hand? (It’s hard to tell what’s wrong without knowing whether the renewal itself would succeed.)

I guess there might be issues with your systemd timer?

Thank you


#3

Maybe you could try certbot renew --dry-run


#4

As I said, I didn’t want to do a manual renew because success would delay this troubleshooting for 60 days. But --dry-run was a nice idea. I got the following result:

Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/kizunadb.com.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert is due for renewal, auto-renewing...
Plugins selected: Authenticator dns-linode, Installer nginx
Starting new HTTPS connection (1): acme-staging-v02.api.letsencrypt.org
Renewing an existing certificate
Performing the following challenges:
dns-01 challenge for kizunadb.com
dns-01 challenge for kizunadb.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Waiting 1000 seconds for DNS changes to propagate
Waiting for verification...
Resetting dropped connection: acme-staging-v02.api.letsencrypt.org
Cleaning up challenges
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed with reload of nginx server; fullchain is
/etc/letsencrypt/live/kizunadb.com/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/l4jp.com.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert is due for renewal, auto-renewing...
Plugins selected: Authenticator dns-linode, Installer nginx
Starting new HTTPS connection (1): acme-staging-v02.api.letsencrypt.org
Renewing an existing certificate
Performing the following challenges:
dns-01 challenge for l4jp.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Waiting 1000 seconds for DNS changes to propagate
Waiting for verification...
Resetting dropped connection: acme-staging-v02.api.letsencrypt.org
Cleaning up challenges
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed with reload of nginx server; fullchain is
/etc/letsencrypt/live/l4jp.com/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/sambidb.com.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Cert is due for renewal, auto-renewing...
Plugins selected: Authenticator dns-linode, Installer nginx
Starting new HTTPS connection (1): acme-staging-v02.api.letsencrypt.org
Renewing an existing certificate
Performing the following challenges:
dns-01 challenge for sambidb.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Waiting 1000 seconds for DNS changes to propagate
Waiting for verification...
Resetting dropped connection: acme-staging-v02.api.letsencrypt.org
Cleaning up challenges
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com
Starting new HTTPS connection (1): api.linode.com

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
new certificate deployed with reload of nginx server; fullchain is
/etc/letsencrypt/live/sambidb.com/fullchain.pem
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
** DRY RUN: simulating 'certbot renew' close to cert expiry
**          (The test certificates below have not been saved.)

Congratulations, all renewals succeeded. The following certs have been renewed:
  /etc/letsencrypt/live/kizunadb.com/fullchain.pem (success)
  /etc/letsencrypt/live/l4jp.com/fullchain.pem (success)
  /etc/letsencrypt/live/sambidb.com/fullchain.pem (success)
** DRY RUN: simulating 'certbot renew' close to cert expiry
**          (The test certificates above have not been saved.)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

IMPORTANT NOTES:
 - Your account credentials have been saved in your Certbot
   configuration directory at /etc/letsencrypt. You should make a
   secure backup of this folder now. This configuration directory will
   also contain certificates and private keys obtained by Certbot so
   making regular backups of this folder is ideal.

So it appears that the renewal would have succeeded on all three domains if allowed to proceed manually. So what can I check about the automatic process? If it was a cron job, I would know enough about how that works to add some sort of logging to the script, but I don’t have a clue how the systemd method of scheduled tasks works.


#5

Do you have any Certbot logs in /var/log/letsencrypt?


#6

This is the crux of the problem…
Start with:
systemctl list-timers --all

And follow-up with some light reading: https://wiki.archlinux.org/index.php/Systemd/Timers
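
A minimal sketch of that first check, assuming the unit is named certbot-renew.timer as in the files posted earlier in the thread. The function name timer_report is made up for illustration; the systemctl subcommands it calls are standard ones.

```shell
# Hypothetical helper: report whether a systemd timer is enabled at boot,
# running right now, and present in the list-timers output.
timer_report() {
    local unit="${1:-certbot-renew.timer}"
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; is this a systemd host?" >&2
        return 2
    fi
    # "enabled" = started automatically at boot; "active" = running now.
    echo "enabled: $(systemctl is-enabled "$unit" 2>&1)"
    echo "active:  $(systemctl is-active "$unit" 2>&1)"
    # A started timer appears here with its NEXT and LAST trigger times.
    systemctl list-timers --all --no-pager | grep -F -- "$unit" \
        || echo "$unit is not in the list-timers output"
}
```

Calling timer_report with no argument checks certbot-renew.timer. A timer that was installed but never started should report something other than "active" and be missing from the list.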


#7

Only from when I ran things manually, not from when auto-renew should have been running every day. That’s why I didn’t know what was failing. BTW, would certbot renew run by systemd make an entry in the log even when there are no certs within 30 days of expiry, or only when it actually tries to do a renewal?

Please cut me a little slack. The Linode plugin installed those .timer and .service files, and the instructions said nothing about doing anything else with them. All other tutorials about setting up certbot auto-renew with systemd only talked about what needs to be in the files, and everyone in the other thread seemed satisfied that if I had those two files, things must be fine. Plus, it’s hard to know what terms to google when a concept is completely new. I’m a web developer, not a server admin. But thanks to you, I have a few more clues than I had yesterday.

When I first ran systemctl list-timers --all, it returned:

NEXT                         LEFT     LAST                         PASSED UNIT                         ACTIVATES
Sat 2019-01-05 01:07:10 JST  15h left Fri 2019-01-04 01:07:10 JST  8h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
n/a                          n/a      n/a                          n/a    systemd-readahead-done.timer systemd-readahead-done.service
2 timers listed.

Yeah, certbot-renew.timer was conspicuously not listed.

On that page I learned that something managed by systemd is called a “unit”, and that unlike cron jobs, a systemd timer unit apparently has to be “started”. The page assumed I would know how to do that, so I had to follow up with heavier reading, following links from that page onward. Maybe I’m on the right track now…

$ systemctl start certbot-renew.timer
$ systemctl list-timers --all
NEXT                         LEFT     LAST                         PASSED UNIT                         ACTIVATES
Sat 2019-01-05 00:08:39 JST  13h left n/a                          n/a    certbot-renew.timer          certbot-renew.service
Sat 2019-01-05 01:07:10 JST  14h left Fri 2019-01-04 01:07:10 JST  9h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
n/a                          n/a      n/a                          n/a    systemd-readahead-done.timer systemd-readahead-done.service
3 timers listed.

Ah, that looks better. I assume “13h left” means that in 13 hours I’ll know if it worked. I’ll report back here.


#8

I think this critique needs to reach the people that made the plugin.

I believe it will make an entry on all runs.

Yes! You are getting somewhere now…

Yes, much better :slight_smile: and yes on the time left.
We should know more on your next report.
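
On whether each run leaves a trace: one rough way to check, assuming the service unit is certbot-renew.service as posted above, is to read what systemd captured in its journal for that unit. The helper name renewal_journal is hypothetical, and reading the system journal probably needs root.

```shell
# Hypothetical helper: print what a unit logged to the journal recently.
renewal_journal() {
    local unit="${1:-certbot-renew.service}"
    local since="${2:-2 days ago}"
    if ! command -v journalctl >/dev/null 2>&1; then
        echo "journalctl not found; is this a systemd host?" >&2
        return 2
    fi
    # -u filters by unit; --since accepts relative times such as "2 days ago".
    journalctl --no-pager -u "$unit" --since "$since"
}
```

If the timer has fired, there should be start and finish lines for the service on each day, alongside whatever certbot itself wrote to /var/log/letsencrypt/letsencrypt.log.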


#9

Yes, the renewals succeeded! :grinning:

Last question: Now that the timer has been started and is running, will it continue/resume a running state if the server (or systemd or something else related) restarts, or will I have to run systemctl start again in some cases? I use Monit (although that is yet another tool I struggle to understand), and it would be great if I could get it to check up on the certs and send me an alert if they get too close to expiring (e.g. 15 days), but I doubt that kind of check is possible - I assume, though, that I could figure out how to get it to check up on the timer unit, since that’s more of a system thing.


#10

I believe systemctl enable should be used to ensure the timer is started again after a reboot.
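
A small sketch of the difference, assuming the same certbot-renew.timer unit: enable makes the timer come back after a reboot, start makes it run now, and the two are independent. The function name ensure_timer is made up for illustration, and it needs to run as root.

```shell
# Hypothetical helper: make a timer survive reboots and run it now,
# then confirm both states.
ensure_timer() {
    local unit="${1:-certbot-renew.timer}"
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; is this a systemd host?" >&2
        return 2
    fi
    # enable = start at every boot; start = start for the current boot.
    systemctl enable "$unit" && systemctl start "$unit" || return 1
    # Both checks print the state and succeed only if it is as expected.
    systemctl is-enabled "$unit" && systemctl is-active "$unit"
}
```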

I don’t know Monit (I use Icinga) but I found this so I guess it’s possible.


#11

Thanks - done.

That only alerts if the cert has already expired. I don’t need Monit for that - the users of my web apps would let me know! :flushed: I’m hoping for an alert when the expiry is getting close, not past.

I found a free cloud-based tool, https://certificatemonitor.org/, and I’ll consider it, but it would send out way more notices than I need (starting at 3 months out!). I’ll keep looking for a local way to do it - if Monit can’t do it, perhaps a cron-based bash script or something.
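
A rough sketch of what such a script could look like, assuming openssl is installed and the certs live under /etc/letsencrypt/live as in the certbot certificates output above. The function names are made up. It relies on openssl x509 -checkend, which exits 0 when the certificate is still valid the given number of seconds from now.

```shell
# Hypothetical helper: return 0 if the certificate in $1 expires within
# $2 days (default 15), 1 if it does not, 2 if the file cannot be read.
cert_expires_within() {
    local cert="$1"
    local days="${2:-15}"
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
    # -checkend N succeeds when the cert is still valid N seconds from now.
    # A file openssl cannot parse also fails here, so it is reported as
    # expiring, which errs on the side of sending an alert.
    if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Hypothetical helper: print one warning line per certificate that is
# within the given number of days of expiry.
warn_expiring() {
    local days="$1"
    local cert
    shift
    for cert in "$@"; do
        if cert_expires_within "$cert" "$days"; then
            echo "WARNING: $cert expires within $days days (or could not be parsed)"
        fi
    done
}
```

From a daily cron job run as root, something like warn_expiring 15 /etc/letsencrypt/live/*/fullchain.pem would stay silent until a cert gets close to expiry. Cron normally mails any output to the job owner, so the warning line itself can serve as the alert.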


#12

According to the linked documentation:

CERTIFICATE VALID for number DAYS. Send an alert if the certificate will expire in the given number of days. This test is pretty useful to get a notification when it is time to renew your SSL certificate.


#13

I have a phone that automatically rings whenever things go wrong! - LOL


#14

Oh, you fixed the anchor! The link in the StackOverflow answer is https://mmonit.com/monit/documentation/monit.html#CONNECTION-TESTING (rather than TESTS), which doesn’t exist, so it just went to the top of the manual, leaving me perplexed. Thanks for the correct link and pointing out what in that section is the relevant bit. Monit syntax confounds me, and full examples of this kind of test are almost nonexistent, but I found enough to write something. Unfortunately it’s not working, but that’s way off-topic here - if I can’t figure it out, I’ll ask on ServerFault or somewhere.