Unable to Renew or issue new certs - Killed

and · May 30, 2018, 7:01pm

I have about 100 domains I’m serving through Apache. I’ve noticed that whenever I request a new cert or try to renew, it is quite slow and certbot’s CPU usage goes up to 100%. After some time, I get ‘Killed’ in the output. Any pointers for where to start? The logs at /var/log/letsencrypt/letsencrypt.log don’t show much.

My domain is:

I ran this command:
sudo certbot renew

It produced this output:

Cert is due for renewal, auto-renewing…
Killed

My web server is (include version):
apache/2.4.18 (Ubuntu)

The operating system my web server runs on is (include version):
ubuntu 16.04.4 LTS

My hosting provider, if applicable, is:

I can login to a root shell on my machine (yes or no, or I don’t know):
yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel):
no

stevenzhu · May 30, 2018, 7:42pm

Hi,

What’s your server config? (I’m suspecting there’s not enough cpu to run… But that’s unlikely)

@cpu

Thank you

schoen · May 30, 2018, 7:47pm

Do you suppose that @cpu can provide everyone with more CPU?

and · May 30, 2018, 7:49pm

@stevenzhu which server config file are you asking for or suggesting I look at?

and · May 30, 2018, 7:52pm

For reference, server itself is a c5.xlarge from amazon web services https://aws.amazon.com/ec2/instance-types/c5/

stevenzhu · May 31, 2018, 4:07pm

I actually want to have some free RAM and large ssd

@and then it’s beyond my knowledge how certbot can reach 100% CPU usage.

@cpu can you please take a look at this…

Thank you

cpu · May 31, 2018, 4:08pm

I'm afraid I don't have any ideas about what could cause Certbot's CPU usage to spike (despite my chosen nick perhaps indicating otherwise!). @schoen or @bmw are probably best positioned to debug.

schoen · May 31, 2018, 4:46pm

It would be good to see some of the logs from /var/log/letsencrypt; another thought is to increase Certbot’s verbosity with -v options (although I’m concerned that that may show more about network communications rather than about Certbot’s own actions). If we don’t learn anything from that, we can try to think of other debugging options.

One option is Python’s trace module

https://docs.python.org/2/library/trace.html

since Certbot could be run through it. Another is the profiler

https://docs.python.org/2/library/profile.html

Using one of these, we could get more detailed low-level information about what Certbot was doing.

and · June 1, 2018, 4:22pm

@schoen Thanks for your suggestions

The log looks like this:

2018-06-01 15:19:10,898:DEBUG:certbot.storage:Should renew, less than 30 days before certificate expiry 2018-06-29 23:35:45 UTC.
2018-06-01 15:19:10,899:INFO:certbot.renewal:Cert is due for renewal, auto-renewing...
2018-06-01 15:19:10,899:DEBUG:certbot.plugins.selection:Requested authenticator webroot and installer apache
2018-06-01 15:19:11,149:DEBUG:certbot_apache.configurator:Apache version is 2.4.18

That’s it. I think certbot gets killed before it outputs anything helpful.

I’m somewhat of a Python novice, and I’m really not sure how to run certbot with the trace or profile libraries. I built a script that looks like this:

from subprocess import call
call(["certbot", "renew"])

which I ran like this:

python -m trace --count -C . renew.py

which generated a lot of files

pickle.cover  re.cover  renew.cover  sre_compile.cover  sre_parse.cover  struct.cover  subprocess.cover  trace.cover

Where can I go from here?

schoen · June 1, 2018, 4:28pm

and:

I’m somewhat a python novice, I’m really not sure how to run certbot with the trace or profile libraries, I built a script that looks like this:
from subprocess import call
call(["certbot", "renew"])
which I ran like this:
python -m trace --count -C . renew.py

That's a good thought, but unfortunately the -m trace isn't going to survive the subprocess.call operation: subprocess.call makes the operating system start a fresh copy of Python, which doesn't know about the -m option. Therefore, your existing trace files describe only the wrapper script that launched certbot renew, rather than anything Certbot itself did.

In order to get the trace for Certbot itself, you would have to run Certbot itself under a Python interpreter that has -m trace. I suspect you could accomplish this with something like

python -m trace --count -C . $(which certbot) renew

I think the profiler output might be more relevant than the trace output, but getting the profiler output might also be a little more work, so maybe we should start with the trace output.
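For anyone who would rather drive this from Python than from the command line, here is a minimal sketch of the same idea: run a script inside the current interpreter with the trace module counting lines, so the tracing is not lost to a subprocess. The helper name run_script_under_trace and the directory layout are made up for illustration, and it assumes the target is an ordinary Python script (as the /usr/bin/certbot entry point appears to be in the traceback further down the thread).

```python
import os
import runpy
import sys
import trace


def run_script_under_trace(script_path, script_args, coverdir):
    """Run a Python script in this interpreter with line counting enabled.

    Hypothetical helper: roughly what `python -m trace --count -C coverdir
    script args...` does, but callable from other Python code.
    """
    os.makedirs(coverdir, exist_ok=True)
    tracer = trace.Trace(count=True, trace=False)
    saved_argv = sys.argv
    # The script should see its own name and arguments, not ours.
    sys.argv = [script_path] + list(script_args)
    try:
        tracer.runfunc(runpy.run_path, script_path, run_name="__main__")
    except SystemExit:
        # Command-line tools usually finish by calling sys.exit().
        pass
    finally:
        sys.argv = saved_argv
        # Counts are written only here, after the script has returned.
        tracer.results().write_results(show_missing=False, coverdir=coverdir)
    return sorted(os.listdir(coverdir))
```

Something like run_script_under_trace("/usr/bin/certbot", ["renew"], "./cover") would be the rough equivalent of the command above. Note that the .cover files are written only after the script returns, so a process that is killed outright will still leave nothing behind.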

and · June 1, 2018, 4:39pm

This gave me:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/trace.py", line 819, in <module>
    main()
  File "/usr/lib/python2.7/trace.py", line 807, in main
    t.runctx(code, globs, globs)
  File "/usr/lib/python2.7/trace.py", line 513, in runctx
    exec cmd in globals, locals
  File "/usr/bin/certbot", line 11, in <module>
    load_entry_point('certbot==0.22.2', 'console_scripts', 'certbot')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 561, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 553, in get_distribution
    dist = get_provider(dist)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 427, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 715, in find
    raise VersionConflict(dist, req)
pkg_resources.VersionConflict: (certbot 0.19.0 (/usr/lib/python2.7/dist-packages), Requirement.parse('certbot==0.22.2'))

I'm not sure why there is a reference to certbot 0.19.0. I believe I attempted to upgrade it to the latest version while I was running into this issue.

certbot --version returns

certbot 0.22.2

schoen · June 1, 2018, 4:51pm

I forgot how this is supposed to work, but maybe try with python3?

and · June 1, 2018, 5:56pm

This worked

python3 -m trace --count -C . $(which certbot) renew

But output was the same, ‘Killed’, and no files were created

schoen · June 1, 2018, 7:05pm

Hmmm, I wonder if the trace module only creates the files upon a successful exit?

I just tried this by writing a program that sends itself a SIGKILL (via import os; os.kill(os.getpid(), 9)) and it indeed didn’t give any trace output.
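A small sketch of that experiment, for anyone who wants to reproduce it. It writes a throwaway script that sends itself SIGKILL, runs it under python -m trace --count, and reports the exit status and whatever ended up in the coverage directory. The function name and the temporary file names are made up for illustration, and it assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import tempfile


def trace_output_after_sigkill():
    """Run a self-killing script under the trace module.

    Returns (return code, files left in the coverage directory).
    """
    workdir = tempfile.mkdtemp()
    script = os.path.join(workdir, "selfkill.py")
    with open(script, "w") as handle:
        handle.write("import os, signal\n"
                     "os.kill(os.getpid(), signal.SIGKILL)\n")
    coverdir = os.path.join(workdir, "cover")
    os.makedirs(coverdir)
    proc = subprocess.run(
        [sys.executable, "-m", "trace", "--count", "-C", coverdir, script]
    )
    # subprocess reports death by signal N as return code -N.
    return proc.returncode, os.listdir(coverdir)
```

Calling it should give a return code of -signal.SIGKILL and an empty file list, which matches what I saw: the process never gets the chance to write its counts.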

I’ll check whether cProfile has a similar or a different behavior.

schoen · June 1, 2018, 9:12pm

By the way, could you try running ulimit -a to see if you have a per-process CPU-time limit? You might be able to temporarily remove that limit if it’s the reason that the process is getting killed.

For example,

ulimit -t 1; echo 'scale=100000; 4*a(1)' | bc -l

results in Killed (the bc process will receive SIGKILL when it takes more than 1 second of total CPU time).
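The same check can be sketched in Python, which also has the advantage of reporting which signal ended the process. This is only an illustration, assuming Linux and the standard resource module; run_with_cpu_limit is a made-up helper name.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with a per-process CPU-time limit and describe how it ended."""
    def apply_limit():
        # Runs in the child just before exec, like `ulimit -t` in a shell.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    proc = subprocess.run(argv, preexec_fn=apply_limit)
    if proc.returncode < 0:
        # A negative return code means the child was ended by that signal.
        return signal.Signals(-proc.returncode).name
    return "exit status %d" % proc.returncode
```

For example, run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1) comes back after about a second of CPU time with the name of the signal, rather than just the shell's bare Killed.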

schoen · June 1, 2018, 10:24pm

I’ve confirmed that the cProfile module has the same behavior (if the Certbot process is killed while under profiling, no profiling statistics are reported). So, the question about the ulimit might really be relevant because we might need to stop Certbot from getting killed in order to get trace or profile data out.
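One possible workaround, which nobody in this thread has tried against Certbot: Python 3's faulthandler module can dump the current stack to a file every few seconds, and whatever has already been written survives a later SIGKILL. The sketch below demonstrates only the mechanism, on a stand-in busy loop rather than Certbot; CHILD_SOURCE, busy_loop and stack_dumps_before_kill are made-up names.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time

# Stand-in for a process that spins at 100% CPU until it is killed.
CHILD_SOURCE = """
import faulthandler, sys
log = open(sys.argv[1], "w")
# Write the current stack to the log every 0.2 seconds.
faulthandler.dump_traceback_later(0.2, repeat=True, file=log)
def busy_loop():
    while True:
        pass
busy_loop()
"""


def stack_dumps_before_kill(max_wait=10.0):
    """Start the child, wait for a stack dump, SIGKILL it, return the log."""
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "stacks.log")
    child = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE, log_path])
    deadline = time.time() + max_wait
    while time.time() < deadline:
        if os.path.exists(log_path) and os.path.getsize(log_path) > 0:
            break
        time.sleep(0.1)
    time.sleep(0.5)  # let at least one dump finish being written
    child.send_signal(signal.SIGKILL)
    child.wait()
    with open(log_path) as handle:
        return child.returncode, handle.read()
```

The returned log names the function that was running (busy_loop here) even though the process died from SIGKILL. Applying this to Certbot would need a small wrapper that arms faulthandler before starting Certbot's entry point, which is an untested assumption.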

and · June 11, 2018, 12:47pm

ulimit -t returns unlimited, so it doesn’t look like it is the OS killing the process. I believe that one of the domains was hanging the process. I am not sure why, but I went through my list of expiring domains and renewed each one individually using the certonly option, and was able to work through my backlog. I’m also using that for new domains (rather than the apache2 installer).
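A rough sketch of how that per-domain approach could be scripted, with a timeout so that a hanging domain shows up by name. The domain-to-webroot mapping, the function names and the 300-second timeout are assumptions for illustration; the exact commands used above aren't shown. It defaults to Certbot's --dry-run so nothing is issued by accident.

```python
import subprocess


def certonly_command(domain, webroot, dry_run=True):
    """Build a `certbot certonly` command for one domain (webroot method)."""
    argv = ["certbot", "certonly", "--webroot", "-w", webroot, "-d", domain]
    if dry_run:
        argv.append("--dry-run")
    return argv


def renew_individually(webroots, timeout_seconds=300, dry_run=True,
                       runner=subprocess.run):
    """Run certbot once per domain and record how each run ended.

    `webroots` maps domain name to webroot path (hypothetical input).
    """
    results = {}
    for domain, webroot in webroots.items():
        argv = certonly_command(domain, webroot, dry_run)
        try:
            completed = runner(argv, timeout=timeout_seconds)
            # A negative return code means the run was ended by a signal.
            results[domain] = completed.returncode
        except subprocess.TimeoutExpired:
            results[domain] = "timed out"
    return results
```

Any domain whose entry is "timed out" or a negative number is a candidate for the one that was hanging or getting killed.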

schoen · June 11, 2018, 4:44pm

I wonder if it was a different resource—maybe I should have suggested ulimit -a.

It’s a pity that the Unix architecture doesn’t provide us a way to get a more specific error when the OS kills a process based on a resource limit.

and · June 11, 2018, 5:24pm

Any hunches as to what might be the next best place to look?

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30464
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 30464
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

schoen · June 11, 2018, 6:16pm

Huh, none of those look particularly bad or unusual to me!
