I have about 100 domains I’m serving through apache. I’ve noticed that whenever I request a new cert or try to renew, it is quite slow and certbot’s CPU usage goes up to 100%. After some time, I get ‘Killed’ in the output. Any pointers for where to start? The logs at /var/log/letsencrypt/letsencrypt.log don’t show much.
My domain is:
I ran this command:
sudo certbot renew
It produced this output:
Cert is due for renewal, auto-renewing…
Killed
My web server is (include version):
apache/2.4.18 (Ubuntu)
The operating system my web server runs on is (include version):
ubuntu 16.04.4 LTS
My hosting provider, if applicable, is:
I can login to a root shell on my machine (yes or no, or I don’t know):
yes
I’m using a control panel to manage my site (no, or provide the name and version of the control panel):
no
I'm afraid I don't have any ideas about what could cause Certbot's CPU usage to spike (despite my chosen nick perhaps indicating otherwise!). @schoen or @bmw are probably best positioned to debug.
It would be good to see some of the logs from /var/log/letsencrypt; another thought is to increase Certbot’s verbosity with -v options (although I’m concerned that this may show more about network communications than about Certbot’s own actions). If we don’t learn anything from that, we can try to think of other debugging options.
2018-06-01 15:19:10,898:DEBUG:certbot.storage:Should renew, less than 30 days before certificate expiry 2018-06-29 23:35:45 UTC.
2018-06-01 15:19:10,899:INFO:certbot.renewal:Cert is due for renewal, auto-renewing...
2018-06-01 15:19:10,899:DEBUG:certbot.plugins.selection:Requested authenticator webroot and installer apache
2018-06-01 15:19:11,149:DEBUG:certbot_apache.configurator:Apache version is 2.4.18
That’s it. I think certbot gets killed before it outputs anything helpful.
I’m somewhat of a Python novice, and I’m really not sure how to run certbot with the trace or profile libraries. I built a script that looks like this:
from subprocess import call
call(["certbot", "renew"])
That's a good thought, but unfortunately the -m trace isn't going to survive the subprocess.call operation, because subprocess.call makes the operating system start a fresh copy of Python which doesn't know about the -m option. Therefore, your existing trace files refer only to the process of running certbot renew, rather than to actions that it took.
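To make that concrete, here is a minimal sketch (my own illustration, not something from Certbot) that asks a freshly started child interpreter whether it has a trace function installed. The names CHILD_CODE and child_is_traced are made up for this example. If you save it as a hypothetical check_trace.py and run it as python -m trace --count check_trace.py, the parent should report that it is traced while the child should not, which is the same thing that happens to certbot when it is launched through subprocess.call.

```python
import subprocess
import sys

# One-liner executed by the child: report whether a trace function is installed.
CHILD_CODE = "import sys; print(sys.gettrace() is not None)"


def child_is_traced():
    """Start a fresh interpreter and ask whether it has a trace function set."""
    out = subprocess.check_output([sys.executable, "-c", CHILD_CODE])
    return out.decode().strip() == "True"


if __name__ == "__main__":
    # Under "python -m trace", the parent has a trace function installed...
    print("parent traced:", sys.gettrace() is not None)
    # ...but the child is a brand-new process that never saw the -m option.
    print("child traced: ", child_is_traced())
```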
In order to get the trace for Certbot itself, you would have to run Certbot itself under a Python interpreter that has -m trace. I suspect you could accomplish this with something like
I think the profiler output might be more relevant than the trace output, but getting the profiler output might also be a little more work, so maybe we should start with the trace output.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/trace.py", line 819, in <module>
    main()
  File "/usr/lib/python2.7/trace.py", line 807, in main
    t.runctx(code, globs, globs)
  File "/usr/lib/python2.7/trace.py", line 513, in runctx
    exec cmd in globals, locals
  File "/usr/bin/certbot", line 11, in <module>
    load_entry_point('certbot==0.22.2', 'console_scripts', 'certbot')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 561, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 553, in get_distribution
    dist = get_provider(dist)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 427, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 715, in find
    raise VersionConflict(dist, req)
pkg_resources.VersionConflict: (certbot 0.19.0 (/usr/lib/python2.7/dist-packages), Requirement.parse('certbot==0.22.2'))
I'm not sure why there is a reference to certbot 0.19.0; I believe I attempted to upgrade it to the latest version while I was running into this issue.
Hmmm, I wonder if the trace module only creates the files upon a successful exit?
I just tried this by writing a program that sends itself a SIGKILL (via import os; os.kill(os.getpid(), 9)) and it indeed didn’t give any trace output.
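A rough way to reproduce this kind of experiment is sketched below. It is written for Python 3 (the thread itself is on Python 2.7) and uses an atexit handler as a stand-in for "work that only happens on a clean exit", on the assumption that the trace module's report is written at a similar point. CHILD_CODE and run_child are names invented for this sketch, and it only applies to POSIX systems where SIGKILL exists.

```python
import signal
import subprocess
import sys

# Child program: registers an exit handler, then optionally SIGKILLs itself.
CHILD_CODE = """
import atexit, os, signal, sys
atexit.register(lambda: sys.stdout.write("exit handler ran\\n"))
if sys.argv[1] == "kill":
    os.kill(os.getpid(), signal.SIGKILL)
"""


def run_child(mode):
    """Run the child in the given mode ("normal" or "kill").

    Returns (returncode, captured stdout).
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, mode],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Clean exit: the handler gets a chance to write its output.
    print(run_child("normal"))
    # SIGKILL cannot be caught, so the handler never runs and nothing is written.
    print(run_child("kill"))
```

A negative return code from subprocess means the child was terminated by that signal number, so the killed run reports the negative of SIGKILL and no output.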
I’ll check whether cProfile has a similar or a different behavior.
By the way, could you try running ulimit -a to see if you have a per-process CPU-time limit? You might be able to temporarily remove that limit if it’s the reason that the process is getting killed.
For example,
ulimit -t 1; echo 'scale=100000; 4*a(1)' | bc -l
results in Killed (the bc process will receive SIGKILL when it takes more than 1 second of total CPU time).
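The same check and experiment can be sketched in Python with the standard resource module, in case that is easier to script than the shell. This is only an illustration: describe_cpu_limit and run_with_cpu_limit are hypothetical helper names, the busy loop stands in for bc, and the code assumes a POSIX system. Which signal the child receives is up to the kernel (I would expect SIGKILL on Linux when the soft and hard limits are equal, but SIGXCPU is also possible), so the sketch only reports it.

```python
import resource
import signal
import subprocess
import sys


def describe_cpu_limit():
    """Return the (soft, hard) CPU-time limits of this process as strings."""
    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else "%d s" % value
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    return fmt(soft), fmt(hard)


def run_with_cpu_limit(seconds):
    """Run a CPU-bound child under a CPU-time limit and return its exit status."""
    def set_limit():
        # Runs in the child just before exec, similar to "ulimit -t" in a shell.
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds))
    proc = subprocess.run(
        [sys.executable, "-c", "while True: pass"],
        preexec_fn=set_limit,
        timeout=60,  # wall-clock safety net in case the limit is not enforced
    )
    return proc.returncode


if __name__ == "__main__":
    print("current CPU-time limits (soft, hard):", describe_cpu_limit())
    status = run_with_cpu_limit(1)
    # A negative status means "terminated by that signal number".
    print("child exit status:", status)
    print("killed by CPU limit:", status in (-signal.SIGKILL, -signal.SIGXCPU))
```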
I’ve confirmed that the cProfile module has the same behavior (if the Certbot process is killed while under profiling, no profiling statistics are reported). So, the question about the ulimit might really be relevant because we might need to stop Certbot from getting killed in order to get trace or profile data out.
ulimit -t returns unlimited, so it doesn’t look like the OS is killing the process over a CPU-time limit. I believe that one of the domains was hanging the process. I am not sure why, but I went through my list of expiring domains and renewed each one individually with the certonly option, which let me work through my backlog. I’m also using that for new domains (rather than the apache2 installer).
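For anyone who wants to automate the one-at-a-time approach, here is a hedged sketch of a loop that handles each certificate separately with a timeout, so that a single hanging domain is identified instead of stalling the whole run. Note that it swaps in certbot renew --cert-name rather than the certonly command used above, since the exact certonly arguments are not shown in this thread. The function name renew_each, the timeout value, and the example certificate names are all placeholders.

```python
import subprocess


def renew_each(cert_names, timeout_s=300,
               base_cmd=("certbot", "renew", "--cert-name")):
    """Run one command per certificate name and record how each run ended."""
    results = {}
    for name in cert_names:
        cmd = list(base_cmd) + [name]
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # The child is killed by subprocess.run when the timeout expires.
            results[name] = "timeout"
            continue
        # A negative return code means the process died from a signal
        # (for example -9 when it was "Killed").
        results[name] = "ok" if proc.returncode == 0 else "exit %d" % proc.returncode
    return results


if __name__ == "__main__":
    # Placeholder names; substitute the names listed by "certbot certificates".
    for name, outcome in renew_each(["example.com", "example.org"]).items():
        print(name, "->", outcome)
```

Any name reported as "timeout" or with a negative exit status would be the first place to look for the domain that makes certbot hang.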