Certificate renewal kills apache2


#1

Hello all,

I have this problem which is literally killing my website! If anyone could provide any insights or ideas for how to fix it, I would be very happy and the world would be just a little bit improved and closer to perfection.

This is the line in my certbot crontab:

0 */12 * * * root test -x /usr/bin/certbot -a ! -d /run/systemd/system && perl -e 'sleep int(rand(3600))' && certbot -q renew

This works - it has renewed the certificate twice now since I started using Let’s Encrypt and certbot last autumn. However, both times it has killed apache2 in the process and taken my website offline for hours, as it happened just after midnight and I didn’t see it until the morning after. A simple “service apache2 start” was all that was needed to get it back up again.

This is what I found in my apache2 error_log:


[Thu Dec 08 07:35:06.232634 2016] [mpm_event:notice] [pid 2178:tid 139896047458176] AH00489: Apache/2.4.10 (Debian) mod_fcgid/2.3.9 OpenSSL/1.0.1t configured -- resuming normal operations
[Thu Dec 08 07:35:06.232730 2016] [core:notice] [pid 2178:tid 139896047458176] AH00094: Command line: '/usr/sbin/apache2'
[Thu Dec 08 07:35:06.232784 2016] [mpm_event:warn] [pid 2178:tid 139896047458176] AH00488: long lost child came home! (pid 2763)
[Fri Dec 09 00:00:13.349437 2016] [mpm_event:notice] [pid 2178:tid 139896047458176] AH00493: SIGUSR1 received. Doing graceful restart
AH00112: Warning: DocumentRoot [/var/lib/letsencrypt/tls_sni_01_page/] does not exist
AH00112: Warning: DocumentRoot [/var/lib/letsencrypt/tls_sni_01_page/] does not exist
AH00112: Warning: DocumentRoot [/var/lib/letsencrypt/tls_sni_01_page/] does not exist
AH00112: Warning: DocumentRoot [/var/lib/letsencrypt/tls_sni_01_page/] does not exist
AH00112: Warning: DocumentRoot [/var/lib/letsencrypt/tls_sni_01_page/] does not exist
AH00112: Warning: DocumentRoot [/var/lib/letsencrypt/tls_sni_01_page/] does not exist
[Fri Dec 09 00:00:16.469696 2016] [ssl:warn] [pid 2178:tid 139896047458176] AH01906: 5f082f18ff15ddf96fa74c9af14e4425.f7daa04b8c98b9e84500b7d0b829ac5d.acme.invalid:443:0 server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Fri Dec 09 00:00:16.470568 2016] [ssl:warn] [pid 2178:tid 139896047458176] AH01906: e86f9a1ee1d5620f5c41eed9ce3e0ce7.c8e4571843b2316035378df41c56096e.acme.invalid:443:0 server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Fri Dec 09 00:00:16.471223 2016] [ssl:warn] [pid 2178:tid 139896047458176] AH01906: 885189d07d477a573963088662ee559a.2883c434b72bc36e444400f4caf8dde2.acme.invalid:443:0 server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Fri Dec 09 00:00:16.471773 2016] [ssl:warn] [pid 2178:tid 139896047458176] AH01906: 2cf3fc76791419925597c71d54b1d3bd.83b203b11c19d51906e51daeb86ca70d.acme.invalid:443:0 server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Fri Dec 09 00:00:16.472356 2016] [ssl:warn] [pid 2178:tid 139896047458176] AH01906: e52890d0353fdbd7b6f38666d8121005.3d8cb0183422290745d8f2153ddf0df2.acme.invalid:443:0 server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Fri Dec 09 00:00:16.472968 2016] [ssl:warn] [pid 2178:tid 139896047458176] AH01906: 3662d4183957aa0b880ee3caabaacc97.87c06ad1439d787a222805810e06768e.acme.invalid:443:0 server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Fri Dec 09 00:00:16.478099 2016] [mpm_event:notice] [pid 2178:tid 139896047458176] AH00489: Apache/2.4.10 (Debian) mod_fcgid/2.3.9 OpenSSL/1.0.1t configured -- resuming normal operations
[Fri Dec 09 00:00:16.478176 2016] [core:notice] [pid 2178:tid 139896047458176] AH00094: Command line: '/usr/sbin/apache2'
[Fri Dec 09 00:00:16.478240 2016] [mpm_event:warn] [pid 2178:tid 139896047458176] AH00488: long lost child came home! (pid 6415)
[Fri Dec 09 00:00:25.014149 2016] [mpm_event:notice] [pid 2178:tid 139896047458176] AH00493: SIGUSR1 received. Doing graceful restart
[Fri Dec 09 00:00:27.417764 2016] [core:notice] [pid 2178] AH00060: seg fault or similar nasty error detected in the parent process


I’m running Debian (8.7) with Apache 2.4.10 on a VPS. If you need more info, just ask.

Please help!


#2

Hi @Lightbeerer,

@jsha was wondering:

If you kill -USR1 your Apache process to trigger a graceful restart, or run apachectl reload, does that also reproduce the segfault?


#3

Hi @schoen,

Thanks for looking at this.

apachectl reload doesn’t seem to work for me:

# apachectl reload
Usage: /usr/sbin/apache2 [-D name] [-d directory] [-f file]
[-C "directive"] [-c "directive"]
[-k start|restart|graceful|graceful-stop|stop]
[-v] [-V] [-h] [-l] [-L] [-t] [-T] [-S] [-X]

kill -USR1 does work, but it neither kills Apache nor reproduces the segfault:

# ps -ef | grep apache2
root 4897 1 0 Feb07 ? 00:00:11 /usr/sbin/apache2 -k start
www-data 13985 4897 0 07:35 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 13986 4897 0 07:35 ? 00:01:44 /usr/sbin/apache2 -k start
www-data 14014 4897 0 07:35 ? 00:01:50 /usr/sbin/apache2 -k start
root 21243 21174 0 21:09 pts/0 00:00:00 grep apache2
# kill -USR1 4897
# ps -ef | grep apache2
root 4897 1 0 Feb07 ? 00:00:11 /usr/sbin/apache2 -k start
www-data 13985 4897 0 07:35 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 13986 4897 0 07:35 ? 00:01:44 /usr/sbin/apache2 -k start
www-data 14014 4897 0 07:35 ? 00:01:50 /usr/sbin/apache2 -k start
root 21245 21174 0 21:10 pts/0 00:00:00 grep apache2

This is what the apache2 error_log says about the kill command:

[Thu Feb 09 21:10:09.979437 2017] [mpm_event:notice] [pid 4897:tid 140131030890368] AH00493: SIGUSR1 received. Doing graceful restart
[Thu Feb 09 21:10:13.070928 2017] [mpm_event:notice] [pid 4897:tid 140131030890368] AH00489: Apache/2.4.10 (Debian) mod_fcgid/2.3.9 OpenSSL/1.0.1t configured -- resuming normal operations
[Thu Feb 09 21:10:13.071809 2017] [core:notice] [pid 4897:tid 140131030890368] AH00094: Command line: '/usr/sbin/apache2'
[Thu Feb 09 21:10:13.072307 2017] [mpm_event:warn] [pid 4897:tid 140131030890368] AH00488: long lost child came home! (pid 13984)


#4

The next thing I would try is to look in your /var/log/letsencrypt/letsencrypt.log.* files and find the output from a session where Certbot caused an Apache segfault. I believe it logs the changed configuration files it writes. If you can find those, put the same configuration in place that triggered the segfault, then run kill -USR1 <apache pid> or apachectl graceful (sorry I had the syntax wrong the first time). See if that reproduces the segfault; from there we can remove the changed parts of the config one at a time until we find the exact line or lines that cause it.
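
The procedure described above can be sketched as a small shell script. This is only an illustration, not Certbot's or Apache's own tooling: the function names are hypothetical, and the error log path is an assumption based on Debian's default layout. It counts the AH00060 "seg fault" lines seen earlier in this thread before and after a graceful restart.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: Debian's default error log path; override with
# ERROR_LOG=/path/to/log if yours differs.
ERROR_LOG="${ERROR_LOG:-/var/log/apache2/error.log}"

# Print (with line numbers) every line where the Apache parent process
# reported a crash, so the timestamps can be matched to the Certbot log.
find_parent_crashes() {
    grep -n 'AH00060' "$1"
}

# Validate the restored configuration, request a graceful restart, and
# report whether a new AH00060 line appeared. Run as root.
reproduce_segfault() {
    before=$(grep -c 'AH00060' "$ERROR_LOG")
    apachectl configtest || return 1   # stop if the config does not parse
    apachectl graceful                 # sends the same signal as kill -USR1
    sleep 5                            # give the restart time to finish or fail
    after=$(grep -c 'AH00060' "$ERROR_LOG")
    if [ "$after" -gt "$before" ]; then
        echo "segfault reproduced"
    else
        echo "no new segfault logged"
    fi
}
```

Run find_parent_crashes "$ERROR_LOG" first to get the crash timestamps, restore the configuration from the matching Certbot session, then call reproduce_segfault. Remove one changed directive at a time and repeat until the message changes.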


#5

Thanks - I actually did make a copy of the relevant letsencrypt.log file just after this happened. It does seem to have logged the contents of an Apache virtual hosts conf file, but these virtual hosts have SSLCertificateFile and SSLCertificateKeyFile directives pointing to files that no longer exist, so apachectl graceful fails:

# apachectl graceful
AH00526: Syntax error on line 10 of /etc/apache2/le_tls_sni_01_cert_challenge.conf:
SSLCertificateFile: file '/var/lib/letsencrypt/WeXDrnXoaqyFT7OwJQMiBd9xrOai1pI71mLRMhrRL6M.crt' does not exist or is empty
Action ‘graceful’ failed.
The Apache error log may have more information.

Perhaps I can just point the SSLCertificateFile and SSLCertificateKeyFile directives at files that do exist?
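
One way to try that idea, as a rough sketch: generate a throwaway self-signed certificate and key with openssl and use those as stand-ins. The function name, file names, and the placeholder.invalid subject are all hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: make_placeholder_cert and the placeholder.* file names are
# hypothetical. Creates a one-day self-signed certificate and an
# unencrypted key in the directory given as the first argument.
make_placeholder_cert() {
    dir="$1"
    openssl req -x509 -newkey rsa:2048 -nodes \
        -keyout "$dir/placeholder.key" \
        -out "$dir/placeholder.crt" \
        -days 1 -subj "/CN=placeholder.invalid" 2>/dev/null
}
```

After running, for example, make_placeholder_cert /tmp/le-debug, the SSLCertificateFile and SSLCertificateKeyFile lines in the restored conf file could point at the two generated files. These are not the certificates Certbot created during the failed renewal, so a clean restart with them would not rule out the originals as the trigger.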

