Certbot renewals have suddenly started failing

fmouse · August 30, 2021, 5:35pm

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. crt.sh | example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is: davidamram.com

I ran this command: certbot --apache -d davidamram.com -d www.davidamram.com

It produced this output:
Failed authorization procedure. davidamram.com (http-01): urn:ietf:params:acme:error:unauthorized :: The client lacks sufficient authorization :: During secondary validation: Invalid response from https://davidamram.com/.well-known/acme-challenge/897lhnAmCnHQlXOVzNNLGod34vIuTHNqDE9G3XO-63g [198.58.125.221]: "\n\n404 Not Found\n\nNot Found\n<p"

My web server is (include version): Apache/2.4.18 (Ubuntu)

The operating system my web server runs on is (include version): Linux linode 5.13.4-x86_64-linode146

My hosting provider, if applicable, is: Linode

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): 0.29.1

The cert auto-renewals have been working flawlessly for well over a year on my server, but have suddenly started failing with the above error. I have many customers whose domain name certs are coming up for renewal! Any help will be appreciated!

rg305 · August 30, 2021, 5:44pm

Let's see if we can get to the bottom of this quickly, starting with the output of:
sudo apachectl -t -D DUMP_VHOSTS

Yes, I do see:

[which implies several things - primarily HTTP to HTTPS redirection (which might not be prudent)]

fmouse · August 30, 2021, 9:47pm

This has been working for a couple of years, and now it's failing. Why am I getting a permissions error, and why a 404 error on a folder that doesn't exist? Is certbot supposed to create this folder and file? It runs with root privs, I think, so it should be able to.

I have MANY failures, and customers calling me.

Whether or not the redirection to https is prudent, it's not the source of this error.

fmouse · August 30, 2021, 10:01pm

If I run certbot (as root) with certonly the first error I'm getting is:

davidamram.com (http-01): urn:ietf:params:acme:error:unauthorized :: The client lacks sufficient authorization

Apparently this permissions error is preventing the writing of the dir and file .well-known/acme-challenge/blahblahblah in the DocumentRoot.

This happens with many domains.

Osiris · August 30, 2021, 10:04pm

Which permission error? The error you've quoted is from the remote Let's Encrypt validation server and not an error from certbot itself.

fmouse · August 30, 2021, 10:20pm

"urn:ietf:params:acme:error:unauthorized :: The client lacks sufficient authorization"

The "client" is certbot running on my server and for some reason it's "unauthorized". This is what I'm calling a "permissions error". This is generally a kernel-level file or directory access error; however, certbot is running as the root user.

fmouse · August 30, 2021, 10:25pm

The validation server is simply reporting on its dialog with certbot running on MY server (the client). The key is apparently the "unauthorized" token. Why is my certbot "unauthorized"?

Osiris · August 30, 2021, 10:35pm

Because when the Let's Encrypt validation server tried to download the temporary token file, it didn't receive the token file, but got a 404 file not found error.

When using the --apache plugin, this usually is because the plugin has a difficult time integrating properly with your Apache setup, which might be due to a number of reasons.

That's why @rg305 asked a certain apachectl command earlier.
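
To see roughly what the validation server sees, independent of certbot, one option is to drop a throwaway file into the challenge directory and request it over HTTP. The sketch below is only that: the function name `make_challenge_probe` and the `/var/www/html` path in the usage comment are assumptions, not anything certbot provides. It checks plain static serving from the DocumentRoot, so it is a reachability sanity check rather than a test of the --apache plugin itself.

```shell
#!/bin/sh
# Sketch only: drop a throwaway file where an http-01 token would live
# and print the URL path to request. The webroot argument is whatever
# DocumentRoot the vhost really uses; nothing here comes from certbot.
make_challenge_probe() {
    webroot=$1
    dir="$webroot/.well-known/acme-challenge"
    name="probe-$$"                      # shell PID, just for a unique-ish name
    mkdir -p "$dir" || return 1
    printf 'probe-ok\n' > "$dir/$name" || return 1
    printf '/.well-known/acme-challenge/%s\n' "$name"
}

# Hypothetical usage (the /var/www/html path is an assumption):
#   path=$(make_challenge_probe /var/www/html)
#   curl -si "http://davidamram.com$path" | head -n 1
#   rm "/var/www/html$path"
```

If the curl request comes back with a 404 while the file exists on disk, the request is probably being routed to a different vhost or DocumentRoot than expected, which is what the DUMP_VHOSTS output would help confirm.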

rg305 · August 30, 2021, 10:48pm

And how exactly can you be so certain?

You are the one stuck in the middle of the problem.
Perhaps I really do have a better view from way over here.

If you read between the lines...
HTTP to HTTPS redirection implies that HTTP was reachable.
If so, then you could have handled the challenge requests then and there.
Redirecting them to HTTPS only defers the process to now also require a second access path (HTTPS).

rg305 · August 30, 2021, 10:56pm

I'll ask a second time:

fmouse · August 30, 2021, 11:04pm

OK, I may have found the problem. On another thread rg305 notes that certbot starts its own copy of the apache webserver, and to do so it shuts down all running instances of apache. If for some reason it can't do this, it can't do its work and will fail.

So this is apparently what happened.

Running 'apachectl stop' did not shut down all apache instances, for some reason. I had to manually killall the child servers, and because the webserver is kept alive by monit, I had to shut down monit and then kill the apache child daemons, and then restart monit.

This points in another direction! I've had a lot of DoS attacks on my web server of late from China, Russia and the like. I have a fail2ban config which counts apache daemons every minute and if the number exceeds a limit, it parses the server_status (not its real name on my server) to find the offending IP address (which it reports to me) and adds an IP block to the kernel filter tables. I've noticed an increasing number of such attacks of late, and apparently there is a vulnerability in apache which allows a querying host to lock up an apache child process so that it can't be killed with 'apachectl stop'. Either that, or residual files from a recent server reboot somehow polluted apache in such a way as to lock some of these processes. I can't imagine how this might have happened.

Anyway, running 'apachectl stop' now kills all running children, and certbot works as expected.

Thank you very much, rg305 and Osiris

rg305 · August 30, 2021, 11:10pm

Although I am happy to hear of your success, I am puzzled as to why you would need to shut down a working web server (and then start a temporary one) only to get a proper challenge response from it.

I think we can find you a better solution than that

Osiris · August 30, 2021, 11:11pm

Not exactly. The --apache plugin modifies the Apache configuration temporarily, which requires a (graceful) reload of the Apache process.

That said, the issue you've described might very well also have hampered the reloading necessary for the plugin to work.

rg305 · August 30, 2021, 11:12pm

Ok. I see now what is happening

Osiris · August 30, 2021, 11:15pm

Cool, it's secure

rg305 · August 30, 2021, 11:18pm

All that said, I still...

One not so intrusive.

fmouse · August 30, 2021, 11:20pm

This is not my requirement, it's a certbot requirement, and it normally takes care of this automatically when it runs. I found out about it from one of your posts on another topic. There were child processes stuck which wouldn't shut down when I ran apachectl stop manually, and certbot couldn't shut them down either, which it apparently must do. I had to manually kill them to restore the server to its normal state.

I don't know exactly why certbot needs to do this, but apparently the web server it deploys has properties which integrate with the verification process. Understanding the details of this is above my pay grade! Anyway, I don't routinely have to shut down apache to run certbot. It does it on its own.

Osiris · August 30, 2021, 11:20pm

If I read the post correctly, it was a temporary issue and the plugin works properly now.

As I said earlier, certbot does not implement its own webserver when using --apache, but utilises Apache itself.

The issue you had now might happen again in the future though.

Although that issue is not just related to certbot: it should not be possible to have Apache child processes which are not stoppable through regular means (e.g., an Apache reload)...

fmouse · August 30, 2021, 11:26pm

At this point, everything is working as advertised. The only question is why there were stuck apache children which wouldn't shut down. It looks and smells like a DoS attack.

rg305 · August 30, 2021, 11:28pm

There is no telling what is making apache unwilling or unable to restart gracefully.
I'm near certain this will happen again.
So, I'm simply proposing a method that won't require any such restart/reload.
Like: --webroot
OR
Perhaps a dedicated challenge path that would mean certbot doesn't even need to do anything to apache.
But since I haven't seen any of the configs, I can't comment on what may be possible down that path.
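
For the --webroot route, a minimal sketch could look like the following. The wrapper name `dry_run_webroot` and the `/var/www/davidamram.com` path are assumptions; the webroot has to be whatever DocumentRoot Apache actually serves for these names. With --webroot, certbot only writes the token file into that directory, so no Apache reload is needed for the challenge itself, and --dry-run rehearses the validation without issuing a real certificate.

```shell
#!/bin/sh
# Sketch only: rehearse issuance with the webroot plugin, which writes the
# token file into an existing directory and leaves Apache's config alone.
# The webroot argument must match the vhost's real DocumentRoot.
dry_run_webroot() {
    webroot=$1
    certbot certonly --webroot -w "$webroot" \
        -d davidamram.com -d www.davidamram.com \
        --dry-run
}

# Hypothetical usage (the path is an assumption):
#   dry_run_webroot /var/www/davidamram.com
```

If the dry run passes, the same command without --dry-run should issue for real. Apache would still need a reload at some point to pick up a renewed certificate, but that is separate from getting the challenge answered.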
