Certbot nginx Challenge failed

Hello guys,

First, some context: I have a server running Ubuntu whose purpose is to host a lot of subdomains with SSL, since it is a system used for email tracking. I already have about 970 subdomains with SSL, but yesterday certbot started giving me problems.

My domain is: sub.fakedomainfake.com

I ran this command: sudo certbot --nginx --no-redirect -d sub.fakedomainfake.com

It produced this output:

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator nginx, Installer nginx
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for sub.fakedomainfake.com
Waiting for verification...
Challenge failed for domain sub.fakedomainfake.com
http-01 challenge for sub.fakedomainfake.com
Cleaning up challenges
Some challenges have failed.

My web server is (include version):

The operating system my web server runs on is (include version):

My hosting provider, if applicable, is:

I can login to a root shell on my machine (yes or no, or I don't know):

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):

The version of my client is: certbot 0.40.0

But if I add --debug-challenges to the command above for any new subdomain and then press Enter, it deploys without any problem. By the way, I run Nginx; before each certbot run for a specific subdomain I create a file, named after the subdomain and containing its nginx configuration, in the nginx folder /etc/nginx/sites-enabled.

The intention is to deploy via cronjob, and using --debug-challenges wouldn't be a problem, but it gives an error when I try to send Enter to the prompt non-interactively: Skipped user interaction because Certbot doesn't appear to be running in a terminal.

So it comes down to this: I think it is a timing problem with the challenges, because with --debug-challenges they are loaded before I press Enter. Do you think this is a timeout on the server, since there are now many hashes in the nginx hash table and a lot of challenges to check? That seems to me to be the problem.

Thanks in advance

I am assuming you get a 404 error. Is that right?

You might need to use --nginx-sleep-seconds to increase the wait time if you have a large nginx config of 970 domains (so maybe nearly 2000 server blocks?)

--nginx-sleep-seconds NGINX_SLEEP_SECONDS
Number of seconds to wait for nginx configuration changes to apply when reloading. (default: 1)
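
For reference, a minimal sketch of how the flag could be added to the command from the first post. The helper name certbot_cmd and the 10-second default are assumptions for illustration, not a recommended value; the helper only prints the command so it can be reviewed before running it with sudo.

```shell
#!/bin/sh
# Hypothetical helper: prints the certbot command with a longer nginx reload wait.
# The default of 10 seconds is an assumption; tune it to how long your nginx
# actually takes to reload its configuration.
certbot_cmd() {
    domain="$1"
    sleep_seconds="${2:-10}"
    printf 'certbot --nginx --no-redirect --nginx-sleep-seconds %s -d %s\n' \
        "$sleep_seconds" "$domain"
}

# Print the command for the subdomain from the first post.
certbot_cmd sub.fakedomainfake.com 10
```

If a short wait still produces a 404, raising the value step by step is probably the simplest way to find what a large config needs.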

You should consider upgrading Certbot too. Your 0.40 is very old; the latest is 2.6.

Update: Actually, you will need to update Certbot to try this, as the option was added in v1.7. From the changelog on GitHub:

1.7.0 - 2020-08-04

Added

  • Added --nginx-sleep-seconds (default 1) for environments where nginx takes a long time to reload.
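
Since the option only exists from 1.7.0 on, a cron script could check the installed version before passing it. This is a rough sketch: the function name supports_nginx_sleep is made up, and it assumes the version string has the usual MAJOR.MINOR.PATCH form.

```shell
#!/bin/sh
# Hypothetical check: succeeds if the given Certbot version is 1.7.0 or newer,
# i.e. new enough to accept --nginx-sleep-seconds.
supports_nginx_sleep() {
    version="$1"
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # text between the first and second dot
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]; }
}

# Example use (assumes `certbot --version` prints "certbot X.Y.Z"):
#   v="$(certbot --version 2>&1 | awk '{print $2}')"
#   supports_nginx_sleep "$v" || echo "Certbot $v is too old for --nginx-sleep-seconds"
```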

Hello,

Thanks for your fast reply and help. Yes, I do receive a 404: an invalid response from a URL like "/.well-known/acme-challenge/fnVRXXB1jtF2JUtjbErt9T08_Sr7aebDeEor6_uJExA:404".

I tried to update certbot, but the system says it's already the newest version. I'm running Ubuntu 20.04.3, so it should be able to update. I will speak with the sysadmin so they can help upgrade to a newer version.

Once it's updated, I'll report back :)

Thanks

Check the website I linked to for install instructions for the snap version of Certbot. Yes, your Ubuntu should support that.

Adding to the above advice from @MikeMcQ...

IIRC you may be running into a now-resolved bug or unresolved peculiarity with large installations. I haven't tracked this general issue in a few years, but I do recall Certbot having problems with large installations for a while. The last few years of Certbot releases have really excelled at ironing out Apache and Nginx edge cases and configuration issues. This may have been one of the fixed ones, but this number of domains is really pushing it.

At the size of the system you're running, I would not use the nginx plugin with Certbot. Instead, I would run Certbot in standalone mode on a higher port (e.g. --http-01-port=8080) and use a global macro (include file) in each domain's config to do either of the following:

  • proxy_pass all /.well-known/acme-challenge requests to Certbot on the higher port
  • redirect all /.well-known/acme-challenge requests to a single dedicated domain for authorization, and proxy_pass that one to Certbot on the higher port

Using that approach, you just have to ensure traffic to /.well-known/acme-challenge makes it to the correct port (or port+domain) that Certbot is listening on. Then Certbot handles everything and doesn't touch nginx at all. Certbot has already installed all the certificates so that the configs reference the symlinks in live, not the versioned files, so you just need to do a graceful restart to pick up the new certificates. With a deployment of your size, that could be a daily cronjob or the second task in the current cronjob that runs certbot.
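
A rough sketch of the first option, to make the moving parts concrete. Everything named here is an assumption for illustration: the function write_acme_snippet, the snippet path /etc/nginx/snippets/acme-challenge.conf, and port 8080. It also assumes the ssl_certificate lines in each server block are managed by you, since certonly only obtains the certificate.

```shell
#!/bin/sh
# Hypothetical helper: writes an nginx include file that forwards all
# ACME HTTP-01 requests to Certbot's standalone listener on a local port.
write_acme_snippet() {
    target="$1"          # where to write the include file
    port="${2:-8080}"    # port Certbot standalone listens on (assumed 8080)
    cat > "$target" <<EOF
location ^~ /.well-known/acme-challenge/ {
    proxy_pass http://127.0.0.1:${port};
}
EOF
}

# Possible sequence (run as root; paths and port are assumptions):
#   1. write_acme_snippet /etc/nginx/snippets/acme-challenge.conf 8080
#   2. add "include /etc/nginx/snippets/acme-challenge.conf;" to each
#      port-80 server block, then: nginx -t && nginx -s reload
#   3. certbot certonly --standalone --http-01-port 8080 -d sub.fakedomainfake.com
#   4. nginx -s reload   # pick up the new certificate via the live symlink
```

With this layout nginx is reloaded on your schedule rather than during the challenge, which is the part that seemed to be racing in the original setup.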

Edit: You can use the webroot plugin too, but actually running everything through the same single Certbot process tends to streamline troubleshooting and minimize user error.

Hey guys,

I managed to remove certbot with apt-get and install version 2.6.0 via snapd, and --nginx-sleep-seconds is working like a charm. Calling it through the API with curl is timing out, but that is down to the nginx configuration; for now I can manage this way.

Thanks for all your replies

@jvanasco I'll try that approach later on with the sysadmin :)

Cheers

Can you be more specific? Maybe we can help.

Great news though.
