Certbot nginx plugin 404 solved

I'm writing this because I ran into an issue that had no good answers on any of the forums. I managed to solve it and wanted to share the fix with you.
My setup:
OS: Debian 9, kernel 4.9.0-14-amd64 on GCP
Web server: nginx/1.18.0

What this server does: it hosts many virtual hosts. Some are static sites, others are PHP-based sites, and some are Django sites.

Problem: three months ago, certbot auto-renewal stopped working. I was also unable to obtain certificates for new sites. The error was the usual 'nginx 404' that a lot of people have been getting:
Failed authorization procedure. example.domain (http-01): urn:ietf:params:acme:error:unauthorized :: The client lacks sufficient authorization :: Invalid response from https://example.domain/.well-known/acme-challenge/BgxplVnHSmEVj3WLwnAsr3CNaTG6s_RHEUKXwdvSTOE [xxx.xxx.xxx.xxx]: 404

I tried just about everything under the sun and couldn't get it to work. Finally, to debug the issue, I ran the command # certbot certonly --test-cert --dry-run -d example.domain while running tail -f on the nginx site file, the access log, and the error log to see what was going on. The site file was being modified correctly and nginx was being reloaded correctly, but the access log was still showing 404s. When I visited the page with my browser a second later, it loaded fine. That meant nginx was probably taking a bit longer to finish reloading than the validation request allowed for. I have some heavy custom modsecurity rulesets running on the server, which were slowing down the nginx reload.
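For anyone repeating this check, here is a minimal sketch of a helper that summarizes which status codes nginx returned for the challenge requests. The function name acme_statuses is made up for illustration, and it assumes the access log uses nginx's default "combined" format; adjust the field number if your log_format differs.

```shell
# Hypothetical helper: count the HTTP status codes nginx returned for
# ACME challenge requests recorded in an access log.
# Assumes the default "combined" log format, where the status code is
# the 9th whitespace-separated field.
acme_statuses() {
    log_file="$1"
    grep '/.well-known/acme-challenge/' "$log_file" \
        | awk '{ print $9 }' \
        | sort \
        | uniq -c
}
```

Called as acme_statuses /var/log/nginx/access.log (the path is an assumption; use whichever access log your site writes to) right after a dry run, a count of 404s alongside a working page in the browser points to the timing problem described above.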

I figured if I could somehow delay the request from Let's Encrypt's servers, then it would work. So I cd'ed into the /usr/lib/python3/dist-packages/certbot_nginx directory and ran grep -ri sleep. I found the culprit in a file called configurator.py. In the function def nginx_restart(nginx_ctl, nginx_conf):, there's a time.sleep(1) - it's towards the end of the file. I just changed that to time.sleep(4) and everything worked again.

This had me pulling my hair out for months and I thought sharing the solution might help someone out there. Hope it helps!!

Hello @hazardousmonk,

Thanks for sharing the solution.

I'm making a gentle call to @certbot-devs so they can take a look at this issue and either provide another solution or implement something to avoid it in a future certbot version.

Cheers,
sahsanu

Welcome to the Let's Encrypt Community :slightly_smiling_face:

Perhaps use reload instead of restart?

We added --nginx-sleep-seconds a while back for situations like these. Unfortunately, portably detecting when the asynchronous reload process has completed is a lot trickier than it looks.

@_az, thank you very much :wink:

@hazardousmonk, as @_az said, there is an option to change the sleep time, so you should use it instead of modifying the source code:

--nginx-sleep-seconds NGINX_SLEEP_SECONDS
                        Number of seconds to wait for nginx configuration
                        changes to apply when reloading. (default: 1)
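
As a rough sketch of how the flag could be combined with the dry run from the first post, the wrapper below builds the command. The function name dry_run_with_sleep, the CERTBOT override variable, and the 5-second default are illustrative assumptions; --nginx selects the nginx plugin explicitly, and example.domain is a placeholder.

```shell
# CERTBOT can be overridden (for example CERTBOT=echo) to preview the
# command line without actually running certbot.
CERTBOT="${CERTBOT:-certbot}"

# Hypothetical wrapper: dry-run issuance against the staging server,
# waiting the given number of seconds after each nginx reload.
# The 5-second default is an arbitrary example value.
dry_run_with_sleep() {
    domain="$1"
    seconds="${2:-5}"
    "$CERTBOT" certonly --nginx --test-cert --dry-run \
        --nginx-sleep-seconds "$seconds" -d "$domain"
}
```

Running dry_run_with_sleep example.domain 5 as root would then exercise the same validation with a longer wait. If it still returns 404, try a larger value.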

Great! - this really helps. Thanks @_az and @sahsanu . I hope this setting can be set permanently (for when the certs need to autorenew). Modifying the source code was a last resort - I wouldn't normally do this.

If you issue or renew a certificate with that flag it will be persistent, yes. You can also throw this into /etc/letsencrypt/cli.ini to make it affect all your certificates (assuming Certbot 1.7.0 or newer):

nginx-sleep-seconds = 5
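
If you prefer to script that change, here is a minimal sketch of a helper that sets the value without leaving duplicate entries behind. The function name set_nginx_sleep is hypothetical, and the helper only handles lines that start with nginx-sleep-seconds; it does not understand comments or other spellings of the option.

```shell
# Hypothetical helper: set nginx-sleep-seconds in a Certbot cli.ini file,
# replacing any earlier value so the option appears only once.
set_nginx_sleep() {
    ini_file="$1"
    seconds="$2"
    tmp_file="$(mktemp)"
    # Keep every existing line except a previous nginx-sleep-seconds setting.
    if [ -f "$ini_file" ]; then
        grep -v '^nginx-sleep-seconds' "$ini_file" > "$tmp_file" || true
    fi
    printf 'nginx-sleep-seconds = %s\n' "$seconds" >> "$tmp_file"
    # Write back through the original path so its permissions are preserved.
    cat "$tmp_file" > "$ini_file"
    rm -f "$tmp_file"
}
```

For example, set_nginx_sleep /etc/letsencrypt/cli.ini 5 (run as root) would leave the file with the single line shown above plus whatever other settings it already had.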