I ran this command: sudo certbot certonly --agree-tos --noninteractive --webroot -d domain -d www.domain.com -w --config-dir /sites/ssl/
My web server is (include version): nginx
The operating system my web server runs on is (include version): CentOS 7
My hosting provider, if applicable, is: NA
I can login to a root shell on my machine (yes or no, or I don't know): yes
I'm using a control panel to manage my site (no, or provide the name and version of the control panel): No
The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): certbot 2.7.4
Info:
My nginx server is returning a 404 for the validation file generated by Certbot.
(I can confirm from the debug logs that the file is indeed created, yet nginx still returns a 404.)
It's not an nginx config issue, because it happens intermittently: I am able to generate the certificate 9 out of 10 times.
I have run several tests to confirm that there's no delay in nginx. I am running this setup in an NFS environment.
Without the domain name we can only guess, but apparently your domain does not (always?) point to that nginx server. If it's intermittent, then perhaps you have two IP address entries in DNS or you are load balancing.
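One quick way to check the "two IP address entries" theory is to list every address the domain resolves to. This is a minimal Python sketch; the function name resolve_all is hypothetical, and it only sees what your local resolver returns, so a load balancer would show up as a single address even if several nginx servers sit behind it.

```python
import socket
import sys
from typing import List


def resolve_all(hostname: str, port: int = 80) -> List[str]:
    """Return every distinct IPv4/IPv6 address the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Usage: python3 resolve_all.py example.com www.example.com
    for name in sys.argv[1:]:
        print(name, resolve_all(name))
```

If more than one address comes back for either name, every one of those addresses has to serve the same challenge file.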
@webprofusion Thanks for replying. Yes, I have multiple nginx servers, but they all share the same config, and the issue is not limited to one nginx server. Also, the certificate is usually issued on the 2nd or 3rd attempt after a failure.
Thanks. When you use HTTP domain validation, Let's Encrypt will immediately make its HTTP request to your domain as http://<yourdomain>/.well-known/acme-challenge/<challenge response file>
Every single server that could possibly respond to requests for your domain must immediately give the same answer, or you risk failing validation. Judging by your 404 error, that is not always happening right now.
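To see whether every server really gives the same answer, you can bypass DNS and ask each backend directly, sending your domain in the Host header. This is a minimal Python sketch, not a Certbot feature: the backend IPs, domain, token name and expected content are placeholders you would supply yourself, for example by dropping a small test file into the shared webroot and running the check right after writing it.

```python
import urllib.error
import urllib.request
from typing import Dict, List, Optional, Tuple

Result = Tuple[Optional[int], bytes]


def fetch_challenge(server_ip: str, domain: str, token: str, timeout: float = 5.0) -> Result:
    """GET /.well-known/acme-challenge/<token> from one server, with Host set to domain."""
    url = "http://{}/.well-known/acme-challenge/{}".format(server_ip, token)
    req = urllib.request.Request(url, headers={"Host": domain})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers, such as the 404 in this thread, arrive as HTTPError.
        return err.code, b""
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return None, b""


def inconsistent_servers(results: Dict[str, Result], expected: bytes) -> List[str]:
    """Servers that did not answer 200 with the expected file content."""
    return sorted(
        server
        for server, (status, body) in results.items()
        if status != 200 or body.strip() != expected.strip()
    )


def check_all(servers: List[str], domain: str, token: str, expected: bytes) -> List[str]:
    """Fetch the challenge path from every server and report the ones that disagree."""
    results = {ip: fetch_challenge(ip, domain, token) for ip in servers}
    return inconsistent_servers(results, expected)
```

For example, check_all(["10.0.0.11", "10.0.0.12"], "www.example.com", "test-file", b"hello") should return an empty list when every server can see the file; those addresses and names are made up. Running it a few times immediately after writing the file should show whether some servers lag behind.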
The validation checks will come from multiple data centers all over the world ("multi-perspective" validation). Let's Encrypt added more validation perspectives a few months back, which makes it more likely for this sort of problem to occur.
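A rough way to see why more perspectives make this worse: suppose, hypothetically, that each validation request independently lands on a server that cannot yet see the file with some fixed probability. The Python sketch below computes the chance that validation still passes. The stale_fraction model and the allowed_failures parameter are simplifying assumptions for illustration, not Let's Encrypt's actual quorum rules.

```python
from math import comb


def pass_probability(stale_fraction: float, fetches: int, allowed_failures: int = 0) -> float:
    """Probability that validation passes in a simplified model.

    Each of `fetches` independent requests fails (hits a server that cannot yet
    see the challenge file) with probability `stale_fraction`; validation passes
    if at most `allowed_failures` of them fail (a binomial tail sum).
    """
    ok = 1.0 - stale_fraction
    return sum(
        comb(fetches, k) * stale_fraction ** k * ok ** (fetches - k)
        for k in range(min(allowed_failures, fetches) + 1)
    )
```

With a 5% chance of hitting a stale server, a single request passes 95% of the time, but if five independent requests must all pass, the odds drop to about 77% (0.95^5 ≈ 0.774).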
The 404 error is a response coming from your domain, and that's the problem you need to fix. The obvious culprit would be your servers failing to synchronize their responses quickly enough (on NFS, for example, client-side caching can delay when a newly written file becomes visible on the other hosts). If your cert renewal all happens on a single server, then perhaps you could direct/proxy all /.well-known/acme-challenge/ requests to just that one server.
If it's intermittent, it may be that Certbot just isn't waiting long enough between updating the nginx config to respond to the challenge and asking the CA to check the challenges. I think this is reported more often with more sophisticated nginx setups. Try setting --nginx-sleep-seconds to a higher value (the default is 1) and see if that helps.
Yes, that can happen when using the --nginx plugin. But they are using certonly --webroot, so Certbot does not make any changes to the nginx config. --nginx-sleep-seconds would have no effect (and Certbot should reject it as an invalid option for --webroot).
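Since --webroot has no equivalent of --nginx-sleep-seconds, one possible workaround (not something suggested in this thread, so treat it as an untested sketch) is to switch to the manual plugin with an auth hook. Certbot runs the hook for each HTTP-01 challenge, passing CERTBOT_DOMAIN, CERTBOT_TOKEN and CERTBOT_VALIDATION in the environment, and waits for it to finish before asking the CA to validate. The hook below writes the file to the shared webroot and then polls every backend until all of them serve it. WEBROOT and BACKENDS are placeholders, and the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical Certbot --manual-auth-hook for several nginx servers sharing an NFS webroot.

Writes the HTTP-01 challenge file, then waits until every backend serves it.
WEBROOT and BACKENDS are placeholders for your own setup.
"""
import os
import time
import urllib.request
from typing import Callable

WEBROOT = "/path/to/webroot"           # placeholder: the directory you would pass to -w
BACKENDS = ["10.0.0.11", "10.0.0.12"]  # placeholder: every nginx server behind the domain


def served_by(server_ip: str, domain: str, token: str, expected: str) -> bool:
    """True if this one server answers 200 with the expected key authorization."""
    req = urllib.request.Request(
        "http://{}/.well-known/acme-challenge/{}".format(server_ip, token),
        headers={"Host": domain},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status == 200 and body.strip() == expected
    except OSError:  # HTTPError (e.g. 404), URLError and timeouts all subclass OSError
        return False


def wait_until(check: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Call check() every `interval` seconds until it returns True or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def main() -> int:
    domain = os.environ["CERTBOT_DOMAIN"]
    token = os.environ["CERTBOT_TOKEN"]
    validation = os.environ["CERTBOT_VALIDATION"]

    challenge_dir = os.path.join(WEBROOT, ".well-known", "acme-challenge")
    os.makedirs(challenge_dir, exist_ok=True)
    with open(os.path.join(challenge_dir, token), "w") as f:
        f.write(validation)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has reached the NFS server

    def all_ready() -> bool:
        return all(served_by(ip, domain, token, validation) for ip in BACKENDS)

    # Exit 0 only once every backend serves the file; non-zero after 60 s otherwise.
    return 0 if wait_until(all_ready, timeout=60, interval=2) else 1


if __name__ == "__main__":
    if "CERTBOT_TOKEN" in os.environ:
        raise SystemExit(main())
    print("Not running as a Certbot auth hook (CERTBOT_TOKEN unset); nothing to do.")
```

It would be invoked with something like certbot certonly --manual --preferred-challenges http --manual-auth-hook /path/to/hook.py -d ..., and a matching --manual-cleanup-hook could delete the file afterwards. The 60-second timeout and 2-second interval are arbitrary choices.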