Certbot failing to renew certificate - failed to download the challenge files from the temporary standalone webserver

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. crt.sh), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is:
video-feeds.utccuip.com

I ran this command:
sudo certbot renew --standalone --dry-run --pre-hook "service nginx stop" --post-hook "service nginx start"

It produced this output:
Processing /etc/letsencrypt/renewal/video-feeds.utccuip.com.conf


Simulating renewal of an existing certificate for video-feeds.utccuip.com

Certbot failed to authenticate some domains (authenticator: standalone). The Certificate Authority reported these problems:
Domain: video-feeds.utccuip.com
Type: connection
Detail: 150.182.133.10: Fetching http://video-feeds.utccuip.com/.well-known/acme-challenge/v_RbkQenWkGfVhwIKd5m_oWrbpTjoxoImFyxPfcTqBk: Timeout after connect (your server may be slow or overloaded)

Hint: The Certificate Authority failed to download the challenge files from the temporary standalone webserver started by Certbot on port 80. Ensure that the listed domains point to this machine and that it can accept inbound connections from the internet.

Failed to renew certificate video-feeds.utccuip.com with error: Some challenges have failed.

My web server is (include version):
nginx/1.14.0

The operating system my web server runs on is (include version):
Ubuntu 18.04

My hosting provider, if applicable, is:
On premise

I can login to a root shell on my machine (yes or no, or I don't know):
Yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):
No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):
certbot 1.28.0

I am using the standalone method, so I stop nginx in a pre-hook and start it again in a post-hook to free port 80 during the renewal process.

I will be signing off soon, but you could try adding

--debug-challenges -v

to the standalone renew command you showed.

This will pause and show you the challenge URL that the Let's Encrypt server will use.

You can then try that URL from the public internet to see whether it works. If you don't have another way to get outside your network, use a cell phone with Wi-Fi disabled.
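To make that outside check repeatable, a small script can request the challenge URL and report what came back. This is a minimal Python 3 sketch, not part of certbot; the script name probe_challenge.py and the helpers challenge_url and probe are hypothetical. It assumes you copy the domain and token that --debug-challenges prints and run it from a machine outside your network while certbot is paused.

```python
import socket
import sys
import urllib.error
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the HTTP-01 URL the CA requests for a given challenge token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def probe(url: str, timeout: float = 10.0) -> str:
    """Request url once and summarise what happened as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return f"HTTP {err.code}"
    except (TimeoutError, socket.timeout):
        # Connected and sent the request, but no response arrived in time.
        return "timeout"
    except urllib.error.URLError as err:
        # Refused connection, DNS failure, or a timeout while connecting.
        return f"unreachable: {err.reason}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 probe_challenge.py <domain> <token>
    print(probe(challenge_url(sys.argv[1], sys.argv[2])))
```

For example, `python3 probe_challenge.py video-feeds.utccuip.com <token>`. While standalone is paused, an HTTP 200 is what you want; "unreachable" points at DNS, routing or a closed port, and "timeout" matches the "Timeout after connect" error the CA reported.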

I don't see anything obviously wrong, but I don't have time to assess it in detail. Still, this debugging aid is useful to know.

Hopefully you can find the problem, or another volunteer can pick it up.

It would also be worth discussing why you use standalone and stop/start nginx rather than using webroot authentication.

Oh, and welcome to the community @sa-webb

Thanks, MikeMcQ. With --debug-challenges -v I get the same output as before. Our setup had worked without issue for almost 3 years, until a few days ago when our data center was restarted.

From outside the network, I get "refused to connect" for http://dashboard.utccuip.com/.well-known/acme-challenge/c3tm-OQSiHE1EpUBzUcZJabQnHud5lIZ0KIydZXGCqE

Oh, you switched domain names on me :slight_smile:

Right now I see ports 80 and 443 closed on dashboard.utccuip.com. Do you still have certbot paused, such that nginx would be stopped?

The standalone authenticator is harder to debug because it is only active while certbot is running. The --debug-challenges flag keeps it running so you can "poke" it longer to help diagnose the problem. It isn't designed to fix anything outright.

EDIT: Oh, ports 80 and 443 should not be showing closed. If standalone was still active port 80 should be open. If nginx was running both should be open.
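A quick way to check the same thing yourself is a plain TCP connect test. The sketch below assumes Python 3 and a hypothetical script name check_ports.py; it only tells you whether a TCP handshake completes, so run it from outside the data center for a meaningful answer.

```python
import socket
import sys


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure, or timed out (socket.timeout is an OSError).
        return False


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 check_ports.py dashboard.utccuip.com
    for port in (80, 443):
        print(port, "open" if port_open(sys.argv[1], port) else "closed/filtered")
```

Keep in mind that an open port only proves the handshake works. The CA's error in this thread was "Timeout after connect", which a connect test like this would still report as open.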

I can scan the IP and see that port 80 opens while the renewal process is running. Would you recommend switching to the webroot method and seeing if that gives us different output?

Something odd is going on. I still don't see dashboard.utccuip.com with port 80 or 443 open. Should either or both of these ports be open right now?

I saw video-feeds ports open (same IP as dashboard) and nginx responding when I first checked right after your first post. But, I have not seen those ports open since.

Webroot needs at least port 80 open. Its advantage is that it uses the running nginx, so you just reload the nginx config afterwards to pick up any new certs, which avoids the hard stop/start that standalone requires. So, if we can reliably reach your nginx server, that's an option, but I haven't seen that work except briefly.
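For comparison, this is roughly what the webroot method does on disk: certbot writes a file under the webroot and the already-running nginx serves it. The sketch below is illustrative only (certbot's webroot plugin handles this itself); the helper names write_challenge and remove_challenge are hypothetical. The file body is the ACME key authorization, i.e. the token, a dot, and the account key thumbprint (RFC 8555).

```python
from pathlib import Path


def write_challenge(webroot: str, token: str, key_authorization: str) -> Path:
    """Write an HTTP-01 response file under webroot, as a webroot client would.

    nginx only has to serve webroot as static files on port 80 for the CA
    to fetch http://<domain>/.well-known/acme-challenge/<token>.
    """
    challenge_dir = Path(webroot) / ".well-known" / "acme-challenge"
    challenge_dir.mkdir(parents=True, exist_ok=True)
    path = challenge_dir / token
    # The body is the key authorization: "<token>.<account key thumbprint>".
    path.write_text(key_authorization)
    return path


def remove_challenge(path: Path) -> None:
    """Clean up once validation is finished; nginx keeps running throughout."""
    if path.exists():
        path.unlink()
```

Because nginx never stops in this flow, the only step after a successful renewal is a config reload (for example `systemctl reload nginx`) so the new certificate is picked up.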

Do you have new firewalls since the data center was restarted?

If I could make one recommendation: avoid stopping HTTPS
[especially when you only need certbot for HTTP authentication]

That can be done in a couple of creative ways:

  • use another web server for HTTP [like: Apache]
    set Apache to simply redirect all HTTP connections to HTTPS
    stop Apache
    run certbot in standalone mode
    restart Apache [HTTP only] and reload nginx [HTTPS only]

  • don't use any HTTP server at all [not very practical]
    stop "nothing"
    run certbot in standalone mode [HTTP only]
    reload nginx [HTTPS only]

  • use a globally defined dedicated HTTP webroot path [for all challenges]
    stop "nothing"
    run certbot in webroot mode [via existing HTTP path]
    reload nginx
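The decision an HTTP-only listener makes in the third option (with the redirect from the first) can be sketched as a single routing function. This is a hypothetical Python illustration of the logic, not a server; in nginx itself it would typically be a `location` block for /.well-known/acme-challenge/ pointing at the shared webroot, plus a `return 301` to HTTPS for everything else. The function name and the webroot path are assumptions.

```python
import re
from typing import Tuple

ACME_PREFIX = "/.well-known/acme-challenge/"
# ACME tokens are base64url strings (RFC 8555), so anything else is rejected.
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")


def route(host: str, path: str, webroot: str) -> Tuple[str, str]:
    """Decide what a port-80-only listener should do with one request.

    Returns ("file", <path on disk>), ("notfound", "") or ("redirect", <https URL>).
    """
    if path.startswith(ACME_PREFIX):
        token = path[len(ACME_PREFIX):]
        if TOKEN_RE.fullmatch(token):
            return ("file", webroot.rstrip("/") + ACME_PREFIX + token)
        # Refuses "..", extra slashes, query strings and other non-token input.
        return ("notfound", "")
    # Drop any ":80" from the Host header (simplification: ignores IPv6 literals).
    hostname = host.partition(":")[0]
    return ("redirect", "https://" + hostname + path)
```

The point of the design is that port 80 never needs to be freed: challenges are answered from one shared directory, everything else goes to HTTPS, and HTTPS itself is only ever reloaded.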

@sa-webb I see improvement but a Palo Alto Networks brand firewall may be causing trouble. If not that brand then some firewall looks to be interfering.

I can make a test request to your server similar to the one the Let's Encrypt server makes before issuing a cert, and the test succeeds. BUT, only if I do not use the same user-agent as the Let's Encrypt server. If I do, the request times out.

We have seen many similar problems with this brand of firewall, as they recently changed a default setting. If you have such a firewall, have your network team check for an Application Rule for "acme protocol" and be sure to allow it.

Here are sample curl requests you can provide to your network team to test the fix. These results are repeatable.

curl -I dashboard.utccuip.com/.well-known/acme-challenge/SampleChallenge123
(a 404 is normal here because file SampleChallenge123 does not exist)
HTTP/1.1 404 Not Found
Server: nginx/1.14.0 (Ubuntu)
Date: Fri, 01 Jul 2022 11:53:30 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive

curl -I -m10 dashboard.utccuip.com/.well-known/acme-challenge/SampleChallenge123 -A "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
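The same two requests can be scripted so the network team can rerun them after changing the firewall rule. This is a minimal Python 3 sketch using only the standard library; the helper names head_request, outcome and compare are hypothetical, and the user-agent string is the one from the curl example above.

```python
import socket
import urllib.error
import urllib.request
from typing import Dict, Optional

# User-agent string copied from the curl test in this thread.
LE_UA = ("Mozilla/5.0 (compatible; Let's Encrypt validation server; "
         "+https://www.letsencrypt.org)")


def head_request(url: str, user_agent: Optional[str]) -> urllib.request.Request:
    """Build a HEAD request like `curl -I`, optionally overriding User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, method="HEAD", headers=headers)


def outcome(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send req once; report the HTTP status or the kind of failure."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        return f"HTTP {err.code}"  # e.g. the expected 404 for a made-up token
    except (TimeoutError, socket.timeout):
        return "timeout"  # connected, but no response: the symptom seen here
    except urllib.error.URLError as err:
        if isinstance(err.reason, (TimeoutError, socket.timeout)):
            return "timeout"  # timed out while connecting
        return f"error: {err.reason}"


def compare(url: str) -> Dict[str, str]:
    """Probe url with urllib's default User-Agent and with the validation one."""
    return {
        "default UA": outcome(head_request(url, None)),
        "Let's Encrypt UA": outcome(head_request(url, LE_UA)),
    }
```

For instance, `compare("http://dashboard.utccuip.com/.well-known/acme-challenge/SampleChallenge123")`. If the firewall is filtering on user-agent as described, the default request should come back as HTTP 404 and the validation one as a timeout; once the rule is fixed, both should return 404.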

Note I say it "may be" Palo Alto Networks because your timeout is slightly different from previous cases. Regardless, a request with any user-agent should succeed, and the failure is too similar to the others to be unrelated.

@MikeMcQ Thank you so much for looking into this for me. I just reached out to our network team to check this and I will keep you posted.

Have a good day!

@MikeMcQ It worked! You are awesome.
