Renewal suddenly stopped two days ago

I've been using Let's Encrypt successfully and very happily for a long time now, but suddenly, two days ago, I started getting errors:


Processing /etc/letsencrypt/renewal/tjkmaintenance.ca-0001.conf


Renewing an existing certificate for tjkmaintenance.ca and www.tjkmaintenance.ca

Certbot failed to authenticate some domains (authenticator: webroot). The Certificate Authority reported these problems:
Domain: tjkmaintenance.ca
Type: connection
Detail: During secondary validation: 205.206.248.28: Fetching http://tjkmaintenance.ca/.well-known/acme-challenge/wlPgg2qsCIiqnA1yKly9XR_ygkHh8QBc6eUxHm68B7s: Timeout during connect (likely firewall problem)

Domain: www.tjkmaintenance.ca
Type: connection
Detail: During secondary validation: 205.206.248.28: Fetching http://www.tjkmaintenance.ca/.well-known/acme-challenge/nujg3dYYLNBzVQMVxbDgFv6dRdcmYPTLx0egZ6PmQ1M: Timeout during connect (likely firewall problem)

Hint: The Certificate Authority failed to download the temporary challenge files created by Certbot. Ensure that the listed domains serve their content from the provided --webroot-path/-w and that files created there can be downloaded from the internet.

Failed to renew certificate tjkmaintenance.ca-0001 with error: Some challenges have failed.


Testing the validation at the same time I see this:


Results
for crocusplains.com/.well-known/acme-challenge/nujg3dYYLNBzVQMVxbDgFv6dRdcmYPTLx0egZ6PmQ1M

URL tested crocusplains.com/.well-known/acme-challenge/nujg3dYYLNBzVQMVxbDgFv6dRdcmYPTLx0egZ6PmQ1M
Website Test performed from New York, NY on 2024-04-17 01:00:58 (GMT +00:00)

Status OK
Resolved as 205.206.248.28
Response Time 0.157 seconds
DNS 0.000 s
Connect 0.078 s
Redirect 0.000 s
First Byte 0.079 s
Last Byte 0.000 s
Size 87 bytes


This seems to indicate that the firewall and all Let's Encrypt configurations are working. I'm struggling to determine where the problem is.

My domain is: tjkmaintenance.ca

I ran this command: certbot renew --agree-tos -w /data/letsencrypt

My web server is (include version): nginx version: nginx/1.20.2

The operating system my web server runs on is (include version): CentOS 7

My hosting provider, if applicable, is: Self Hosted

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): certbot 1.16.0

https://letsdebug.net/www.tjkmaintenance.ca/1880536 shows

ANotWorking
ERROR
www.tjkmaintenance.ca has an A (IPv4) record (205.206.248.28) but a request to this address over port 80 did not succeed. Your web server must have at least one working IPv4 or IPv6 address.
A timeout was experienced while communicating with www.tjkmaintenance.ca/205.206.248.28: Get "http://www.tjkmaintenance.ca/.well-known/acme-challenge/letsdebug-test": context deadline exceeded

Trace:
@0ms: Making a request to http://www.tjkmaintenance.ca/.well-known/acme-challenge/letsdebug-test (using initial IP 205.206.248.28)
@0ms: Dialing 205.206.248.28
@10000ms: Experienced error: context deadline exceeded

Best Practice - Keep Port 80 Open

Hi @crocusplains,

Since your last certificate renewal, Let's Encrypt has added more remote perspectives for certificate validation, and these are located in other countries. (Currently Sweden and Singapore, but this is explicitly expected to change over time, with different countries added periodically.)

Your network is probably blocking some of these connections.

From my location, the connection is fine:

$ nmap -Pn -p80,443 www.tjkmaintenance.ca
Starting Nmap 7.80 ( https://nmap.org ) at 2024-04-17 01:16 UTC
Nmap scan report for www.tjkmaintenance.ca (205.206.248.28)
Host is up (0.030s latency).
rDNS record for 205.206.248.28: s205-206-248-28.ab.hsia.telus.net

PORT    STATE SERVICE
80/tcp  open  http
443/tcp open  https

Nmap done: 1 IP address (1 host up) scanned in 0.37 seconds

I'm located in Canada. So those other countries don't come into play here. The response just above indicates that the site is accessible. AWS Health checks respond to the domain to indicate that all is well. Last firewall restart was 15 days ago with no reported issues.

The .well-known directory is empty if letsencrypt isn't running. I do currently have two files in there for testing:
http://tjkmaintenance.net/.well-known/acme-challenge/nujg3dYYLNBzVQMVxbDgFv6dRdcmYPTLx0egZ6PmQ1M

and

http://tjkmaintenance.net/.well-known/acme-challenge/wlPgg2qsCIiqnA1yKly9XR_ygkHh8QBc6eUxHm68B7s

Oh! You're saying that Sweden or Singapore may be doing the renewal for me! OK, I do country blocking, so this may be a factor. I'll pause it and see if this resolves the problem.

They do. The point of multi-perspective validation is that all but one of the perspectives have to succeed. If your firewall is blocking traffic from Singapore and Stockholm, then both of those locations will fail, which exceeds the failure threshold and results in a failed validation.

See "Let's Encrypt is adding two new remote perspectives for domain validation" for details.
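
To make the threshold concrete, here is a toy model of the rule as described in this thread: validation passes only if at most one perspective fails. The function name `validation_passes` and the `ok`/`fail` strings are made up for illustration; the actual policy is whatever Let's Encrypt runs and may differ in detail.

```bash
#!/usr/bin/env bash
# Toy model only: NOT Let's Encrypt's implementation.
# Encodes the rule as stated in this thread: at most one perspective may fail.

validation_passes() {
    # Arguments: one result per perspective, each either "ok" or "fail".
    local failures=0 result
    for result in "$@"; do
        [ "$result" = "ok" ] || failures=$((failures + 1))
    done
    # Succeeds (exit status 0) when no more than one perspective failed.
    [ "$failures" -le 1 ]
}
```

With this model, `validation_passes ok ok ok fail` succeeds, while `validation_passes ok ok fail fail` (two remote locations blocked by a country filter) does not.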

OK, removing the location/country blocking has resolved the issue. Question: can I request that the validation come only from certain countries? Or is there a whitelist that I could grab and include in the firewall?

@crocusplains Nope, we've been having discussions about this in various threads since 2015, but the answer is that Let's Encrypt wants to reserve the right to change the origins of validation at any time without prior notice (and actually does plan to change them over time in the future).

I'm proposing a new documentation article for the Let's Encrypt web site about this because it's such a frequent question (particularly in light of the recent changes).

Thank you for the quick reply. It's working and fully tested now. I'll monitor my firewall, and if it becomes an issue I'll have to watch for the changes and open countries only as they come up.

The open access only needs to happen during the renewal process.

You can use the --pre-hook and --post-hook options to open and close the firewall rules, so the server only allows that traffic during Certbot operations.

The caveat is those hooks wrap the entire Certbot operation, which should happen daily. With certain plugins, it is possible to only alter the firewall during active validation checks. However, using the hooks you can have the firewall up for all but a few seconds each day.
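
A minimal sketch of that approach, assuming the firewall can be controlled from the machine that runs Certbot. `--pre-hook` and `--post-hook` are real Certbot options; the two script paths and the `renew_with_hooks` wrapper name are hypothetical placeholders for whatever opens and closes your rules.

```bash
#!/usr/bin/env bash
# Sketch: run "certbot renew" with hooks that open and close the firewall.
# ASSUMPTION: both hook scripts below are hypothetical and must be written by you.
set -e

OPEN_HOOK="${OPEN_HOOK:-/usr/local/sbin/acme-fw-open}"     # hypothetical script
CLOSE_HOOK="${CLOSE_HOOK:-/usr/local/sbin/acme-fw-close}"  # hypothetical script

renew_with_hooks() {
    # Extra arguments (for example --dry-run) are passed through to certbot.
    certbot renew \
        --pre-hook "$OPEN_HOOK" \
        --post-hook "$CLOSE_HOOK" \
        "$@"
}

# Only act when called as: <this script> run [extra certbot args]
if [ "${1:-}" = "run" ]; then
    shift
    renew_with_hooks "$@"
fi
```

Calling the script with `run --dry-run` is a reasonable first test, since Certbot's dry run also calls the pre and post hooks.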

Certbot may even run multiple times per day. But those hooks run only when an actual renewal attempt is to be made, so not until 30 days before expiry per current default practice.

The amount of time the port would be open over the course of a year is very small (per cert). Not each day.

From the docs (emphasis mine):

When Certbot detects that a certificate is due for renewal, --pre-hook and --post-hook hooks run before and after each attempt to renew it. If you want your hook to run only after a successful renewal, use --deploy-hook in a command like this.
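
One way to see this behavior for yourself is a hook that only writes a log line, so you can check afterwards when it actually fired. This is a sketch under assumptions: the log path and the `log_hook_run` name are hypothetical, and the script would be passed to `--pre-hook` and `--post-hook` with a label argument.

```bash
#!/usr/bin/env bash
# Hypothetical logging hook: appends one timestamped line per invocation.
# ASSUMPTION: the log path is a placeholder; pick one that suits your system.
set -eu

HOOK_LOG="${HOOK_LOG:-/var/log/acme-hook.log}"

log_hook_run() {
    # $1 is a free-form label such as "pre" or "post".
    printf '%s %s hook ran\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$HOOK_LOG"
}

# Log only when a label is supplied on the command line.
if [ -n "${1:-}" ]; then
    log_hook_run "$1"
fi
```

If the log stays empty on days when no certificate is due, the hooks are not running on every scheduled invocation.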

I need to evaluate if my firewall has those ports available. SSH access is closed even internally unless by request and then only for 15 minutes at a time for initial connection. I'll see what I can do.

If you can afford to run a dedicated VM, you can delegate all HTTP traffic to it.
Stick it in a tiny DMZ and watch/control all that it does/can do.
Then have it handle the HTTP /.well-known/acme-challenge/ requests as needed and simply forward all other requests to HTTPS.
That way the only requests that would ever reach a protected web server are validated/proxied HTTP ACME requests and already/previously allowed HTTPS requests.
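
As a rough illustration, the DMZ box could run nginx with a server block like the one this script writes out. Everything specific here is an assumption: the internal address `10.0.0.10`, the `write_acme_proxy_conf` name, and the choice to proxy the challenge path rather than serve it locally.

```bash
#!/usr/bin/env bash
# Sketch: generate an nginx server block for a DMZ HTTP front end.
# ASSUMPTION: 10.0.0.10 stands in for the internal server that answers challenges.
set -eu

UPSTREAM="${UPSTREAM:-10.0.0.10}"

write_acme_proxy_conf() {
    # $1: file to write the server block to.
    cat > "$1" <<EOF
server {
    listen 80 default_server;
    server_name _;

    # Forward only ACME HTTP-01 challenge requests to the internal server.
    location ^~ /.well-known/acme-challenge/ {
        proxy_set_header Host \$host;
        proxy_pass http://${UPSTREAM};
    }

    # Everything else is redirected to HTTPS.
    location / {
        return 301 https://\$host\$request_uri;
    }
}
EOF
}

# Write the file only when a destination path is given.
if [ -n "${1:-}" ]; then
    write_acme_proxy_conf "$1"
fi
```

After writing the file into the nginx configuration directory, `nginx -t` can check the syntax before a reload.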

The problem with that is that you need to expose the nginx in the DMZ, because that is what responds to the .well-known requests. It's possible to run a proxy in the DMZ instead, but that exposes the proxy, and the proxy has access to the internal nginx because it has to. So it's better, but still not the same as full blocks, because there are attacks coming from a particular country.

Opening a window of time where the country blocks are off still seems like the most reasonable option, if the firewall supports it. A proper firewall won't allow API adjustment of firewall rules, so it has to be functionality in the firewall itself. I believe I have found such an option for myself, but this is a concern for the broader community.

Oh wow! I love being corrected on this! I thought it wrapped the entire Certbot invocation, not each renewal within the Certbot invocation.

Edit: For clarity, I am extremely happy that @MikeMcQ corrected me and I had interpreted that Certbot detail incorrectly. I have at least one production machine that will benefit by updating my integration to reflect the correct behavior. Thanks, Mike!

FWIW, if you implement firewall blocking with iptables, you can use chains to handle this. In my setup, I have an "acme" rule at the top of the configuration that opens the required ports. I just turn the rule on/off and flush to enable/disable.
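
A sketch of what such a setup might look like, not the poster's actual configuration. The chain name `ACME-HTTP` and the function names are hypothetical, only port 80 is opened, and the commands need root on the machine that does the filtering.

```bash
#!/usr/bin/env bash
# Sketch: a dedicated iptables chain that is filled while validation runs
# and flushed afterwards. ASSUMPTION: the chain name is a placeholder.
set -eu

CHAIN="ACME-HTTP"

acme_setup() {
    # One-time setup: create the chain and jump to it from the top of INPUT.
    iptables -N "$CHAIN"
    iptables -I INPUT 1 -j "$CHAIN"
}

acme_open() {
    # Accept inbound HTTP from anywhere while the chain holds this rule.
    iptables -A "$CHAIN" -p tcp --dport 80 -j ACCEPT
}

acme_close() {
    # Empty the chain; packets then fall through to the normal INPUT rules.
    iptables -F "$CHAIN"
}

case "${1:-}" in
    setup) acme_setup ;;
    open)  acme_open ;;
    close) acme_close ;;
esac
```

The `open` and `close` actions map naturally onto Certbot's pre and post hooks, so the country blocks further down in INPUT are bypassed only while a renewal is in progress.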

You can use any web proxy [not just nginx].
The whole point is to "expose" a trapped system (with extremely limited access), not your actual web servers.
[if you don't know how to contain and control a system with a firewalled DMZ, then it doesn't really matter where it sits]

But only for the HTTP ACME challenges [no HTTPS allowed].
If the actual web server only serves [risky] content via HTTPS, then the proxy has no access to that risk.
The point here is that no real content should ever be accessible via HTTP - only the ACME folder path [and the redirection to HTTPS].

Attacks will come from all IP ranges.
Reducing your attack vectors by country is a false sense of real safety.
[it is a very small hurdle to overcome for any real attacker]

What is stopping you from doing both? [more is better]:

  • open and close your HTTP access window for renewals
  • proxy those allowed HTTP requests [multiple times if necessary]

To further the HTTP/HTTPS separation:
You could assign another dedicated VM just for "certificate management".
Meaning:

  • all the HTTP requests are sent/proxied to one single system [regardless of how many web servers you run]
  • that one system runs the ACME client and obtains all the certs for all the web servers [and can also redirect all other requests to HTTPS]
  • you can then simply rsync the certs to their respective locations OR have each web server come pick up their cert on a (weekly) schedule [if you are super paranoid about accessing the web servers directly from such an "exposed" system]
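
The rsync option in the last bullet could be wired up as a Certbot deploy hook, roughly as follows. Certbot sets `RENEWED_LINEAGE` when it runs a deploy hook; the target host, the destination path, and the `push_cert` name are hypothetical, and key-based SSH access to the target is assumed.

```bash
#!/usr/bin/env bash
# Sketch: push a renewed certificate to a web server with rsync.
# ASSUMPTION: the target below is a placeholder for your own host and path.
set -eu

TARGET="${TARGET:-deploy@web1.internal:/etc/ssl/acme/}"

push_cert() {
    # $1: lineage directory, e.g. /etc/letsencrypt/live/<certificate name>
    # -L copies the files that the live/ symlinks point to, not the symlinks.
    rsync -aL "$1/fullchain.pem" "$1/privkey.pem" "$TARGET"
}

# Certbot exports RENEWED_LINEAGE to deploy hooks after a successful renewal.
if [ -n "${RENEWED_LINEAGE:-}" ]; then
    push_cert "$RENEWED_LINEAGE"
fi
```

Passing this script to `--deploy-hook` means it runs only after a successful renewal. The pull variant from the same bullet would instead run a similar rsync from each web server on a schedule.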

That's the point. NOT iptables. A proper firewall runs on hardware that's separate from the other machines. Therefore the renewal is NOT on the same machine, and therefore has no access to the rules.