Trouble renewing certificate - certbot 0.40.0

Yes, if you click the link above you can see it works from every browser, curl, and even lynx, both on and off our network.

3 Likes

There are now 62 posts in this topic...
Could you be more specific?

2 Likes
http://acadia.k12.la.us/.well-known/acme-challenge/BmOxbbaZLxCE7dJPY50X6e_FYR86250y0Y9ApybCMEQ
2 Likes

I'm not familiar with:

But it looks like something in your load-balancer isn't playing well with the LE request.

3 Likes

The issue is that everyone else can reach that same URL fine, but when the Let's Encrypt servers request it, the connection times out. So the problem lies either somewhere between Let's Encrypt and us, or in how their Go-based client connects to us, since they use Go for the server. That is kind of where we are at. I would happily add your IP address to the firewall so you can see it is not the load balancer. Even if you go directly to Apache, it still connects fine with any web browser. So I am really not sure where the issue is. At this point we are waiting to hear from the Let's Encrypt engineers.

4 Likes

Okay, now I am really confused. Here is another one of our domains: saintclementchurch.org. This one was renewed a couple of days ago by a cron job running certbot renew. The weird thing is that it passes letsdebug.net just fine.

4 Likes

Okay, so this is really strange. Here are two domains with the same IP addresses and the same configuration all the way through. One passes Let's Debug; the other fails.

Fails: https://letsdebug.net/leslie.k12.ky.us/372584?debug=y
Pass: https://letsdebug.net/maconk12.org/372586?debug=y
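One quick way to double-check the "same IP addresses" claim is to compare what the resolver returns for both names. The snippet below is only a sketch: the helper names `resolve` and `same_addresses` are made up for illustration, and it only reflects what the local resolver sees, which may differ from what the Let's Encrypt validators see.

```python
import socket


def resolve(host, port=80):
    """Return the sorted, de-duplicated IP addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def same_addresses(host_a, host_b):
    """True if both names resolve to exactly the same set of addresses."""
    return resolve(host_a) == resolve(host_b)
```

Calling `same_addresses("leslie.k12.ky.us", "maconk12.org")` and printing `resolve(...)` for each name would show whether one of them has an extra A or AAAA record that the other lacks.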
3 Likes

Although this doesn't sound like a simple firewall issue, it could be a complex one. After all the troubleshooting so far, I'm struggling to identify anything else that would explain this problem.

@StealthMicro, is there any chance you have a firewall or other device acting as an Intrusion Prevention System (IPS), using Deep Packet Inspection (DPI), "next gen" capabilities, etc.? Or that your provider has one running on your behalf? If so, please turn on its debug mode and look at its output during validation attempts.

6 Likes

We do have a firewall with IPS and I tried disabling it for testing this morning with the same results. I still do not understand why some work and others do not. It really makes no sense. Is there any deeper debugging I can turn on to get a better understanding on what is happening?

3 Likes

Also I am noticing with letsdebug.net it appears the issue has moved to the staging part of the process? It appears to now pass the initial part where it was failing at times at the HTTPCheck portion but now fails at the Staging portion. Is that any sort of clue?

3 Likes

Let's Debug runs tests against the Let's Encrypt staging environment, which is a completely separate test environment and not a part of the process.

3 Likes

I agree, this is very much a strange situation. That's interesting about the 408 request you saw in the logs. That tells us a lot, actually! It shows that the Let's Encrypt server successfully made a TCP connection, and that TCP connection reached your Apache server, but when we sent the HTTP request

GET /.well-known/acme-challenge/...
Host: acadia.k12.la.us
User-Agent: Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)
...

something stops that packet from reaching your Apache. And that something triggers only on validation requests to acadia.k12.la.us, not to www.acadia.k12.la.us (etc). Based on that, I think James is on the right course suggesting this might be the IPS on your firewall triggering on certain HTTP requests. It might somehow be configured such that only certain combinations of headers trigger it, or certain substrings (like acadia.k12.la.us). Rather than turning it off, can you configure your IPS to log when it blocks something, then do a failed validation? I know you tried turning it off and reproducing, but it's possible it wasn't fully turned off (e.g. maybe needed a restart?), so it's worthwhile to test in other ways too.
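To test whether the header combination alone is enough to trigger a block, one option is to send a request that looks like the one above from a machine outside the network. This is a rough sketch, not how the validation server itself works: `build_request` and `probe` are hypothetical helper names, the token is a placeholder, and the request comes from your own IP address rather than a Let's Encrypt one, so a success here does not rule out rules keyed on source address.

```python
import urllib.error
import urllib.request

# User-Agent string as quoted in the request above.
LE_USER_AGENT = (
    "Mozilla/5.0 (compatible; Let's Encrypt validation server; "
    "+https://www.letsencrypt.org)"
)


def build_request(host, token):
    """Build a GET for the HTTP-01 challenge path with the validator's User-Agent."""
    url = f"http://{host}/.well-known/acme-challenge/{token}"
    return urllib.request.Request(url, headers={"User-Agent": LE_USER_AGENT})


def probe(host, token, timeout=10):
    """Return the HTTP status code, or a short description of the failure."""
    try:
        with urllib.request.urlopen(build_request(host, token), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx (e.g. 404 for an unknown token).
        return err.code
    except (urllib.error.URLError, OSError) as err:
        # Timeouts and resets land here; these are the interesting cases.
        return f"failed: {err}"
```

Running `probe` against both acadia.k12.la.us and www.acadia.k12.la.us with the same placeholder token should give the same result (most likely a 404) for each name. A timeout on only one of them would point at something filtering on the hostname.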

The other thing I would try: Can you turn on packet collection at your firewall? Do you know how to use Wireshark? If so, turn on packet collection and do a failed validation (or a failed letsdebug run), then check the packets you collected to see whether the HTTP request reached your firewall and whether it was forwarded.
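If a capture at the firewall is not practical, a cruder check on the web server itself can show the same symptom: a TCP connection that arrives but never delivers any request bytes. The sketch below is a stand-in for a real packet capture, not a replacement. `log_one_connection` is a hypothetical helper, and since HTTP-01 validation connects on port 80, it assumes Apache is stopped for the test and that the script is allowed to bind that port.

```python
import socket


def log_one_connection(server_sock, wait_seconds=5.0):
    """Accept one connection and return (peer_ip, first_bytes).

    first_bytes is b"" when the peer connected but no data arrived within
    wait_seconds, or the peer closed without sending anything.
    """
    conn, peer = server_sock.accept()
    with conn:
        conn.settimeout(wait_seconds)
        try:
            data = conn.recv(4096)
        except socket.timeout:
            data = b""
    return peer[0], data


def open_listener(host="0.0.0.0", port=80):
    """Open a listening TCP socket (binding port 80 usually needs root)."""
    return socket.create_server((host, port))
```

A loop such as `srv = open_listener()` followed by repeated `print(log_one_connection(srv))` during a validation attempt would show, per connection, the source IP and whether the `GET /.well-known/acme-challenge/...` line ever arrived. An empty result for the validator's connections would suggest the request is being dropped before it reaches the server.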

3 Likes

Oh and one other question, just to check the obvious: Has any configuration of your firewalls, load balancers, or the Apache instance changed in the last ~70 days?

5 Likes

Configuration, no, but software updates happen all the time across all three of those. Even taking the load balancer out of the equation does not solve the issue. That leaves the firewall and Apache. I will do a full update of the firewall tonight, along with a reboot, to see if that helps. Taking Apache back a few versions will not be difficult either, so I will try both. When I restart the firewall I will leave the IPS off and test with it off from a fresh startup. If that works, I will enable the IPS and see what happens. I am all for getting it to work, but I would really like to know the cause as well, not only to prevent it here but to help others who may run into it in the future.

5 Likes

Do you have any rewrite rules in place in apache that might be interfering? This would explain the www versus non-www difference.

3 Likes

That is a very good thought, but no, this Apache instance is very plain. No .htaccess, no overrides, etc. It exists specifically for this purpose. Thank you for all your help and the continued suggestions.

6 Likes

Something must be distinguishing the hostnames, somehow. For www, do you have separate vHosts in place instead of using ServerAlias directives? I'm looking for alternate processing paths here.

3 Likes

A single vhost that answers everything. I see where you are heading, but I do not think the www vs. non-www difference is an Apache issue. It is more like the question of why some domains and certificates work when others do not. If you look at letsdebug, it is making it to the staging part. At least some do, and others show OK altogether. It is really baffling.

4 Likes

So I uninstalled Apache completely and installed the version that apt provides, with the same results. I even ran apt purge first to clear out any leftover configuration.

3 Likes

NOTE: I'm hard headed.

I would redirect all challenge requests to a "working domain" and see if you can use that path for all those that are failing.
[If that works, then I would do the complete opposite - just to be sure where the problem lies]

3 Likes