Trouble renewing certificate - certbot 0.40.0

StealthMicro · December 1, 2020, 2:22am

I did as you suggested. It is definetely a TLSv1.2 issue. I was able to tcpdump a letsdebug working and non working one and captured it as a pcap file so i can compare in wireshark. So a few more questions.

I have a pcap of a working domain and a pcap of a non working domain. Is anyone good at reading TCP at this low level that could review with wireshark?

Here is a screenshot of the passing domain.

Because I am a new member I can only post one attachment per message so I will create a second one of the failing domain.

You can clearly see it stops at the TLS handshake.

Can challenges be directed through a particular domain? The TLS portion of the transaction.

StealthMicro · December 1, 2020, 2:22am

Here is a screenshot of the failing domain

rg305 · December 1, 2020, 2:26am

I thought it was established that this remained in HTTP.

jsha:

It shows that the Let's Encrypt server successfully made a TCP connection, and that TCP connection reached your Apache server, but when we sent the HTTP request
GET /.well-known/acme-challenge/...
Host: acadia.k12.la.us
User-Agent: Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)
...
something stops that packet from reaching your Apache. And that something triggers only on validation requests to acadia.k12.la.us , not to www.acadia.k12.la.us (etc). Based on that, I think James is on the right course suggesting this might be the IPS on your firewall triggering on certain HTTP requests.

StealthMicro · December 1, 2020, 2:28am

I was under that impression as well. It does retrieve the challenge file via http but apparently part of the staging process is done with https? Unless there is a way to change that?

rg305 · December 1, 2020, 2:28am

The outbound requests are all HTTPS.

rg305 · December 1, 2020, 2:30am

A challenge request is like any other URL.
Which can be redirected to any other site.

StealthMicro · December 1, 2020, 2:31am

So with that it is looking like a haproxy issue. haproxy handles all the ssl certificates. It loads around 560+ certificates. Maybe we are hitting some weird limit with something. The thing is all the websites work and answer SSL correctly so it may be something to do with the TLSv1.2 implementation on the load balancer.

rg305 · December 1, 2020, 2:31am

Again, the challenge requests (inbound) are all HTTP.
Some work, some fail.
The only visible difference is the domains being used.

StealthMicro · December 1, 2020, 2:32am

I guess I needed to clarify. And pretty sure I can answer my own question as having the "outbound" portion redirected to a different domain kind of defeats the whole authentication part of the process.

StealthMicro · December 1, 2020, 2:33am

That is correct. So the inbound part is handled as http but the outbound part as you stated previously is apparently handled via https.

rg305 · December 1, 2020, 2:34am

You have no control over the outbound connections.
They all go to LE.

_az · December 1, 2020, 2:34am

If you want (and there's nothing sensitive in there), zip up and email the pcap files to az@letsdebug.net and I can take a look as well. Hard to understand anything from screenshots.

edit: Received, thanks.

StealthMicro · December 1, 2020, 2:37am

Done and sent. There are two files one is has failed and the other passed.

StealthMicro · December 1, 2020, 2:45am

My thinking is an update to haproxy has caused this. We are running 2.3.1 stable which has been a really nice update BTW. Much faster reloads and much better admin api. It used to take us 3.5 minutes to reload haproxy with the 560+ SSL certificates on multiple haproxy frontends. We have 3 platforms we support as well as some development servers so we load up a lot of actual certificate pems. Now haproxy reloads in less then 15 seconds. But regardless if we find this to be a haproxy issue we will revert back to what works until they can find and fix the issue. That is if it turns out to be in haproxy which at this point I am pretty sure it is. We have ruled out pretty much all else at this point.

rg305 · December 1, 2020, 2:47am

It may be a unique set of circumstances (that revolve around/involve HAProxy).
I'm about 100% certain it isn't Apache, nor the firewall.

StealthMicro · December 1, 2020, 2:49am

We have a barracuda F400 firewall. Pretty robust. I did contact support and we went through it looking for the possibility of an IPS issue as someone mentioned above and did not see where it was blocking any traffic concerning letsencrypt. So I think you are right. I will revert back to an older version later tonight if no one has any other suggestions by then.

rg305 · December 1, 2020, 2:52am

It might also be in the HAProxy config.
But, in my view, it is either the version change or a config change (or both).

You could still try redirecting everything to HTTPS - it seems that HAProxy can do that much better now.
[at the expense of doing HTTP much less better - lol]

StealthMicro · December 1, 2020, 2:54am

I am leaning more to a limit we have reached in haproxy regarding the certificates. Otherwise I think it would happen all the time. Why it works on some domains and others might actually have a pattern. It would be important to figure that out to help the folks at haproxy to fix it if it ends up being haproxy.

rg305 · December 1, 2020, 2:57am

Certificates are HTTPS, the problem is in HTTP.
Unless the system is running low on resources (first come first served, quantities limited).
Which you could try changing the loading order (I'm not familiar with HAProxy).

griffin · December 1, 2020, 2:58am

Aren't the apex (non-www) and www using the same certificate? Why would the same certificate fail for one and not the other?