Trouble renewing certificate - certbot 0.40.0

Okay so I copied the next set out of the folder and back in. Try these.

BmOxbbaZLxCE7dJPY50X6e_FYR86250y0Y9ApybCMEQ
IUm_7vJG2bYkGKfOiXOjC6SnRnVdHVFJ8NpsD-dIonI

That way it can continue, and the files will stay in place now.

4 Likes

https://acadia.k12.la.us/.well-known/acme-challenge/BmOxbbaZLxCE7dJPY50X6e_FYR86250y0Y9ApybCMEQ

BmOxbbaZLxCE7dJPY50X6e_FYR86250y0Y9ApybCMEQ.dxzKATtDb7LPjYpbLMOLltTCXYKQjPUnPrgnAvKGNbU

confirmed


https://acadia.k12.la.us/.well-known/acme-challenge/IUm_7vJG2bYkGKfOiXOjC6SnRnVdHVFJ8NpsD-dIonI

IUm_7vJG2bYkGKfOiXOjC6SnRnVdHVFJ8NpsD-dIonI.dxzKATtDb7LPjYpbLMOLltTCXYKQjPUnPrgnAvKGNbU

confirmed
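For anyone repeating this check by hand: an http-01 challenge file is expected to contain the token from the URL, a dot, and the account key thumbprint, so both files should end in the same thumbprint. Here is a minimal Go sketch of that comparison, using the two bodies quoted above. The function name `thumbprintFromBody` is made up for illustration and is not part of Certbot or any ACME library.

```go
package main

import (
	"fmt"
	"strings"
)

// thumbprintFromBody checks that body has the form "<token>.<thumbprint>"
// and returns the thumbprint part. (Hypothetical helper, for illustration.)
func thumbprintFromBody(token, body string) (string, error) {
	body = strings.TrimSpace(body) // tolerate a trailing newline in the file
	prefix := token + "."
	if !strings.HasPrefix(body, prefix) {
		return "", fmt.Errorf("body does not start with token %q", token)
	}
	thumbprint := strings.TrimPrefix(body, prefix)
	if thumbprint == "" {
		return "", fmt.Errorf("missing thumbprint after token %q", token)
	}
	return thumbprint, nil
}

func main() {
	// Tokens and bodies copied from the posts above.
	files := []struct{ token, body string }{
		{
			"BmOxbbaZLxCE7dJPY50X6e_FYR86250y0Y9ApybCMEQ",
			"BmOxbbaZLxCE7dJPY50X6e_FYR86250y0Y9ApybCMEQ.dxzKATtDb7LPjYpbLMOLltTCXYKQjPUnPrgnAvKGNbU",
		},
		{
			"IUm_7vJG2bYkGKfOiXOjC6SnRnVdHVFJ8NpsD-dIonI",
			"IUm_7vJG2bYkGKfOiXOjC6SnRnVdHVFJ8NpsD-dIonI.dxzKATtDb7LPjYpbLMOLltTCXYKQjPUnPrgnAvKGNbU",
		},
	}

	var first string
	for _, f := range files {
		tp, err := thumbprintFromBody(f.token, f.body)
		if err != nil {
			fmt.Println("mismatch:", err)
			return
		}
		if first == "" {
			first = tp
		} else if tp != first {
			fmt.Println("thumbprints differ between files")
			return
		}
	}
	fmt.Println("both files match; thumbprint:", first)
}
```

This only checks the shape of the file contents. It says nothing about whether the validation server can actually reach them.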


Do you use Fail2Ban or some other adaptive firewall scheme?

3 Likes

We have a regular firewall in place but nothing like fail2ban on this server.

I do fear something is going on, though, because I can see our own checks for those files in the Apache logs but I do not see Let's Encrypt checking for them. So it is like it's not reaching the Apache server. That is why I was hoping there were specific IP addresses. We do have a list of banned IP addresses on the firewall, but I turned that rule completely off to ensure that these IP addresses were not somehow in that list. We mostly put denial-of-service attempt IP addresses in there. If someone comes in using /.well-known/acme-challenge/ then it automatically reroutes to the proper Apache server that serves up these requests. So, to make sure nothing was slipping past, I checked the Apache logs of the servers that would answer that request if it did not have /.well-known/acme-challenge in it. There are 0 of those that have been passed to the other servers.

3 Likes

Check the recent certbot logs/output if you would. I want to make sure you're not being rate-limited for failed authorizations. No point chasing our tail. What's the last error?

3 Likes

IMPORTANT NOTES:

Not seeing anything like that. It would be really nice if they included the IP they were coming from in that response.

3 Likes

A reasonable suggestion to be sure.


Thank you for your wonderful cooperation with all this. I'm going to call for heavy reinforcements now. :grin:
They're usually pretty expedient about responding. Give them some time though.

@lestaff

We could really use your input and guidance here.

  • Manually confirmed http-01 challenge file creation and contents with 100% responsiveness over multiple acquisition attempts (i.e. load balancing not an issue)
  • Attempts from Boulder do not appear in apache logs
  • Testing against staging environment
  • Fails due to timeout after connection
  • Firewall suspected but no evidence seen
3 Likes

Thank you really. It is greatly appreciated. You have been awesome and have been extremely helpful.

3 Likes

You're quite welcome. I wish all of our visitors were as knowledgeable and responsive as you. :blush:

3 Likes

This error stems from your server not responding to the validation attempt. On the Let's Encrypt side, the connection succeeded and we waited for a reply but never got one, so the validator stopped waiting and returned the error.

You looked at your Apache logs, but this is possibly a problem at your HAProxy layer, and if possible you should review the logs there. It sounds like your HAProxy isn't passing the request on to Apache, which would explain why you don't see any attempts in the Apache logs.

I've sent a DM with the IP we attempted to validate for your domain that you can also review.

5 Likes

I would agree with you but we can test it every other way. If we test it with a web browser it works. If we test it on remote machines using curl it works. So I guess I am confused why it does not work with the letsencrypt servers.

4 Likes

@jillian

I was able to access both challenge files myself using my Samsung phone and verify their contents. StealthMicro was also able to perform external validation. Any idea why the Let's Encrypt Boulder requests would be treated differently here?

Just a note: I did confirm that there's no IPv6 in play here.

2 Likes

@jillian

The Let's Debug response was very curious as well...

2 Likes

@griffin that warning just means that Let's Debug gave up waiting for the ACME CA to update the status of the challenge (30 seconds IIRC).

It's not anything to do with Let's Encrypt. I should probably increase the duration.

Edit: it's now 60s but that doesn't seem to have helped. I'm kind of surprised. It's not like the challenge is being queued up on the Let's Encrypt side, and the 10 second deadline was always pretty strict in the past? Or am I misremembering, and the 10s timeout only affects dialing?

3 Likes

It responds quickly to normal browser requests.
There must be some sort of user agent check in place.

3 Likes

That was my suspicion.

3 Likes

I also find this a bit mysterious. But I think Jillian's on the right track with checking your HAProxy logs and configs. In particular I would be curious to see if HAProxy has a log entry for the validation requests starting.

Some things we know:

  • We're getting this error from our primary datacenters, not from AWS, so this is not related to blocking AWS addresses. If it were, we'd see "during secondary validation" in the error.
  • acadia.k12.la.us consistently fails, but www.acadia.k12.la.us consistently succeeds (I checked the logs, but also your Certbot output indicates that only the non-www hostname failed). [Edit: the parenthetical referred to the wrong hostname. Fixed]
  • This is a "timeout after connect," not "timeout during connect." That means it's probably not a firewall problem. If the firewall were blocking us, we'd get a timeout during connect.

One thing I wonder about: Could your certbot or other tooling be temporarily taking down Apache while it runs, then starting it back up afterwards? For instance, it's common to configure that in standalone mode. But your output above indicates webroot mode, so that's probably not what's going on.

One long-shot test I would recommend: On some host outside your network, set up this loop:

while curl -vv -m 80 https://acadia.k12.la.us/.well-known/acme-challenge/letsdebug-test ; do sleep 1; done

That fetches the URL repeatedly, with a timeout of 80 seconds (Let's Encrypt's overall timeout is 90 seconds).

Then, on your instance that runs Certbot, do a renewal. Does the curl loop time out while the renewal is going on?

Also: is there anything different in your load balancer config for www vs non-www?

5 Likes

A lot of food for thought. :astonished: Thanks Jacob! :grinning:

2 Likes

@jsha

I got to rereading through the log and found myself wondering about something...

Did you mean this the other way around, maybe?

As in: only the non-www hostname failed

{
  "identifier": {
    "type": "dns",
    "value": "acadia.k12.la.us"
  },
  "status": "invalid",
  "expires": "2020-12-02T16:24:09Z",
  "challenges": [
    {
      "type": "http-01",
      "status": "invalid",
      "error": {
        "type": "urn:ietf:params:acme:error:connection",
        "detail": "Fetching http://acadia.k12.la.us/.well-known/acme-challenge/kogsqUE84tlx30Z5kwlOZIEENUvf9Ru6JlnqRh_Qz8o: Timeout after connect (your server may be slow or overloaded)",
        "status": 400
      },
      "url": "https://acme-staging-v02.api.letsencrypt.org/acme/chall-v3/161888351/HzeKxg",
      "token": "kogsqUE84tlx30Z5kwlOZIEENUvf9Ru6JlnqRh_Qz8o",
      "validationRecord": [
        {
          "url": "http://acadia.k12.la.us/.well-known/acme-challenge/kogsqUE84tlx30Z5kwlOZIEENUvf9Ru6JlnqRh_Qz8o",
          "hostname": "acadia.k12.la.us",
          "port": "80",
          "addressesResolved": [
            "104.232.38.112"
          ],
          "addressUsed": "104.232.38.112"
        }
      ]
    }
  ]
}
3 Likes

@_az

I think you are probably right about the 10s timeout only affecting dialing...

2 Likes

There's even more fun stuff happening that curl isn't reproducing:

$ curl -X GET -IL http://acadia.k12.la.us/.well-known/acme-challenge/letsdebug-test
HTTP/1.1 302 Found
content-length: 0
location: https://acadia.k12.la.us/.well-known/acme-challenge/letsdebug-test
cache-control: no-cache

HTTP/2 404
date: Thu, 26 Nov 2020 06:09:09 GMT
server: Apache
content-length: 196
content-type: text/html; charset=iso-8859-1
load-balancer: ies-lb-v2-ssl

versus a minimal replica of Let's Encrypt's VA HTTP client:

package main

import (
	"crypto/tls"
	"errors"
	"log"
	"net"
	"net/http"
	"time"
)

func main() {
	doRequest("http://acadia.k12.la.us/.well-known/acme-challenge/letsdebug-test")
}

func doRequest(u string) {
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("user-agent", "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)")

	resp, err := makeThrowawayHTTPClient().Do(req)
	if err != nil {
		handleError(err)
		return
	}
	defer resp.Body.Close()

	log.Printf("Successful response: %d", resp.StatusCode)
}

func handleError(err error) {
	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		log.Printf("Timeout after connect (%v)", err)
		return
	}

	log.Printf("Generic error: %v", err)
}

func makeThrowawayHTTPClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DisableKeepAlives:   true,
			IdleConnTimeout:     time.Second,
			TLSHandshakeTimeout: 10 * time.Second,
			MaxIdleConns:        1,
			TLSClientConfig: &tls.Config{
				InsecureSkipVerify: true,
			},
		},
	}
}
$ go run cmd/foo/main.go
2020/11/26 17:09:50 Timeout after connect (Get "https://acadia.k12.la.us/.well-known/acme-challenge/letsdebug-test": net/http: TLS handshake timeout)

So .... is the webserver intolerant of Go's TLS implementation perhaps? Setting TLSClientConfig.MaxVersion to TLS1.2 ...

$ go run cmd/foo/main.go
2020/11/26 17:13:38 Successful response: 404

Huh? Double huh, because the exact same reproduction does not work against the www subdomain, as jsha observed earlier.

This doesn't however explain https://acme-staging-v02.api.letsencrypt.org/get/authz-v3/161448597, where port 443 is not involved at all.

All this messing around ended up with me finding and fixing a connection leak in Let's Debug though, so :partying_face:.

5 Likes