"Timeout during connect" but tcpdump shows 3 successful HTTP challenge requests w/ 200 OK

I cannot get certbot to work on live. Staging is fine (although staging doesn't seem to even request the challenge).

My domain is: mail.summit-tech.ca

docker run --rm -it certbot/certbot --version
certbot 1.16.0

I ran this command:

certbot certonly --webroot -w /var/www/certbot \
    --email xxx -d mail.summit-tech.ca \
    --deploy-hook yyyy  \
    --rsa-key-size 4096 \
    --agree-tos \
    --force-renewal -v

It produced this output:

Performing the following challenges:
http-01 challenge for mail.summit-tech.ca
Using the webroot path /var/www/certbot for all unmatched domains.
Waiting for verification...
Challenge failed for domain mail.summit-tech.ca
http-01 challenge for mail.summit-tech.ca

Certbot failed to authenticate some domains (authenticator: webroot). The Certificate Authority reported these problems:
  Domain: mail.summit-tech.ca
  Type:   connection
  Detail: Fetching http://mail.summit-tech.ca/.well-known/acme-challenge/HL8O4pLKir3djoxr-9N8S38n3Q3ZE4PsN-_6woehlCo: Timeout during connect (likely firewall problem)

Hint: The Certificate Authority failed to download the temporary challenge files created by Certbot. Ensure that the listed domains serve their content from the provided --webroot-path/-w and that files created there can be downloaded from the internet.

My web server is (include version): nginx/1.15.12

The operating system my web server runs on is (include version): flatcar

Now, if I run tcpdump -i any -s0 -A port 80, I see 3 different challenge requests come in:

First, from 18.116.86.117

GET /.well-known/acme-challenge/HL8O4pLKir3djoxr-9N8S38n3Q3ZE4PsN-_6woehlCo

HTTP/1.1 200 OK
Server: nginx/1.15.12
Date: Fri, 09 Jul 2021 19:58:36 GMT
Content-Type: application/octet-stream
Content-Length: 87
Last-Modified: Fri, 09 Jul 2021 19:58:36 GMT
Connection: close
ETag: "60e8aa6c-57"
Accept-Ranges: bytes
[...]

Then, the exact same request and 200 OK response for 18.197.97.115 and 34.221.186.243.

Then, I get a timeout and this in the log, which tells me nothing.

{
  "identifier": {
    "type": "dns",
    "value": "mail.summit-tech.ca"
  },
  "status": "invalid",
  "expires": "2021-07-16T19:58:36Z",
  "challenges": [
    {
      "type": "http-01",
      "status": "invalid",
      "error": {
        "type": "urn:ietf:params:acme:error:connection",
        "detail": "Fetching http://mail.summit-tech.ca/.well-known/acme-challenge/HL8O4pLKir3djoxr-9N8S38n3Q3ZE4PsN-_6woehlCo: Timeout during connect (likely firewall problem)",
        "status": 400
      },
      "url": "https://acme-v02.api.letsencrypt.org/acme/chall-v3/14674903853/QUoD0g",
      "token": "HL8O4pLKir3djoxr-9N8S38n3Q3ZE4PsN-_6woehlCo",
      "validationRecord": [
        {
          "url": "http://mail.summit-tech.ca/.well-known/acme-challenge/HL8O4pLKir3djoxr-9N8S38n3Q3ZE4PsN-_6woehlCo",
          "hostname": "mail.summit-tech.ca",
          "port": "80",
          "addressesResolved": [
            "64.254.226.134"
          ],
          "addressUsed": "64.254.226.134"
        }
      ],
      "validated": "2021-07-09T19:58:36Z"
    }
  ]
}

Can I ask, what are the outbound IPs of Let's Encrypt? How many times is the challenge fetched? I'd think after 3 times, it should work. TIA for your help.


Hi @eric_b, and welcome to the LE community forum.

LE validation IPs can and will change without notice.
Is there any kind of system that could be blocking IPs?
If so, you should change it to block HTTPS instead.
Allow HTTP to handle the challenge requests and redirect all other connections to HTTPS.

Yeah, HTTPS is closed already but HTTP is open. So you think it's IP blocking. I will adjust my firewall rules so that the allow rule for this sits at the top (before the myriad of IPs blocked due to nasty behaviour).
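A minimal sketch of putting an allow rule ahead of a block list, assuming the firewall is plain iptables and the blocking happens in the INPUT chain of the filter table (neither is stated in this thread). The function name `allow_http_first` and the `IPTABLES` override variable are hypothetical; if the block list lives in another chain or table, the rule would need to go there instead.

```shell
#!/bin/sh
# Insert an ACCEPT rule for inbound TCP port 80 at position 1 of the INPUT
# chain, so it is evaluated before any DROP/REJECT rules further down.
# Assumption: plain iptables, blocking done in the filter table's INPUT chain.
# IPTABLES may be overridden (for example with a wrapper that only prints).
allow_http_first() {
    ${IPTABLES:-iptables} -I INPUT 1 -p tcp --dport 80 -j ACCEPT
}
```

Running `allow_http_first` as root and then `iptables -L INPUT -n --line-numbers` should list the new rule as number 1. Note that this opens port 80 to every source, including the addresses on the block list.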

I will probably hit my limit soon though.

Another question: does staging do the challenge request? I assume it uses different outbound IPs, right?

4 times: three "secondary" validation points and one "primary". If there is no "secondary" mentioned in the error, it's probably the primary that's failing, which is mandatory.
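To compare the number of fetches that actually reached the server with the four expected ones, the access log can be summarized per client IP. This is a rough sketch that assumes nginx writes its default "combined" log format and that access logging is enabled for the challenge location; the function name `summarize_challenge_hits` is hypothetical.

```shell
#!/bin/sh
# Read an nginx access log ("combined" format) on stdin and print one line
# per (client IP, HTTP status) pair that requested an ACME challenge file,
# followed by the number of such requests.
# Fields in the combined format: $1 = client IP, $7 = request path,
# $9 = response status.
summarize_challenge_hits() {
    awk '$7 ~ /^\/\.well-known\/acme-challenge\// {
             hits[$1 " " $9]++
         }
         END {
             for (k in hits) print k, hits[k]
         }' | sort
}
```

For example, `summarize_challenge_hits < /var/log/nginx/access.log` (the log path is an assumption). If only three distinct IPs show up, the fourth fetch presumably never reached nginx, which would be consistent with something in front of it dropping the connection.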

If you use --dry-run, I believe the staging server should always fetch new validations. Although I'm not sure whether that only applies to removing previously pending authorizations and starting new ones, or whether it's also true for already cached valid authorizations.

In any case, as there are no actual certificates issued, the rate limit would be the "Failed Validation" limit, which has a window of just an hour.
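A sketch of such a staging test run, reusing the webroot path and domain from the original command. The wrapper function `dry_run_cert` and the `CERTBOT` override variable are hypothetical; `certonly`, `--webroot`, `-w`, `-d`, `--dry-run` and `-v` are regular certbot options.

```shell
#!/bin/sh
# Ask certbot to perform the validation against the staging environment
# without saving any certificate to disk.
# CERTBOT may be overridden (for example with a docker wrapper script);
# it defaults to the plain "certbot" binary.
dry_run_cert() {
    webroot=$1
    domain=$2
    ${CERTBOT:-certbot} certonly --webroot -w "$webroot" -d "$domain" --dry-run -v
}
```

Usage would look like `dry_run_cert /var/www/certbot mail.summit-tech.ca`. Watching port 80 with tcpdump during the run would show whether the staging validators fetch the challenge file.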


Well, this time it worked.

If I run into this again, I'll enable logging on our block list and see if something hits (we usually don't, for performance reasons, as it generates an absurd amount of logs).

Thank you for the prompt support.
