[SOLVED] Unable to pass HTTP-01 verification "Error getting validation data"

My domain is: admissions.st-francis.herts.sch.uk

I have a Let's Encrypt integration running using the PHP acmephp/core package. It has been successfully issuing SSL certificates for hundreds of our clients' domains. For some reason, for this one particular domain, we are getting an "Error getting validation data" response from the Let's Encrypt API.

Here is the response from the API:

{
  "type": "http-01",
  "status": "invalid",
  "error": {
    "type": "urn:ietf:params:acme:error:connection",
    "detail": "13.55.119.192: Fetching http:\/\/admissions.st-francis.herts.sch.uk\/.well-known\/acme-challenge\/zdbKm5il06iz5k5QmRPZySYwa3meY1sWGGmuREpi0Gk: Error getting validation data",
    "status": 400
  },
  "url": "https:\/\/acme-v02.api.letsencrypt.org\/acme\/chall-v3\/255141073566\/ePXLJw",
  "token": "zdbKm5il06iz5k5QmRPZySYwa3meY1sWGGmuREpi0Gk",
  "validationRecord": [
    {
      "url": "http:\/\/admissions.st-francis.herts.sch.uk\/.well-known\/acme-challenge\/zdbKm5il06iz5k5QmRPZySYwa3meY1sWGGmuREpi0Gk",
      "hostname": "admissions.st-francis.herts.sch.uk",
      "port": "80",
      "addressesResolved": [
        "13.55.119.192",
        "3.105.195.203",
        "54.79.12.172"
      ],
      "addressUsed": "13.55.119.192"
    }
  ],
  "validated": "2023-08-15T08:39:50Z"
}

I have checked the access logs on the servers and I can see 200 responses for the Let's Encrypt request e.g.

23.178.112.104 - - [15/Aug/2023:18:39:51 +1000] "GET /.well-known/acme-challenge/zdbKm5il06iz5k5QmRPZySYwa3meY1sWGGmuREpi0Gk HTTP/1.1" 200 119 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

I checked the DNS for the domain and it's set up as we normally request - CNAME to app.digistorm.com.

I checked for CAA records on the domain.

As a comparison, we recently issued a cert for admissions.warlinghamparkschool.com without any issue through the exact same system.

Please help me figure out what the issue is with this domain. I can't think of anything else!

My web server is (include version): NGINX 1.12.2

The operating system my web server runs on is (include version): Amazon Linux 2

My hosting provider, if applicable, is: AWS

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): acmephp/core 1.3.0

Hi, your domain is pointing at 3 different IP addresses in DNS:

app.digistorm.com.      0       IN      A       54.79.12.172
app.digistorm.com.      0       IN      A       13.55.119.192
app.digistorm.com.      0       IN      A       3.105.195.203

For HTTP validation to work, all of the servers responding for your domain must be able to answer the HTTP challenge correctly, and it seems that 13.55.119.192 didn't.
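
One way to test this is to request the challenge file from each resolved address directly, sending the real hostname in the Host header. The sketch below is a minimal illustration using only the Python standard library; the function names `build_probe` and `probe` are made up for this example. Note that it runs from your own network, so a 200 here does not prove that Let's Encrypt's validation servers can reach the same address.

```python
import urllib.request


def build_probe(hostname, token, address):
    """Return (url, headers) for fetching the challenge file from one IP."""
    url = f"http://{address}/.well-known/acme-challenge/{token}"
    # The Host header makes the server treat this as a request for the domain.
    return url, {"Host": hostname}


def probe(hostname, token, address, timeout=10):
    """Fetch the challenge file from one address; return (status, body)."""
    url, headers = build_probe(hostname, token, address)
    request = urllib.request.Request(url, headers=headers)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", "replace")
```

Calling `probe(hostname, token, ip)` once for each entry in `addressesResolved` from the challenge response should return the same status and body every time.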

Hi @webprofusion,

That's right, it's an AWS load balancer. Perhaps something isn't configured correctly on one of our web servers. I'll see if I can spot anything.

Great, so I imagine you have something clever that syncs the /.well-known/acme-challenge part of the filesystem across all the responding servers (or otherwise integrates with nginx on each server). In that case, you need to make sure the sync is fast enough, and has completed before you tell the CA to check the response, so that every server can respond to the same challenge.

How many do you see? There should be at least three requests, from different IPs, for each validation attempt.

@webprofusion yes we have an integration set up that fetches the challenge payload, syncs it out to all web servers, then makes the call to get a certificate (triggering the challenge). It does not attempt to get a certificate until the payload is synced across all instances.

@petercooperjr there are 4 servers behind the load balancer. Unfortunately, I can see a 200 response to the challenge request on all of them:

Server 1:

101.118.226.236 - - [15/Aug/2023:19:38:30 +1000] "GET /.well-known/acme-challenge/zdbKm5il06iz5k5QmRPZySYwa3meY1sWGGmuREpi0Gk HTTP/1.1" 200 88 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

Server 2:

35.166.192.222 - - [15/Aug/2023:18:39:51 +1000] "GET /.well-known/acme-challenge/zdbKm5il06iz5k5QmRPZySYwa3meY1sWGGmuREpi0Gk HTTP/1.1" 200 119 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

Server 3:

18.218.235.143 - - [15/Aug/2023:18:39:51 +1000] "GET /.well-known/acme-challenge/zdbKm5il06iz5k5QmRPZySYwa3meY1sWGGmuREpi0Gk HTTP/1.1" 200 119 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

Server 4:

101.118.226.236 - - [15/Aug/2023:19:40:28 +1000] "GET /.well-known/acme-challenge/zdbKm5il06iz5k5QmRPZySYwa3meY1sWGGmuREpi0Gk HTTP/1.1" 200 88 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

The timestamps are a bit out as I have retried a few times.
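
A quick way to answer the "how many do you see" question is to group the validation requests by challenge token and list the distinct client IPs for each. This is a rough sketch that assumes the log format shown above; `validation_sources` is a hypothetical helper name. Keep in mind that a request dropped before it reaches NGINX (by a firewall, for example) leaves no log line at all, so a missing vantage point shows up only as a lower count.

```python
import re
from collections import defaultdict

# Matches the access-log layout pasted in this thread.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"GET /\.well-known/acme-challenge/(?P<token>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) '
)


def validation_sources(lines):
    """Map each challenge token to the set of (client IP, status) pairs seen."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_RE.match(line)
        # Only count requests that identify as the Let's Encrypt validator.
        if match and "Let's Encrypt validation server" in line:
            seen[match["token"]].add((match["ip"], int(match["status"])))
    return dict(seen)
```

Feeding it the combined access logs from all four servers gives one set per token. Retries need care: check whether each new attempt was issued a new token, or the counts from separate attempts will be merged.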

I'm thinking about what you said about requests from different IPs @petercooperjr and have had a read through "ACME v1/v2: Validating challenges from multiple network vantage points". Perhaps we are blocking one of the Let's Encrypt IP addresses somewhere. Our servers have been hit with a lot of bot traffic, so we have blocked a lot of IPs recently.

I will try disabling the blocklist temporarily and see if I can get the certificate issued.

OK, that was it. I disabled the blocklist and the certificate was issued successfully.

I will have to update our NGINX config to not apply IP blocking on requests to .well-known/acme-challenge/*. I don't think there's any reliable way of whitelisting Let's Encrypt's known IP addresses by the looks of things.
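
The rule being described is "check the path first, then the blocklist". The sketch below models that decision in Python purely to illustrate the logic; it is not NGINX configuration, and `should_block` and the example networks are hypothetical.

```python
import ipaddress

ACME_PREFIX = "/.well-known/acme-challenge/"


def should_block(client_ip, path, blocked_networks):
    """Return True if the blocklist should reject this request."""
    # Challenge requests bypass the blocklist entirely, because the
    # validation servers' addresses are not known in advance.
    if path.startswith(ACME_PREFIX):
        return False
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(net) for net in blocked_networks)
```

With `blocked_networks = ["198.51.100.0/24"]`, a request from 198.51.100.7 for `/index.html` is blocked, while the same client requesting a path under `/.well-known/acme-challenge/` is let through.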

Thank you for your help, @petercooperjr and @webprofusion! :slightly_smiling_face:

You should probably set your IP blocks with an expiration date. And maybe don't block an entire /16. It's not like they're going to DoS you for several days at a time. :smiley:
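
As a rough illustration of blocks with an expiration date, here is a minimal in-memory sketch. The class name `ExpiringBlocklist` and the 24-hour TTL in the usage note are assumptions for the example, not anything from the poster's setup.

```python
import ipaddress
from datetime import datetime, timedelta, timezone


class ExpiringBlocklist:
    """Blocklist whose entries lapse automatically after a TTL."""

    def __init__(self):
        self._entries = {}  # network -> expiry time (UTC)

    def block(self, network, ttl, now=None):
        now = now or datetime.now(timezone.utc)
        self._entries[ipaddress.ip_network(network)] = now + ttl

    def is_blocked(self, client_ip, now=None):
        now = now or datetime.now(timezone.utc)
        # Drop entries whose expiry time has passed.
        self._entries = {
            net: expiry for net, expiry in self._entries.items() if expiry > now
        }
        ip = ipaddress.ip_address(client_ip)
        return any(ip in net for net in self._entries)
```

For example, `blocklist.block("198.51.100.0/24", timedelta(hours=24))` blocks that range for a day, after which `is_blocked` returns False for its addresses without any manual cleanup.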

Your renewals only have to succeed once (per FQDN!) in the 30-day window, so if an IP is blocked you can just try again later. If you're worried about this, you can shorten the retry interval from the default 12 hours to 8, 6, or 4.
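
To put numbers on that, assuming a scheduler that retries at a fixed interval throughout the 30-day window, the number of attempts works out as follows (`attempts_in_window` is just a helper for this example):

```python
def attempts_in_window(window_days, interval_hours):
    """Number of fixed-interval renewal attempts that fit in the window."""
    return (window_days * 24) // interval_hours


for hours in (12, 8, 6, 4):
    print(f"every {hours}h: {attempts_in_window(30, hours)} attempts")
# every 12h: 60 attempts
# every 8h: 90 attempts
# every 6h: 120 attempts
# every 4h: 180 attempts
```

Only one of those attempts needs to succeed, so a block that expires within a day or so should not stop a renewal.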

Yes, if you need to be blocking public Internet traffic, then having an exception for .well-known/acme-challenge is a standard way to make sure that validation attempts will still work.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.