Of course, it fails because this server is not prepared to accept this request over https. However, it is fully functional over http and there is no redirection to https:
curl http://sdx1-kvm.sdxlive.com/.well-known/acme-challenge/token
... returns the expected file content
So, my conclusion is that the validation process has changed on your side to check over https first, which should be corrected. I do not see any other explanation.
Let's Encrypt issues roughly 5 million certs per day. If such a change or failure were to occur, it would be a significant event. The LE internal alarms would be alerting staff, I am sure.
The most likely explanation is that your system received the HTTP request but redirected it to HTTPS. LE will follow redirects and report the last URL as the failing one. See: Challenge Types - Let's Encrypt
You have 3 IP addresses in your DNS. Your IPv4 address redirects HTTP requests to HTTPS. If there is a timeout reaching one of your IPv6 addresses, LE will retry with IPv4. And, get redirected.
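You can watch this happen with curl, which follows redirects the same way; the token path below is just a placeholder:

curl -sIL -o /dev/null -w '%{url_effective}\n' http://sdx1-kvm.sdxlive.com/.well-known/acme-challenge/token

If that prints an https:// URL, the request was redirected somewhere along the chain.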
Are all of these IPs able to reply properly to HTTP Challenge requests? LE prefers IPv6, so it tries those first, but it may use either one.
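One way to test each address separately is curl's --resolve option, which pins the connection to a specific IP while keeping the Host header (again with a placeholder token path):

curl -sI --resolve 'sdx1-kvm.sdxlive.com:80:[2a01:e0a:81c:99c0:558c:7a:5b5c:7b5d]' http://sdx1-kvm.sdxlive.com/.well-known/acme-challenge/token
curl -sI --resolve 'sdx1-kvm.sdxlive.com:80:[2a01:e0a:81c:99c0:ab21:dfd4:b245:7ef]' http://sdx1-kvm.sdxlive.com/.well-known/acme-challenge/token
curl -sI --resolve 'sdx1-kvm.sdxlive.com:80:82.66.198.137' http://sdx1-kvm.sdxlive.com/.well-known/acme-challenge/token

Each of these should come back 200 with no Location header.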
Not at the time of the challenge.
Since my first post, I have removed the challenge URL setup and data, so now it is expected that the http request is redirected to https by design.
I can assure you that when the Let's Encrypt challenge nginx server setup was in place, there was no redirection to HTTPS.
TCP port 80 is open on all 3 addresses, IPv4 and IPv6.
I've just discovered that for some strange reason, one of the IPv6 addresses was in a "dadfailed" state, i.e. it had failed duplicate address detection (2a01:e0a:81c:99c0:558c:7a:5b5c:7b5d, not the other one). This is now corrected.
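An address stuck in "dadfailed" is not usable, so connections to it simply time out. On Linux you can spot the flag with the iproute2 tooling:

ip -6 addr show | grep dadfailed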
So, here's what must have happened:
LE tried 80 on 2a01:e0a:81c:99c0:558c:7a:5b5c:7b5d and failed due to an error on my side
LE did not try 80 on the 2 other IPs and went straight to https, which failed as expected.
LE should have tried 80 on the other 2 IPs before switching to 443.
You can now check the following:
nmap -p 80 -6 2a01:e0a:81c:99c0:558c:7a:5b5c:7b5d
...
PORT STATE SERVICE
80/tcp open http
nmap -p 80 -6 2a01:e0a:81c:99c0:ab21:dfd4:b245:7ef
nmap -p 80 82.66.198.137
I'm not sure on what basis you think LE should have done this, but it doesn't. If you have multiple IPs in your DNS records, they all need to be able to respond appropriately to the challenge.
Yes, I see all 3 IP addresses now redirect HTTP Challenges to HTTPS. I don't know why you would want to do that rather than handling it directly, but LE will follow the redirect.
No, LE will not arbitrarily try port 443 for an HTTP Challenge. That would be a violation of their requirements as a CA. It would only do that if redirected by your system.
And, it won't try each of your IPs hoping to find one that works. It sends a single HTTP request to one of your IPv6 addresses. If it works, it works. If not, it fails. You can redirect that request as pointed out. And, LE will retry an IPv6 timeout with IPv4. But, only with the original HTTP request, not after being redirected. (See: IPv6 Support - Let's Encrypt)
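You can emulate both paths yourself by forcing the address family (placeholder token path again):

curl -6 -sI http://sdx1-kvm.sdxlive.com/.well-known/acme-challenge/token
curl -4 -sI http://sdx1-kvm.sdxlive.com/.well-known/acme-challenge/token

The -6 request approximates LE's first attempt; the -4 one is what the timeout retry would hit.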
I don't think you appreciate how significant a problem it would be if LE actually did that. It would be national news in our industry.
You are going to have to provide a better test case to reproduce that.
I have only seen you redirecting HTTP Challenges to HTTPS. I think whatever glitch you saw was more likely due to the IPv6 "dadfailed" you describe and LE doing its IPv4 retry.
Getting back to this point. How do you ensure LE will get a proper response regardless of which IPv6 address it tries?
This time, I have not removed the server setup on port 80, so you can perform your own tests.
For instance, you might want to verify that http://sdx1-kvm.sdxlive.com/test.txt is accessible with no redirection to HTTPS.
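A header-only request is enough to check; a direct answer shows a 200 with no Location header, while a redirect would show a 301/302 with Location: https://...

curl -sI http://sdx1-kvm.sdxlive.com/test.txt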
I'm currently seeing some tests being made successfully by some readers, except for the ones trying inside the /.well-known folder, where test.txt is not located.
I think he saw me doing a little poking around. I tested whether there was a difference between test.txt at the root and under the .well-known folder, and didn't see any redirections, but I didn't test much. I didn't test different user agents either, but I don't have time to dig in right now.
It's simple: some people reading this thread are successfully fetching the URL http://sdx1-kvm.sdxlive.com/test.txt, as I see all the accesses in the nginx logs, meaning they can reach this resource without any redirection or other error.
The real URL has been obfuscated on purpose because it contains the real token value. I don't think you need it. The whole purpose of this post was to show that there was no redirection on my side despite numerous LE claims.
AFAIK, LE goes directly to 443 and does not even try 80; otherwise I would have seen those port 80 accesses, successful or failed, in the logs.
Yes, I was also one of them. I mean, you did ask us to look at it.
I disagree. It looks like some of the queries are being done by the ACME Client and not Let's Encrypt itself. It looks like the first was to an IPv6 address on port 80, followed by port 80 to an IPv4 address, followed by an IPv6 one. We normally don't see that kind of series in the LE challenge error.
If the validationRecord actually reflects what was really done on the LE side and not just its intentions, it does show 2 accesses on port 80 (1 over IPv6 and 1 over IPv4), but there is no record of those in my logs.
Could you give me the IPv6/IPv4 addresses of the server(s) validating the challenge?
No, I wanted to see the actual challenge URL. It has the error details as reported by Let's Encrypt. The messages you showed look like they came from your ACME Client and not LE itself. Many ACME Clients do a "pre-check", so I'm trying to sort out exactly who sees what and when.
No, I can't. I am a volunteer. But, also Let's Encrypt checks from 5 different global locations in server farms where the IP addresses regularly change.
You should be able to see the requester IP in your log for any one specific request.
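With nginx's default combined log format the client IP is the first field, so something like this shows who requested a given file (adjust the log path to your setup):

grep 'test.txt' /var/log/nginx/access.log | awk '{print $1}'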
That doesn't sound like an actual LE request. What was the requester IP address? For one, LE wouldn't wait up to 8 minutes to try a request.
It also wasn't "successful" in that it got a 404 (Not Found).
Hiding info is just making this very difficult to debug. We are not asking for anything that needs to be secret.
Hence the quotes, but the token looked like something real. I thought you guys were doing some tests. What I meant by "successful" was that, at least, what appeared to be an LE server was able to contact port 80 on my server.
It shows that LE tried to access port 443 at the IPv4 address, although the validationRecord shows that port 443 was, or was supposed to be, tried only on the IPv6 address.
There is a known bug that shows the "wrong" IP with a mix of:
IPv6 and IPv4
IPv6 timeout failures
IPv4 redirects
Your latest series shows this sequence as did your original. I explained this timeout / retry earlier.
I don't have time to study this and/or reproduce it. Are you sure your Ansible script isn't modifying the nginx config for these challenges?
Because the failing sequence looks just like earlier even though HTTP requests right now do not redirect.
Look carefully at the "Validation Record". This is the series:
#1 See URL for HTTP (port 80) and addressUsed (IPv6)
#2 See URL for HTTP (port 80) and addressUsed (IPv4) (because #1 timed out)
#3 See URL for HTTPS (port 443) and addressUsed (IPv6) (because #2 redirected)
LE does not do IPv4 fallback after being redirected to HTTPS, so the challenge fails. See: IPv6 Support - Let's Encrypt