Is there a way to tell whether the token that arrives in a domain validation request was really generated by Let's Encrypt?
In a cloud scenario, multiple servers handle the validation request for issuing a certificate (http-01), so the token/auth association must be placed in some form of shared storage between the servers (e.g., DB, cache). It would be useful to avoid requests to shared storage when the token is formally incorrect. For some time now I have been receiving a lot of fake requests, but since I cannot tell up front whether a request is legitimate, I have to process all of them.
It would be useful to have some sort of expected pattern of the token or a validation of the token itself (e.g., with a CRC linked to a secret key that the server itself sends to the Let's Encrypt API while requesting the validation).
The tokens are random, so there's no good way to tell whether one genuinely came from Let's Encrypt.
There's no particular guarantee the token format stays the same either. Let's Encrypt tells you through the API what the token is going to be, and your best course of action today is to use that.
While I don't know what your infrastructure looks like or what the "fake requests" you're getting look like, perhaps rather than having servers contact an external DB/cache, could you push the token(s) out to all the servers when a challenge is underway? If there are too many servers, another option is to have all the servers issue an HTTP redirect to a single static host, S3 bucket, etc., which just serves the tokens. You could even disable that when no issuance is in progress.
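A minimal sketch of the redirect option, in Python. The host name `acme-tokens.example.net`, the `issuance_in_progress` flag, and the function name are hypothetical; the only part taken from http-01 itself is the `/.well-known/acme-challenge/` path prefix. The idea is that each front-end server answers challenge requests with a redirect (or a local 404) and never touches shared storage.

```python
from urllib.parse import quote

# Path prefix used by the http-01 challenge.
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"

# Hypothetical static host (or bucket) that only serves the token files.
STATIC_TOKEN_HOST = "http://acme-tokens.example.net"


def challenge_response(path, issuance_in_progress):
    """Return (status, headers) for a challenge path, or None if the
    path is not an http-01 challenge and should be handled normally."""
    if not path.startswith(CHALLENGE_PREFIX):
        return None
    token = path[len(CHALLENGE_PREFIX):]
    # No issuance under way, or an empty/nested token: answer 404 locally.
    if not issuance_in_progress or not token or "/" in token:
        return 404, {}
    # Percent-encode the token so odd characters cannot alter the target URL.
    location = STATIC_TOKEN_HOST + CHALLENGE_PREFIX + quote(token, safe="")
    return 302, {"Location": location}
```

With `issuance_in_progress` set to `False`, every challenge request gets a local 404, which matches the "disable that when no issuance is in progress" idea.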
Could you perhaps filter out "junk" requests by user-agent string? That would at least avoid the DB load.
Be careful not to filter too tightly in case Let's Encrypt change the strings they use. Or closely monitor cert renewals so you catch lost legit requests.
This is sort of a quick-and-dirty solution, but it may be enough to avoid some of the worst of it.
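A rough sketch of such a filter, in Python. The marker substring is an assumption based on the user-agent that Let's Encrypt validation requests have been seen to send; check it against your own logs, since it can change without notice. The `stats` counter and the function name are hypothetical and are only there so dropped requests can be monitored alongside renewals.

```python
from collections import Counter

# Assumption: substring seen in the user-agent of Let's Encrypt validation
# requests. Verify against your own logs; it may change without notice.
LE_UA_MARKER = "let's encrypt validation server"

# Simple counters so that dropped requests can be monitored.
stats = Counter()


def passes_ua_filter(user_agent):
    """Loose user-agent check run before any shared-storage lookup."""
    ok = user_agent is not None and LE_UA_MARKER in user_agent.lower()
    stats["passed" if ok else "dropped"] += 1
    return ok
```

Matching on a loose, case-insensitive substring rather than the full string keeps the filter from breaking on small changes, and a sudden jump in `stats["dropped"]` during a renewal window is a hint that the marker needs updating.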
Thanks for the tip. I believe the origin of the fake requests is a vulnerability found on some servers that allows malicious scripts to be inserted into the .well-known/challenge folder. The requests I see are probably probes to find out whether the malicious script is installed on our servers (and obviously this is not the case).
Why has Let's Encrypt never thought of a way to verify the token that clearly doesn't decrease the security of the token itself? The IP addresses are not public so they cannot be filtered, the tokens are completely random (and rightly so), but they are not verifiable. Something should be done because I believe it can be useful to everyone.
Is it possible these requests are coming from some other system of yours? Like something set up for testing, or a prior config that is still issuing cert requests for your domain name(s)?
I ask because we have seen similar problems here but the user-agent strings are often obviously different. A few other cases were actual requests sent to LE from a "lost" machine.
Is there any pattern to the source IP? While LE does not publish its validation IPs, and they rotate often, all but one of the LE auth server centers are in AWS. That might be hard to assess in real time, but it could be useful info for spot-checking some log entries. Looking at legit cert requests, you can see the locations of these centers (with suitable IP lookup tools).
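For spot-checking log entries offline, something like the following Python sketch could help. The prefixes below are placeholders taken from the reserved documentation ranges, not real AWS ranges; in practice you would load the cloud provider's published prefix list instead. The function name is hypothetical. Note that a match only says the address is in a given range, not that the request came from Let's Encrypt.

```python
import ipaddress

# Placeholder prefixes (documentation ranges), NOT real AWS ranges.
# In practice, load the prefixes from the provider's published list.
CANDIDATE_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def in_candidate_ranges(ip_text, prefixes=CANDIDATE_PREFIXES):
    """True if the address falls inside any of the given prefixes."""
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # malformed address in the log
    # Compare only against prefixes of the same IP version.
    return any(addr.version == net.version and addr in net for net in prefixes)
```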
Legit requests will come in bursts of up to 5 (today): one from the primary center and four from secondary centers. If you are seeing "invalid" requests with the same URI in groups of 4 or 5, it could be a "lost" machine, or someone accidentally making invalid requests for your domain.
Another idea is to switch to using a DNS Challenge.
This hasn't come up much in this forum and I don't see such rogue log entries myself. I apologize if you have already sorted through these other options. It's not easy to tell how much people have evaluated.
I understand. I was trying to get more precision around what you mean by "fake".
And, I need to correct what I said about the "bursts". LE made a change in recent weeks and you would not see up to 5 failed challenges. LE used to make all requests async but now the primary must succeed before trying the secondaries.
I'd be curious to see the log entries for some of those fakes. This hasn't been a problem we've seen before that I recall. We have seen similar complaints but they ended up being explained.
Of course I have more information, but I can't make it public. For example, I cannot publish the IPs, although I can remove the last 2 octets. I also have the timestamps and the complete URLs, but I cannot make those public either. However, I am sure that the IPs are not from Let's Encrypt.
This is exactly the point: we all know that they are fake, but they could potentially be genuine, because Let's Encrypt only says that the token must be 43 characters long and must use characters from the base64url alphabet (uppercase and lowercase letters, digits, hyphen, and underscore). There is no constraint on the number of uppercase or lowercase letters, so a token could potentially be all uppercase or all lowercase.
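A purely formal check along these lines can be sketched in Python as below. ACME tokens use the base64url alphabet; the fixed length of 43 is what this thread reports for current Let's Encrypt tokens and is treated here as an assumption, not a guarantee. The function name is hypothetical. The check cannot prove a token is genuine; it only lets a server reject malformed values before going to shared storage.

```python
import re

# base64url alphabet: letters, digits, "-" and "_".
# Assumption: tokens are exactly 43 characters long (current behaviour
# as reported in this thread; the format is not guaranteed to stay fixed).
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]{43}")


def is_plausible_token(token):
    """Formal check only: right alphabet and right length."""
    return TOKEN_RE.fullmatch(token) is not None
```

Using `fullmatch` rather than `match` matters here, because it also rejects values with trailing characters, such as path fragments or file extensions appended by scanners.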
Well, there's a practical element and a theoretical one. It was really Osiris's observation; I just thought it was a good catch.
The odds of getting an entirely uppercase token value are very small. If you were to reject one, that would just fail the cert request. But so what? Aren't you retrying cert renewals frequently anyway? Occasionally, original and renewal cert requests fail for any number of temporary reasons (comms, LE outages, ...). This would just be one more extremely rare failure.
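To put a rough number on "very small", here is a back-of-the-envelope Python sketch. It assumes every one of the 43 characters is drawn uniformly and independently from the 64-character base64url alphabet, which is a simplification of how tokens are actually generated, so treat the result as an order-of-magnitude estimate. The helper name is hypothetical.

```python
ALPHABET_SIZE = 64   # base64url alphabet
LETTERS_PER_CASE = 26
TOKEN_LENGTH = 43    # assumption: current token length

# Chance that a random token is all uppercase, and that it is
# either all uppercase or all lowercase.
p_all_upper = (LETTERS_PER_CASE / ALPHABET_SIZE) ** TOKEN_LENGTH
p_single_case = 2 * p_all_upper


def is_single_case_letters(token):
    """True for tokens made only of ASCII letters in a single case."""
    return (
        token.isascii()
        and token.isalpha()
        and (token.isupper() or token.islower())
    )


print(f"{p_single_case:.1e}")
```

Under these assumptions the probability comes out on the order of 10^-17, so rejecting single-case tokens before the storage lookup would essentially never hit a genuine request.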
Your concern seemed directed at the load on your shared storage. This looks like one of those things that gets you a high percentage of the benefit for very low effort.