HTTP-01 challenges

Hi there,

We noticed that Let's Encrypt keeps retrying old ACME challenges on our customers' domains that no longer exist in our database. Currently we respond with a 400 error, but we get a similar request roughly every 10 seconds.

What's the recommended way to stop the retry mechanism for cert-manager? Is there another HTTP status code we should try? The cert-manager version is v1.3.1.

Thanks.

Welcome @SokratisVidros

Let's Encrypt only makes challenge requests when an ACME client requests certs. Are you sure it's the LE servers making the request? Can you show some example log records, with at least the requesting IP, URI, and User-Agent? Does the DNS for those old names still point to your servers?

Is it the same challenge URI each time?
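
One way to answer these questions from the server side is to tally challenge requests by client IP, User-Agent, and URI. The following is only a minimal sketch: it assumes an access log in the common "combined" format, and the `summarize` helper, the sample lines, the documentation-range IP, and the placeholder tokens are all invented for illustration.

```python
import re
from collections import Counter

# Assumed "combined" access-log layout; adjust the pattern to your server's format.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def summarize(lines):
    """Count HTTP-01 challenge requests per (ip, user agent, uri)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match["uri"].startswith(CHALLENGE_PREFIX):
            counts[(match["ip"], match["agent"], match["uri"])] += 1
    return counts


# Invented sample lines (hypothetical client, placeholder tokens).
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2022:00:00:00 +0000] "GET /.well-known/acme-challenge/tokenA HTTP/1.1" 400 0 "-" "example-client/1.0"',
    '203.0.113.7 - - [01/Jan/2022:00:00:10 +0000] "GET /.well-known/acme-challenge/tokenA HTTP/1.1" 400 0 "-" "example-client/1.0"',
    '203.0.113.7 - - [01/Jan/2022:00:00:20 +0000] "GET /.well-known/acme-challenge/tokenB HTTP/1.1" 400 0 "-" "example-client/1.0"',
    '203.0.113.7 - - [01/Jan/2022:00:00:30 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "example-client/1.0"',
]

if __name__ == "__main__":
    for (ip, agent, uri), n in summarize(SAMPLE).most_common():
        print(n, ip, agent, uri)
```

If the same URI dominates the counts, something is likely replaying one request; if the token keeps changing, a client is probably starting new challenges each time.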

Sure.

The requesting IP is 34.82.13.157 and the user-agent is cert-manager/v1.3.1 (clean). One of the hosts is hello-auth.z2h.lcl.dev but all "*.lcl.dev" resolve to our servers.

Let's Encrypt wouldn't retry old ACME challenges: an authorization challenge can't be retried once it fails. Are you sure these aren't new challenges for old customers?

If so, the most likely causes are that your (former) clients haven't updated their DNS to point to their new hosts, or that the registration lapsed but the nameservers are still active. Let's Encrypt would not be able to hit the endpoints in your system unless that happened.

Those requests could be from these customers trying to set up their systems elsewhere, however... it's also possible that you did not unenroll the domains in your cert-manager system correctly OR you found a bug in cert-manager. I wouldn't be surprised if you did.

Cert-manager has a history of issues due to poor application design, coding, and prioritization of issues by its developers. Earlier versions have been banned by Let's Encrypt for excessive traffic. (This exhaustive ticket on their GitHub supports my comments: "cert-manager v0.8.0 and v0.8.1 send excessive traffic", Issue #1948, cert-manager/cert-manager.)

If these are actually old challenges, are you sure they're not coming from within your network(s)? Perhaps there is something in your system that checks for external visibility of challenges, and that system is responsible for all this.

That IP is allocated to Google and is marked as part of Google Cloud. I haven't seen ISRG/Let's Encrypt run anything on Google's network yet; has anyone else?

https://whois.arin.net/rest/net/NET-34-64-0-0-1/pft?s=34.82.13.157
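
Once the allocation is known, the same check can be repeated offline for other addresses from the logs. This is a minimal sketch that assumes the ARIN record NET-34-64-0-0-1 linked above corresponds to the block 34.64.0.0/10 (inferred from the network handle, so please confirm it against the whois output); `in_block` is a hypothetical helper.

```python
import ipaddress

# Assumption: NET-34-64-0-0-1 covers 34.64.0.0/10; confirm against the whois record.
ASSUMED_BLOCK = ipaddress.ip_network("34.64.0.0/10")


def in_block(address, block=ASSUMED_BLOCK):
    """Return True if the address falls inside the given network block."""
    return ipaddress.ip_address(address) in block


if __name__ == "__main__":
    # The requesting IP reported earlier in this thread.
    print(in_block("34.82.13.157"))
```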

It's probably best to ask about this on the cert-manager GitHub. It does not look like Let's Encrypt is involved.

I don't know anything about cert-manager, but I do know there are multiple ACME clients out there in which part of the sequence of getting a certificate is checking if the challenge token is accessible by the client first before triggering the validation at the ACME server.
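
To illustrate that kind of pre-check, here is a minimal sketch of what such a client might do before asking the ACME server to validate. The URL path is the one HTTP-01 uses (RFC 8555); the `self_check` function, its `fetch` parameter, and the sample values are hypothetical and not taken from cert-manager or any particular client.

```python
from urllib.request import urlopen


def challenge_url(domain, token):
    """Build the HTTP-01 URL a validator would request for this token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def http_fetch(url, timeout=5):
    """Default fetcher: return the response body as text, or None on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and HTTPError (e.g. a 400 response) are subclasses of OSError.
        return None


def self_check(domain, token, key_authorization, fetch=http_fetch):
    """Return True if the challenge response is already reachable and correct."""
    body = fetch(challenge_url(domain, token))
    return body is not None and body.strip() == key_authorization


if __name__ == "__main__":
    # Stand-in fetcher so the example runs without touching the network.
    def fake_fetch(url):
        return "tok.thumb\n"

    print(self_check("example.com", "tok", "tok.thumb", fetch=fake_fetch))
```

A client that keeps repeating a pre-check like this while it fails would produce recurring requests to the challenge path without the ACME server ever being contacted.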

OP should double-check whether there are any "rogue" ACME clients out there trying to perform challenges unnecessarily.

My first guess (if the complete requested URL doesn't change) would be that something is scanning previously posted links.

And if the URL does change, there is something misconfigured that (incorrectly) thinks it should be able to obtain a cert for that name (and tries... and tries...).
If that is the case, it is really not much to worry about, as it will never be able to do so.
But I do understand your need to find and fix that problem - it just might not be within anything you control, or it might.
Since the EHLO/HELO is from your domain, it is likely something you (or someone in your company) can control.

Thanks everyone. Knowing now that LE is probably not involved is helpful.
