Why are we reaching the rate limit for '/directory'?

My domain is: (for example) www.kuechen-knauseder.de, lessjsservice.ieq-systems.de, www.worminghausen-elektro.de

I ran this command: MyTask = GetHttpAnswer(_lastOrder.Authorizations[AuthzCounter].Details);
(Please see the function below)

It produced this output: (GetHttpAnswer): Error on GetHttpAnswer: Rate limit for '/directory' reached

My web server is (include version): IIS 10

The operating system my web server runs on is (include version): Windows Server 2019 / 2022

My hosting provider, if applicable, is: not relevant

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): PKISharp/ACMESharpCore (.NET Standard) v2.2.0.148

We retrieve SSL certificates for several thousand websites.
For some weeks now, most of our certificate requests (both new certificates and renewals) have been denied with "Rate limit for '/directory' reached".
This error has increased drastically over the last 3 weeks.

As a result, only a few certificate renewals and almost no new certificate requests succeed at the moment.
We don't understand why we almost always hit this rate limit for '/directory'.

The error message "Rate limit for '/directory' reached" should only appear if there are more than 40 requests per second.
We never reach this limit!

Because of this error, we currently get only a few certificate renewals and almost no new certificates.

Please help us urgently!

Function “GetHttpAnswer”

private async System.Threading.Tasks.Task<string> GetHttpAnswer(ACMESharp.Protocol.Resources.Authorization actAuth)
{
  string Result = string.Empty;
 
  try
  {
    PkiJwsTool AcmeSigner = new PkiJwsTool(256);
    AcmeSigner.Import(_account.JwsSigner);
 
    using (AcmeProtocolClient AcmeClient = new AcmeProtocolClient(AcmeUri, signer: AcmeSigner, acct: _account.Details, usePostAsGet: true))
    {
      ServiceDirectory AcmeDir = await AcmeClient.GetDirectoryAsync();
      AcmeClient.Directory = AcmeDir;
 
      await AcmeClient.GetNonceAsync();
 
      Challenge HttpChallenge = actAuth.Challenges.First(x => x.Type == ACMESharp.Authorizations.Http01ChallengeValidationDetails.Http01ChallengeType);
 
      Challenge UpdatedHttpChallenge = await AcmeClient.AnswerChallengeAsync(HttpChallenge.Url);
 
      OrderDetails AcmeOrderDetails = await AcmeClient.GetOrderDetailsAsync(_lastOrder.Details.OrderUrl, _lastOrder.Details);
 
      _lastOrder.Details = AcmeOrderDetails;
      Repo.Saveorder(_lastOrder);
 
      await RefreshOrderAuthorizations(AcmeClient);
      Repo.Saveorder(_lastOrder);
 
      await DecodeOrderAuthorizationChallenges(AcmeSigner);
      Repo.Saveorder(_lastOrder);
    }
 
    Result = "true";
  }
  catch (Exception ex)
  {
    Result = "Error on GetHttpAnswer: " + ex.Message + " " + ex.InnerException;
  }
 
  return Result;
}

Surely there was more to the error message than that. Can you share the complete and exact error you're seeing?

6 Likes

Interesting, that's quite similar to this reported problem: RateLimited error on a get request to /directory · Issue #2644 · win-acme/win-acme · GitHub

My suggestion there was to avoid running renewals on the hour (in case many other ACME clients are doing the same) or at the same time as other processes that may be running on your other servers, in case you are causing a stampede from the same IP address.

4 Likes

Actually, @mcpherrinm or @JamesLE this might interest you or someone else at LE. I think we are seeing slightly increased occurrences of the ACME Directory not being available in the dashboard reporting we do for Certify The Web, since maybe the 27th. We don't capture the stack trace but it would make sense if it was hitting a rate limit.

4 Likes

Thanks for the suggestion. We have tried several different times that were not on the hour, but with no luck.

2 Likes

It seems most likely you're hitting the "server busy" (503) responses described in: New "Service Busy" responses beginning during high load

We don't serve many of these, though. We have a handful at 00:00 and 16:10 (just now) which are our two busiest times, and single digits throughout the rest of the day.

If you'd like us to investigate further, it would help to include the HTTP error code, the ACME error, and whether the client retries 503s automatically.

7 Likes

Yes, the complete and exact error is the following:

(Exception: ACMESharp.Protocol.AcmeProtocolException: Rate limit for '/directory' reached
at ACMESharp.Protocol.AcmeProtocolClient.d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at ACMESharp.Protocol.AcmeProtocolClient.d__59`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
at ACMESharp.Protocol.AcmeProtocolClient.d__38.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Ieq.Certfunctions.CertFunctions.d__29.MoveNext())

We are surprised that yesterday evening (6-10 PM CET) about 250 certificates were renewed successfully in 8 attempted batch runs.
Early this morning (3:55-4:01 AM: 15 errors; 5:25-5:56 AM: 53 errors, 7 successful), most renewals again produced the /directory rate limit error.

We don't see any HTTP error code, so we don't react to or retry on HTTP error codes.

The complete ACME error is the same exception and stack trace as shown above.

If you just repeatedly access the https://acme-v02.api.letsencrypt.org/directory endpoint from a browser behind the same IP address (e.g. server desktop etc) do you easily get an error or does it always seem to return the directory info?

3 Likes

There is no problem accessing the endpoint, even when refreshing the page many times!

@Admins_ieQ thanks. Are you able to run your renewal process with any kind of diagnostic logging (of the HttpClient requests and responses)? I know you can supply AcmeProtocolClient with your own HttpClient, so you could hook a debug log into that. Alternatively, perhaps you can run Fiddler and capture the HTTPS traffic during renewal.

The rate limit on /directory is 40 calls per second, which could easily happen if your renewal process runs parallel renewals rather than sequential renewals. Could you try artificially throttling the pace of your ACME calls or limiting parallel execution (e.g. only allow a few tasks to run at a time)?
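
For the diagnostic logging idea above, here is a minimal sketch of a logging handler that can be placed in front of an HttpClient. `LoggingHandler` and its `log` callback are hypothetical names, not part of ACMESharpCore; the sketch only assumes the standard `System.Net.Http.DelegatingHandler` API.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of ACMESharpCore): logs every request and
// response that passes through the HttpClient it is attached to.
public sealed class LoggingHandler : DelegatingHandler
{
    private readonly Action<string> _log;

    public LoggingHandler(HttpMessageHandler inner, Action<string> log) : base(inner)
    {
        _log = log ?? throw new ArgumentNullException(nameof(log));
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _log($"{DateTime.UtcNow:O} --> {request.Method} {request.RequestUri}");

        HttpResponseMessage response = await base.SendAsync(request, cancellationToken);

        // The numeric status code is what distinguishes a 429 from a 503.
        _log($"{DateTime.UtcNow:O} <-- {(int)response.StatusCode} {response.ReasonPhrase}");

        // Log Retry-After only when the server actually sent one.
        if (response.Headers.RetryAfter != null)
        {
            _log($"    Retry-After: {response.Headers.RetryAfter}");
        }

        return response;
    }
}
```

It could be wired up with something like `new HttpClient(new LoggingHandler(new HttpClientHandler(), Console.WriteLine))`, with the base address set to the ACME endpoint, and the resulting HttpClient then handed to AcmeProtocolClient. The exact constructor overload to use should be checked against the ACMESharpCore version in use.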

3 Likes

@webprofusion
We didn't find any logging option in ACMESharpCore, so we used Fiddler Everywhere.
Attached are the requested capture screenshots.

As you can see, we don't get a 503 HTTP error, but a 429 "Too Many Requests" error.
You can also see that there are several seconds between the /directory requests.
In our program there is always an interval of several seconds between the /directory requests.

We don't run renewals in parallel; they run sequentially.

1 Like

I'm not certain why you see the 429 other than there is some limiting.

Can you show the contents of the Headers section? You should see a Retry-After header indicating when you should retry the request that got the 429.

4 Likes

@Admins_ieQ that looks good, thanks. The follow-up fetch of the directory should wait at least a few seconds before trying again; here it is retrying within the same second. Ideally you would modify the library to honour Retry-After headers, as @MikeMcQ suggests.

@beautifulentropy I don't suppose you can see anything in recent rate limit code changes for Boulder? Or could this also be something higher up, at the Cloudflare traffic level?

5 Likes

@MikeMcQ @webprofusion
I don't see any Retry-After header, as you can see in the screenshot.
Where should I find it?

How long should we wait before retrying?

2 Likes

@Admins_ieQ it certainly looks like there is no Retry-After header present. I'd suggest waiting at least 1 second between API calls. You can achieve this using an HttpClient with a custom handler set (like: c# - How to Throttle all outgoing asynchronous calls to HttpClient across multiple threads in .net Core API project - Stack Overflow), or by using something like Polly (in .NET) to throttle and retry.
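
As a rough illustration of the custom-handler approach, the sketch below enforces a minimum interval between outgoing calls on one HttpClient. `ThrottlingHandler` and `minInterval` are hypothetical names chosen for this example; it is neither the code from the linked Stack Overflow answer nor a Polly policy, just a plain `DelegatingHandler` under those assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: spaces out all requests sent through one HttpClient
// so that at least minInterval passes between the start of any two requests.
public sealed class ThrottlingHandler : DelegatingHandler
{
    private readonly TimeSpan _minInterval;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private TimeSpan _nextAllowed = TimeSpan.Zero;

    public ThrottlingHandler(HttpMessageHandler inner, TimeSpan minInterval) : base(inner)
    {
        _minInterval = minInterval;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Only one caller at a time may reserve the next send slot,
        // so the spacing also holds when several threads share the client.
        await _gate.WaitAsync(cancellationToken);
        try
        {
            TimeSpan wait = _nextAllowed - _clock.Elapsed;
            if (wait > TimeSpan.Zero)
            {
                await Task.Delay(wait, cancellationToken);
            }
            _nextAllowed = _clock.Elapsed + _minInterval;
        }
        finally
        {
            _gate.Release();
        }

        return await base.SendAsync(request, cancellationToken);
    }
}
```

With `TimeSpan.FromSeconds(1)` as the interval, every ACME call made through that HttpClient (directory, nonce, order, challenge) is spaced out, not only the /directory fetch.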

In Certify The Web we also default to 3 seconds between retries and use the Retry-After value if it's present and within what we consider to be a usable range (i.e. it's less than a minute). Coping with CA APIs having off-days is all part of developing ACME clients.
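
The retry rule described above (3 seconds by default, Retry-After when present and under a minute) could be sketched roughly as follows. This is not Certify The Web's actual code; `RetryDelay`, `Choose`, and `SendWithRetryAsync` are hypothetical names, and treating 429 and 503 as the retryable statuses is an assumption based on this thread.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper illustrating the retry rule; not taken from any ACME client.
public static class RetryDelay
{
    public static readonly TimeSpan Default = TimeSpan.FromSeconds(3);
    public static readonly TimeSpan MaxUsable = TimeSpan.FromMinutes(1);

    // Uses the server's Retry-After hint only if it is positive and under a minute.
    public static TimeSpan Choose(RetryConditionHeaderValue retryAfter, DateTimeOffset now)
    {
        if (retryAfter == null) return Default;

        // Retry-After is either a number of seconds (Delta) or an absolute date (Date).
        TimeSpan? hinted = retryAfter.Delta;
        if (!hinted.HasValue && retryAfter.Date.HasValue)
        {
            hinted = retryAfter.Date.Value - now;
        }

        bool usable = hinted.HasValue && hinted.Value > TimeSpan.Zero && hinted.Value < MaxUsable;
        return usable ? hinted.Value : Default;
    }

    // Calls send() up to maxAttempts times, waiting between attempts on 429 or 503.
    // A delegate is used because an HttpRequestMessage cannot be sent twice.
    public static async Task<HttpResponseMessage> SendWithRetryAsync(
        Func<Task<HttpResponseMessage>> send, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            HttpResponseMessage response = await send();
            bool retryable = (int)response.StatusCode == 429
                || response.StatusCode == HttpStatusCode.ServiceUnavailable;
            if (!retryable || attempt >= maxAttempts) return response;

            TimeSpan delay = Choose(response.Headers.RetryAfter, DateTimeOffset.UtcNow);
            response.Dispose();
            await Task.Delay(delay);
        }
    }
}
```

Because ACMESharpCore surfaces the failure as an AcmeProtocolException rather than a response object, logic like this would most naturally sit inside a custom HttpClient handler, where the raw status code and headers are still visible.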

6 Likes

The /directory ratelimit is an nginx one and hasn’t been touched in ages

6 Likes

Cool, so it doesn't do Retry-After headers?

5 Likes

@webprofusion
We have added a wait interval of 3 seconds to our renewal tool.
Now many certificates are renewed, but some (about 10%) still get the "Rate limit for '/directory' reached" error.

We have added a fixed 3-second delay before every call to "GetDirectoryAsync".
The delay is not applied only after a failure; it is applied before every call.

We haven't added this 3-second delay before other calls such as "CreateOrderAsync" or "GetOrderDetailsAsync".

Do we also need to add the delay before calling other functions?

@mcpherrinm
The rate limit doesn't seem to depend on our tool or on the interval.
Perhaps the Let's Encrypt staff can investigate this more deeply?

1 Like
