Unable to renew - RateLimit: Service Busy

My domain is: ssl3.ipaper.io

I ran this command: Used CERTES ACME Client NuGet Package for C# to create a certificate with 99 SANs

It produced this output:
"acme.error": "{"type":"urn:ietf:params:acme:error:rateLimited","detail":"Service busy; retry later.","status":0}"

My web server is (include version): IIS 10.0

The operating system my web server runs on is (include version): Windows Server 2022 DataCenter

My hosting provider, if applicable, is: AWS

I can login to a root shell on my machine (yes or no, or I don't know): Yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel): No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): Certes 3.0.3

=============================================
We have tried over several days to renew certificates, and this error seems to make renewals fail at random. Is anyone else experiencing this, or are there known issues with the service currently?

Kind regards
Niels

Hello @ITNiels ,

The ACME client is likely scheduled to run at a peak time, for example exactly on the hour. Please reschedule it to run at a random time.

In addition, your ACME client needs an upgrade to handle the ACME server overload condition gracefully. Please see the API change announcement:

We have tried several times over the last 7 days. We usually renew several certificates one after another, each with 99 SANs, with a short 30-second pause in between. The error seems sporadic, but we have never seen it before now, so I just wondered whether something changed or whether you are currently seeing higher load than usual?

I am not affiliated with Let's Encrypt, so I do not know the load on the ACME server, sorry. However, it is possible that the overall load has increased. Retrying will likely help.

What API call do you see the "service busy" in response to? Is it always the same API call?

Is the HTTP status code also available? Some status codes come with a Retry-After response header that tells the client when to retry, and Certes should honor it. I don't recall the details off-hand, but more info would be helpful.

Is there any more log info available from before and after that error? It looks similar to a Let's Encrypt message, but not entirely. Could there be another service between your client and LE issuing that error?

Also, have you tried updating to Certes 3.0.4? I couldn't find the changelog and don't want to install it to find out. Again, just trying to get more info.

Are you able to try with fewer SAN names in one cert? Does that change the symptom?

No service interruptions are posted and LE is issuing well over 4 million certs per day. So, clearly people are getting certs issued. So far we haven't seen other similar problems reported. Let's Encrypt Stats - Let's Encrypt

Just to be clear, I have seen a couple other posts in the last month that involved a "Service busy" message

Though I think there were generally other issues involved in those cases too, it may be that "Service busy" is happening more often than it used to. But agreed that most users are getting certs fine, and even if one attempt isn't working then the next attempt generally would. And I think the most common clients may be retrying automatically (as they should) rather than informing the user, so most people might not notice even if the message was happening more often to them.

Thank you everyone, we added some better handling of this error and are now 100% caught up with certificates! We are still seeing the error, but now just retrying after 15 minutes and this seems to mostly handle it OK. :slight_smile:
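The "retry after 15 minutes" handling described above might look roughly like the sketch below. It is written in Python rather than C# purely for illustration, and it is not Certes code: `ServiceBusyError`, `retry_when_busy`, and the default of four attempts are hypothetical names and values, and `operation` stands in for whatever call places the order in your client.

```python
import time


class ServiceBusyError(Exception):
    """Hypothetical exception standing in for the client's 'Service busy' failure."""


def retry_when_busy(operation, max_attempts=4, delay_seconds=15 * 60, sleep=time.sleep):
    """Run operation(); if it reports 'Service busy', wait and try again.

    The sleep function is injectable so the loop can be exercised without
    actually waiting. Any other exception propagates immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ServiceBusyError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay_seconds)
```

Capping the number of attempts keeps a persistently failing renewal from hammering the server indefinitely; the failure can then be logged and picked up on the next scheduled run.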

If you're seeing "Service Busy" regularly, then I might be a little concerned that you're hitting the service too often or something. But it may just be that Let's Encrypt keeps getting busier. :slight_smile:

Let's Encrypt's 429 & 503 errors should have a Retry-After header with a recommended delay before trying again, if you want to get really fancy.
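For anyone who does want to get fancy, here is a minimal sketch of turning a Retry-After value into a wait time. HTTP allows the header to be either a number of seconds or an HTTP date, so both forms are handled. The function name `retry_after_seconds` and the 15-minute fallback are assumptions for illustration, and how you read the header from the response depends on your client.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=900.0):
    """Convert a Retry-After header value into a delay in seconds.

    Falls back to `default` (an assumed 15 minutes) when the header is
    missing or cannot be parsed.
    """
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Delay given directly as a whole number of seconds.
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry right away.
    return max(0.0, (when - now).total_seconds())
```

Adding a little random jitter on top of the returned delay would help avoid many clients retrying at the same instant.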

And it looks like it is load on Let's Encrypt's end; they just posted a status update that they're serving more Service Busy responses than usual.

We’re back to normal now (though keeping an eye on things). Our 503 rate never went above 1%, so most clients that retry on 503 should have been able to issue eventually.

I'm about to update status.io, but the fix took hold at 19:58 UTC, and we haven't served any 503s since then. I expect to serve some at midnight UTC, but it should be a "normal" amount, which I'll cross-check with historical values.

We have been queuing a lot of retries lately on our cPanel and DirectAdmin infrastructure setups, and they have increased in the last month. Maybe it is time to scale the infrastructure up flexibly at peak times, if there is a pattern?

Ideally there should be no "peak time" due to randomisation of requests. Unfortunately, all too often some software triggers on e.g. the whole hour or at 00:00 etc., which is of course bad for the ACME server infrastructure.

My personal opinion (also note that I'm not a LE staff member or something like that) would be NOT to scale up at peak times to discourage this behaviour.
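As a rough illustration of the randomisation mentioned above, a renewal job could pick its daily run time once, at install time, instead of hard-coding 00:00. This is only a sketch: `pick_daily_run_time` and the 5-minute margin around the top of the hour are made-up choices, not a Let's Encrypt recommendation.

```python
import random


def pick_daily_run_time(rng=random, avoid_minutes=5):
    """Pick a random (hour, minute, second) for a daily job.

    Minutes within `avoid_minutes` of the top of the hour are skipped, so
    the job never lands on or next to an exact hour boundary.
    """
    hour = rng.randrange(24)
    minute = rng.randrange(avoid_minutes, 60 - avoid_minutes)
    second = rng.randrange(60)
    return hour, minute, second
```

The result can be written into the scheduler configuration once (cron, Task Scheduler, and so on) so that each installation keeps a stable time of its own, spread across the day.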

Looking at the last 24 hours, we had three periods with >0 503s/sec: at 18:10 UTC, 00:00 UTC, and 09:50 UTC. The max we returned was ~8 503s/sec.

More worryingly we served a few 500 Internal Server Errors during two of those times, which is what we're trying to avoid doing.

The compliance requirements of running a public CA inhibit cloud-style flexibility, sadly.

Thank you jcjones.

Indeed, we also run cron jobs at 00:00 and will definitely move them to a random time that is not fixed on the hour :blush:

"The compliance requirements of running a public CA inhibit cloud-style flexibility, sadly."

That's a shame indeed.

PS: Thank you for this great service, as it adds value to our services.
