Staging environment still has issues

My domain is: e2e-fdr-qnx-master0.e2e-fdr.eypgdy.g0.int.cldr.work
I ran this command:
It produced this output: AcmeRateLimitedException: Service busy; retry later.

My web server is (include version): using acme4j client

The operating system my web server runs on is (include version): CentOS

My hosting provider, if applicable, is:

Team, this is still happening in the staging environment several hours after the outage was reported as resolved. Can someone please check?

	Suppressed: org.shredzone.acme4j.exception.AcmeRateLimitedException: Service busy; retry later.
		at org.shredzone.acme4j.connector.DefaultConnection.throwAcmeException(DefaultConnection.java:545)
		at org.shredzone.acme4j.connector.DefaultConnection.performRequest(DefaultConnection.java:479)
		at org.shredzone.acme4j.connector.DefaultConnection.sendSignedRequest(DefaultConnection.java:407)
		at org.shredzone.acme4j.connector.DefaultConnection.sendSignedPostAsGetRequest(DefaultConnection.java:155)
		at org.shredzone.acme4j.AcmeJsonResource.update(AcmeJsonResource.java:117)

It seems there’s a new issue, probably unrelated to the previous one. Seems like one of the databases got OOM killed and didn’t come back healthy. We are investigating.

Should be better now.

@mcpherrinm Thanks for the quick help. We are still hitting this issue quite often (one instance: 4378ebe136cb4116.knox-71r.l2ov-m7vs.int.cldr.work).
Will it take more time for this to be fixed completely?

We're also experiencing the rate limit issue quite often:
cert-manager/challenges "msg"="re-queuing item due to optimistic locking on resource" "error"="[503 urn:ietf:params:acme:error:rateLimited: Service busy; retry later.

Sorry about the trouble. I've confirmed that performance is still affected, and have updated our status page. This may take a while for us to fix.

Thanks @JamesLE. Any ETA or further update on this would be a great help.

We’ve fixed the immediate issue, and the staging environment has returned to its baseline. Unfortunately, that baseline does have a relatively high error rate. We’ll continue working to improve that, but have no ETR.

If you’re regularly unable to issue even one staging certificate, though, do let us know since the error rate should not be that high.
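
For clients that have to live with an elevated staging error rate, one common approach is to retry transient failures with exponential backoff and jitter. The following is only a minimal sketch under stated assumptions: `ServiceBusyError` and `retry_with_backoff` are hypothetical names, not part of acme4j, cert-manager, Certbot, or any other client, and the delay values are illustrative rather than recommended by Let's Encrypt.

```python
import random
import time


class ServiceBusyError(Exception):
    """Hypothetical stand-in for a client's 'Service busy; retry later.' error."""


def retry_with_backoff(operation, max_attempts=5, base_delay=2.0, max_delay=60.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait before retry number k is base_delay * 2**k seconds, capped at
    max_delay, then scaled by a random factor in [0.5, 1.0) so that many
    clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServiceBusyError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * (0.5 + 0.5 * rng()))
```

Here `operation` would wrap whatever call the ACME client makes (for example, creating an order). The `sleep` and `rng` parameters exist only so the behavior can be tested without real waiting.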

We've been having the same issue with receiving either a 503 or "Service busy; retry later" error. Seems like something to do with rate limiting?

Error: urn:ietf:params:acme:error:rateLimited :: There were too many requests of a given type :: Service busy; retry later.

I have been trying since yesterday, requesting no more than 3 certificates, but I continue to get the following error:

Service busy; retry later

I even tried requesting a single certificate, but it still fails with the following error:

acme: error: 0 :: POST :: https://acme-staging-v02.api.letsencrypt.org/acme/new-acct :: urn:ietf:params:acme:error:rateLimited :: Service busy; retry later.

{"type": "urn:ietf:params:acme:error:rateLimited", "detail": "Service busy; retry later."}
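
Since the same `rateLimited` error type appears here for an overloaded service rather than for an exceeded per-account limit, a client may want to tell the two apart before deciding whether to retry soon. The sketch below parses a problem document like the one above. The function name `is_service_busy` is hypothetical, and matching on the "Service busy" detail text or a 503 status is an assumption based only on the messages quoted in this thread, not on a documented contract.

```python
import json

RATE_LIMITED = "urn:ietf:params:acme:error:rateLimited"


def is_service_busy(problem_body, http_status=None):
    """Return True if an ACME problem document looks like the overload error.

    Assumption: overload is signalled by the rateLimited type together with
    either a detail containing "Service busy" or an HTTP 503 status.
    """
    try:
        problem = json.loads(problem_body)
    except (TypeError, ValueError):
        return False  # not JSON at all, so not a problem document
    if not isinstance(problem, dict):
        return False
    if problem.get("type") != RATE_LIMITED:
        return False
    busy_detail = "Service busy" in str(problem.get("detail", ""))
    return busy_detail or http_status == 503
```

A caller could treat a `True` result as a transient condition worth retrying after a delay, and anything else as an error to report.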

I've been getting the same issues via Certbot against the staging environment. I'm unable to issue even one certificate over here. I've tried giving it everything from 5 minutes to 4 hours to resolve. Thank you for opening the support ticket; I thought I was doing something egregious.

I am facing a similar issue on staging as well. I have not tried production yet, but I have it planned for next week.
I hope it will not be impacted.

Our staging databases are struggling under load. Production is unaffected.

We're back to normal as of 18:10 UTC.

Sorry for the repeated problems here. It seems the ratelimiter itself had an issue, so we were still ratelimiting even once the initial issues had resolved. 503s are back to zero now.
