My domain is: e2e-fdr-qnx-master0.e2e-fdr.eypgdy.g0.int.cldr.work
I ran this command:
It produced this output: AcmeRateLimitedException: Service busy; retry later.
My web server is (include version): using acme4j client
The operating system my web server runs on is (include version): CentOS
My hosting provider, if applicable, is:
Team, this is still happening in the Staging environment several hours after the outage was reported as resolved. Can someone please check?
Suppressed: org.shredzone.acme4j.exception.AcmeRateLimitedException: Service busy; retry later.
at org.shredzone.acme4j.connector.DefaultConnection.throwAcmeException(DefaultConnection.java:545)
at org.shredzone.acme4j.connector.DefaultConnection.performRequest(DefaultConnection.java:479)
at org.shredzone.acme4j.connector.DefaultConnection.sendSignedRequest(DefaultConnection.java:407)
at org.shredzone.acme4j.connector.DefaultConnection.sendSignedPostAsGetRequest(DefaultConnection.java:155)
at org.shredzone.acme4j.AcmeJsonResource.update(AcmeJsonResource.java:117)
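For anyone calling acme4j directly, a client-side retry with backoff can ride out transient `rateLimited` (503) responses like the one in this trace. The sketch below is a minimal, library-free illustration under stated assumptions: `RateLimitedException` is a hypothetical stand-in for acme4j's `AcmeRateLimitedException` (the real class can expose the server's Retry-After time, but the accessor's exact signature differs between acme4j versions, so check yours), and `RateLimitRetry`, `callWithRetry`, `delayFor` and the delay parameters are illustrative names, not acme4j API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// HYPOTHETICAL stand-in for org.shredzone.acme4j.exception.AcmeRateLimitedException,
// so this sketch compiles without acme4j on the classpath.
class RateLimitedException extends Exception {
    private final Instant retryAfter; // null when the server sent no Retry-After hint

    RateLimitedException(String message, Instant retryAfter) {
        super(message);
        this.retryAfter = retryAfter;
    }

    Optional<Instant> getRetryAfter() {
        return Optional.ofNullable(retryAfter);
    }
}

// Retries an action on RateLimitedException using capped exponential backoff
// with full jitter, preferring the server's Retry-After hint when present.
class RateLimitRetry {
    // Injected so tests can record delays instead of actually sleeping.
    interface Sleeper {
        void sleep(Duration d) throws InterruptedException;
    }

    static <T> T callWithRetry(Callable<T> action, int maxAttempts,
                               Duration baseDelay, Duration maxDelay,
                               Sleeper sleeper) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (RateLimitedException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last rate-limit error
                }
                sleeper.sleep(delayFor(e, attempt, baseDelay, maxDelay));
            }
        }
    }

    static Duration delayFor(RateLimitedException e, int attempt,
                             Duration baseDelay, Duration maxDelay) {
        Optional<Instant> hint = e.getRetryAfter();
        if (hint.isPresent()) {
            // Honor the server's hint; a time already in the past means "retry now".
            Duration wait = Duration.between(Instant.now(), hint.get());
            return wait.isNegative() ? Duration.ZERO : wait;
        }
        // Exponential ceiling: baseDelay * 2^(attempt - 1), capped at maxDelay.
        long ceiling = Math.min(maxDelay.toMillis(),
                baseDelay.toMillis() << Math.min(attempt - 1, 30));
        // Full jitter: uniform in [0, ceiling] so many clients don't retry in lockstep.
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(ceiling + 1));
    }
}
```

In real code you would wrap the acme4j call that failed (for example the `update()` seen in the stack trace), catch the library's own exception type instead of the stand-in, and pass `d -> Thread.sleep(d.toMillis())` as the sleeper. Jitter matters here because an outage like this one leaves many clients retrying at the same moment.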
It seems there’s a new issue, probably unrelated to the previous one: one of the databases appears to have been OOM-killed and didn’t come back healthy. We are investigating.
@mcpherrinm Thanks for the quick help. We are still hitting this issue quite often (one instance: 4378ebe136cb4116.knox-71r.l2ov-m7vs.int.cldr.work).
Will it take more time for this to be fixed completely?
We're also hitting the rate limit issue quite often:
cert-manager/challenges "msg"="re-queuing item due to optimistic locking on resource" "error"="[503 urn:ietf:params:acme:error:rateLimited: Service busy; retry later.
We’ve fixed the immediate issue, and the staging environment has returned to its baseline. Unfortunately, that baseline does have a relatively high error rate. We’ll continue working to improve that, but we don’t have an ETR (estimated time of resolution).
If you’re regularly unable to issue even one staging certificate, though, do let us know, since the error rate should not be that high.
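To put "relatively high error rate" in perspective, here is a back-of-the-envelope calculation. It assumes every attempt fails independently with the same probability p (a simplification; an outage like the one in this thread makes failures strongly correlated), so the chance that at least one of n attempts succeeds is `1 - p^n`. The failure rates used are hypothetical, not Let's Encrypt figures, and `RetryBudget` is an illustrative name.

```java
// Back-of-the-envelope retry budget.
// ASSUMPTION: each attempt fails independently with the same probability,
// which real incidents (correlated failures) violate.
class RetryBudget {
    // Probability that at least one of `attempts` tries succeeds: 1 - p^n.
    static double successWithin(double failureRate, int attempts) {
        return 1.0 - Math.pow(failureRate, attempts);
    }

    // Smallest n such that 1 - p^n >= targetSuccess.
    static int attemptsNeeded(double failureRate, double targetSuccess) {
        if (failureRate <= 0.0) {
            return 1; // nothing ever fails, so one try suffices
        }
        if (failureRate >= 1.0 || targetSuccess >= 1.0) {
            throw new IllegalArgumentException("need failureRate < 1 and targetSuccess < 1");
        }
        // Solve p^n <= 1 - target for n; both logs are negative, so the ratio is positive.
        double n = Math.log(1.0 - targetSuccess) / Math.log(failureRate);
        return Math.max(1, (int) Math.ceil(n));
    }
}
```

For example, with a hypothetical 50% failure rate, `attemptsNeeded(0.5, 0.99)` is 7, and seven attempts give a 99.2% chance of at least one success. Under this model, failing to get even one certificate across many spaced-out attempts suggests something beyond the normal baseline, which is the case the staff asked to hear about.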
I've been getting the same errors via Certbot against the staging environment and have been unable to issue even one certificate. I've tried waiting anywhere from 5 minutes to 4 hours for it to resolve. Thank you for opening the support ticket; I thought I was doing something egregiously wrong.
Sorry for the repeated problems here. It seems the rate limiter itself had an issue, so we were still rate limiting even after the initial issues had been resolved. 503s are back to zero now.