Error creating new order on Acme Staging

kedar031 · October 7, 2021, 7:47am

Hi Team,

We have been hitting this issue frequently for almost two weeks (or more); it happens on roughly 1 in 5 requests.
Has something changed?
{
"type":"urn:ietf:params:acme:error:serverInternal",
"detail":"Error creating new order",
"status":500
}

Client: acme4j
Sample domain from a failed request: 98e4b25b2f3ba887.dim-s9m3.svbr-nqvp.int.cldr.work
Any suggestions?

Essentially, the POST request that creates the order is failing with a 500 response. Below is the trace from acme4j:

Exception from the ACME server while executing the order. Problem : Error creating new order Exception: {} org.shredzone.acme4j.exception.AcmeServerException: Error creating new order
	at org.shredzone.acme4j.connector.DefaultConnection.throwAcmeException(DefaultConnection.java:548)
	at org.shredzone.acme4j.connector.DefaultConnection.performRequest(DefaultConnection.java:479)
	at org.shredzone.acme4j.connector.DefaultConnection.sendSignedRequest(DefaultConnection.java:407)
	at org.shredzone.acme4j.connector.DefaultConnection.sendSignedRequest(DefaultConnection.java:161)
	at org.shredzone.acme4j.OrderBuilder.create(OrderBuilder.java:314)

Thank you,
Kedarnath

rg305 · October 7, 2021, 7:48am

Is there anything else in the logs?

Sorry, now I see:

I'm totally unfamiliar with it.
But do add anything else you have, so that others who might know more can see it.

kedar031 · October 7, 2021, 10:12am

Thanks for the response. I've updated the post with more details; hope that helps.

rg305 · October 7, 2021, 3:12pm

Since no one else has posted...
Let's try solving this generically.
Presuming the problem started recently and you haven't made any change to warrant this error...

Which OS and version is this running on?
Which version of OpenSSL is being used?
Have you updated ca-certificates?

Osiris · October 7, 2021, 3:44pm

Internal server errors are not something the user can fix or cause, as far as I know. Maybe there's something going on with the servers, although currently I don't see an active incident.

petercooperjr · October 7, 2021, 4:45pm

The only thing the spec says for "serverInternal" is that it means "The server experienced an internal error". Generally retrying should work. Are these "complicated" certificates in any way, like having lots of domain names on them that would need validation? When you say it fails roughly 1/5 times, is that with the same certificate or domain list? How big of a sample size of failures are we talking about? Does retrying the same order usually work?
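To illustrate the "retrying should work" idea, here is a minimal sketch of a retry loop with exponential backoff. It deliberately does not depend on acme4j: `RetrySketch`, `withRetry`, and `TransientServerException` are hypothetical names made up for this example. In a real client you would presumably catch the `AcmeServerException` shown in the trace above around the `OrderBuilder.create()` call instead, and you might want to check that the problem type really is `serverInternal` before retrying. The attempt count and delays are arbitrary assumptions, not recommended values.

```java
import java.util.concurrent.Callable;

public class RetrySketch {

    /** Hypothetical stand-in for a transient server-side failure (e.g. an HTTP 500). */
    static class TransientServerException extends Exception {
        TransientServerException(String message) {
            super(message);
        }
    }

    /**
     * Runs the action and retries it when it throws TransientServerException.
     * At most maxAttempts calls are made in total; the wait between attempts
     * starts at initialDelayMillis and doubles after every failure.
     * Any other exception is propagated immediately without a retry.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (TransientServerException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated order creation that fails twice and then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransientServerException("Error creating new order");
            }
            return "order created";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Doubling the delay keeps a client from hammering an endpoint that is already struggling, which seems especially worth doing against a shared staging environment.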

kedar031 · October 7, 2021, 5:27pm

There isn't anything special about the certificates/domains; I say this because some of them have passed on retry. There are at most 2 domains in a request.
It fails for different certificates and domain lists, so I don't think this is specific to particular domain names.
There were around 30 such failures yesterday.
The description includes one sample domain for which the issue happened; I can add more if that helps.

rg305 · October 7, 2021, 5:29pm

@kedar031
Is there a common timeframe when the errors occur?

petercooperjr · October 7, 2021, 5:55pm

So, you had roughly 30 failures and (extrapolating from you saying 1/5 of your requests fail) roughly 120 successful requests yesterday, all to the staging environment, all for certificates with just 1 or 2 domains? That does sound like something odd going on. While I hate to suggest any testing in production, do you make a similar level of requests to the production environment? If so, what portion of requests to production work? And you've been making roughly this many requests per day for weeks, and noticed something change a couple of weeks ago? Can you narrow down more specifically when it started?

jillian · October 7, 2021, 7:33pm

Yes, we are making changes to our staging environment that we hope will bring better quality of service and stability. However, the current change needs some fine tuning and is causing a little more impact on the new-order endpoint for some use cases. In general, we've noticed the endpoint has a better success rate but it's still not where we want it to be.

This is on our radar and we are working on it!

kedar031 · October 11, 2021, 7:47am

On production, we make significantly fewer requests and thankfully have not noticed this issue there. Unfortunately, I don't have older logs to pin down exactly when we started seeing this.

kedar031 · October 11, 2021, 7:49am

Thanks, @jillian. It would be great if you could update the thread once that is done, so I can check again on our side.

jillian · October 11, 2021, 4:45pm

We made some changes at the end of last week that should remediate the problems you were seeing. We have seen improvements in our testing and metrics.

system · November 10, 2021, 4:46pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
