Error creating new order on Acme Staging

Hi Team,

We have been facing this issue frequently for almost two weeks (or more); it happens roughly 1 in 5 times.
Has something changed?
{
  "type": "urn:ietf:params:acme:error:serverInternal",
  "detail": "Error creating new order",
  "status": 500
}

Client: acme4j
Sample domain for which the request failed: 98e4b25b2f3ba887.dim-s9m3.svbr-nqvp.int.cldr.work
Any suggestions?

Essentially, the POST request to create the order fails with a 500 response. Below is the stack trace from acme4j:

Exception from the ACME server while executing the order. Problem : Error creating new order Exception: {} org.shredzone.acme4j.exception.AcmeServerException: Error creating new order
	at org.shredzone.acme4j.connector.DefaultConnection.throwAcmeException(DefaultConnection.java:548)
	at org.shredzone.acme4j.connector.DefaultConnection.performRequest(DefaultConnection.java:479)
	at org.shredzone.acme4j.connector.DefaultConnection.sendSignedRequest(DefaultConnection.java:407)
	at org.shredzone.acme4j.connector.DefaultConnection.sendSignedRequest(DefaultConnection.java:161)
	at org.shredzone.acme4j.OrderBuilder.create(OrderBuilder.java:314)

Thank you,
Kedarnath

Is there anything else in the logs?

Sorry, now I see:

I'm totally unfamiliar with it.
But do add anything else you have, so that others who know more can see it.

Thanks for the response. I've updated the post with more details; I hope that helps.

Since no one else has posted...
Let's try solving this generically.
Presuming the problem started recently and you haven't made any change to warrant this error...

  • Which OS and version is this running?

  • Which version of OpenSSL is being used?

  • Have you updated ca-certificates?
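For a Java client such as acme4j, the first two questions map onto JVM system properties: TLS is normally handled by the JDK's own JSSE implementation rather than OpenSSL, so the Java version and vendor are the closer equivalent of an OpenSSL version. Likewise, the JDK's trust store may or may not be tied to the OS ca-certificates package, depending on how Java was installed. A minimal sketch that collects these standard properties (the class and method names are illustrative, not from acme4j):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class ClientEnvironment {

    // Standard JVM system properties that answer "which OS / which TLS stack".
    static final String[] KEYS = {"os.name", "os.version", "java.version", "java.vendor"};

    // Returns each property in a fixed order, with "unknown" if the JVM does not set it.
    static Map<String, String> describe() {
        Map<String, String> info = new LinkedHashMap<>();
        for (String key : KEYS) {
            info.put(key, System.getProperty(key, "unknown"));
        }
        return info;
    }

    public static void main(String[] args) {
        describe().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Pasting this output into a thread like this one gives helpers the environment details in one go.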

As far as I know, internal server errors are not something the user can cause or fix. Maybe something is going on with the servers, although I don't currently see an active incident.

The only thing the spec (RFC 8555) says about "serverInternal" is that it means "The server experienced an internal error". Generally, retrying should work. Are these "complicated" certificates in any way, like having lots of domain names on them that would need validation? When you say it fails roughly 1 in 5 times, is that with the same certificate or domain list? How large a sample of failures are we talking about? Does retrying the same order usually work?
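Since retrying generally should work for a serverInternal error, here is a minimal sketch of retry with exponential backoff that retries only that problem type. It is plain Java so it runs on its own: `ProblemException`, `withRetry` and `isServerInternal` are hypothetical names, not part of acme4j. In a real client, the retried call would be something like the `OrderBuilder.create()` step from the stack trace earlier in the thread, and the check would inspect acme4j's `AcmeServerException` instead of the stand-in (see the acme4j documentation for how it exposes the problem type). The attempt count and delays are illustrative.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public final class RetryOnServerInternal {

    // Problem type from the 500 response in this thread (RFC 8555 error namespace).
    static final String SERVER_INTERNAL = "urn:ietf:params:acme:error:serverInternal";

    // Hypothetical stand-in for the client's exception; it only carries the problem "type".
    static final class ProblemException extends Exception {
        final String type;

        ProblemException(String type, String detail) {
            super(detail);
            this.type = type;
        }
    }

    // Runs the action up to maxAttempts times, doubling the delay after each retriable
    // failure; any non-retriable error, or the final failure, is rethrown unchanged.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs,
                           Predicate<Exception> retriable) throws Exception {
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retriable.test(e)) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs *= 2;
            }
        }
    }

    // Only serverInternal problems are treated as transient in this sketch.
    static boolean isServerInternal(Exception e) {
        return e instanceof ProblemException
                && SERVER_INTERNAL.equals(((ProblemException) e).type);
    }

    public static void main(String[] args) throws Exception {
        // Simulated "create order" call: fails twice with serverInternal, then succeeds.
        int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new ProblemException(SERVER_INTERNAL, "Error creating new order");
            }
            return "order created";
        }, 5, 100, RetryOnServerInternal::isServerInternal);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Capping the attempts and backing off keeps a client from hammering an endpoint that is already struggling, while other problem types (for example a malformed request) are rethrown immediately, since retrying them unchanged is unlikely to help.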

There isn't anything special about the certificates/domains; I say this because some of them have succeeded on retry. There are at most 2 domains in each request.
It fails for different certificates and domain lists, so I don't think this is specific to particular domain names.
There were around 30 such failures yesterday.
I mentioned a sample domain in the description for which the issue occurred; I can add more of those if that helps.

@kedar031
Is there a common timeframe when the errors occur?

So, you had roughly 30 failures and (extrapolating from you saying 1 in 5 of your requests fail) roughly 120 successful requests yesterday, all to the staging environment, all for certificates with just 1 or 2 domains? That does sound like something odd going on. While I hate to suggest any testing in production, do you make a similar level of requests to the production environment? If so, what portion of requests to production work? And you've been making roughly this number of requests per day for weeks, and noticed something change a couple of weeks ago? Can you narrow down more specifically when it started?
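The extrapolation of roughly 120 successful requests works out as follows, assuming the reported 1-in-5 failure rate applied to all of that day's requests:

```latex
N_{\text{total}} \approx \frac{N_{\text{fail}}}{p_{\text{fail}}} = \frac{30}{1/5} = 150,
\qquad
N_{\text{success}} \approx N_{\text{total}} - N_{\text{fail}} = 150 - 30 = 120
```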

Yes, we are making changes to our staging environment that we hope will bring better quality of service and stability. However, the current change needs some fine tuning and is causing a little more impact on the new-order endpoint for some use cases. In general, we've noticed the endpoint has a better success rate but it's still not where we want it to be.

This is on our radar and we are working on it!

On production, we make significantly fewer requests, and thankfully we have not noticed this issue there. Unfortunately, I don't have older logs to pin down exactly when we started seeing this.

Thanks, @jillian. It would be great if you could update the thread once that is done, so I can check back.

We made some changes at the end of last week that should remediate the problems you were seeing. We have seen improvements in our testing and metrics.
