400 Responses from /acme/new-order endpoint in staging

We’ve been seeing a ton of 400 responses from the ACME v2 API in staging whenever we attempt to create a new order via this endpoint:

https://acme-staging-v02.api.letsencrypt.org/acme/new-order

From my understanding this isn’t an endpoint that’s being deprecated, but this behavior seems indicative of some kind of brownout? The API responses we see are:

<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body>\r\n<center><h1>400 Bad Request</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n

We create lots of certs programmatically, so this isn’t specific to a particular domain (I can often retry the same request later and it works just fine). We do not see this behavior at all in production.

HTTP 400 is unlikely to indicate an outage; it more likely means something is wrong with the request.

If you can capture the request line + request headers + request body from such an instance, that could help.

Did this happen only in one contiguous window of time, or across several separate windows?

If the former, perhaps LE were experimenting with the web server configuration.

From the LE side we’ve not made any noticeable changes to the staging load balancers or boulder WFE services in ~2 weeks.

I agree with @_az, can we see the request lines, headers, and body please?

Yep! I’ll push some code changes out tomorrow so I can capture the outgoing requests that are failing!

This could be an indication that the code/client you are using has not been updated to [correctly] support the POST-as-GET change, which has been deployed to staging (but not prod).

In that case OP would be seeing an ACME error response like this one.

That we see the default nginx error page strongly suggests that there is an HTTP protocol violation going on. But I have no idea why it would be intermittent.

We are using an old version of the golang crypto/acme client.

This is the HTTP request (obtained using httputil.DumpRequestOut):

POST /acme/new-order HTTP/1.1
Host: acme-staging-v02.api.letsencrypt.org
User-Agent: go-acme/2
Content-Length: 571
Content-Type: application/jose+json
Accept-Encoding: gzip

{"protected":"<redacted>","payload":"<redacted>","signature":"<redacted>"}

This is the payload:

{"identifiers":[{"type":"dns","value":"direwolf-26360e51fb.staging.herokuappdev.com"}]}

This is the response (again):

<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body>\r\n<center><h1>400 Bad Request</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n