New order - 429 error behaviour change

Hi,

Has something changed in how a new order is handled for a domain that has already exceeded the weekly certificate limit?

Until now, when we ordered more than 5 certificates for the same domain, we got a 429 error (which is fine, and we knew how to deal with it), but now we have started getting a 400 status code with a “badNonce” error.
Any idea why? Did something change?

Hi @cojalvo,

I can't think of a change that would have the effect you're describing. Can you share some timestamped logs of your requests & the responses?

Do you know why your system is issuing 5+ duplicate certificates within a one-week period? That seems like a separate problem that might be worth addressing.

Hi, thanks for the help.
Our system doesn't issue 5 duplicate certificates… it's a test that we are running.
I'm from the IBM Cloud Certificate Manager service, and we use LE to issue certificates to our customers.
One of our customers tried to issue 5 certificates, and the error he got didn't reflect the real failure reason (more than 5 certificates per week). We investigated and found that this was the reason, but Let's Encrypt replied with a 400 error and not a 429 as it should have.

Today I ran some tests with the domain “test.e2e.certificate-manager.test.cloud.ibm.com” to confirm the error, and it reproduced.
I just reproduced it again so you can check it.

Thanks

That makes sense, thanks for the extra context :+1:

Great, that helps a lot. I was able to look at the logs, and what I see is the correct 429 response being returned to your ACME client as a result of the POST to newOrder. Here's a breakdown:

  • 10:15:34 UTC - the 5th finalize request arrives and succeeds.

  • 10:16:36.672 UTC - a newOrder request arrives for the same set of names. We return the error:

    "Error":"429 :: rateLimited :: Error creating new order :: too many certificates already issued for exact set of domains: test.e2e.certificate-manager.test.cloud.ibm.com: see https://letsencrypt.org/docs/rate-limits/"
    
  • 10:16:36.898 UTC - another newOrder request arrives for the same set of names. It must have reused a nonce, or otherwise included the wrong nonce, because the error returned is:

    "Error":"400 :: badNonce :: JWS has an invalid anti-replay nonce: \"0001-3PNPNZKIG4Epljz-eJRnTbgtyS2bI7M7gKZpRKQeBU\""
    
  • 10:18:05 UTC - another newOrder request arrives for the same set of names. The nonce is correct this time and a 429 is returned:

    "Error":"429 :: rateLimited :: Error creating new order :: too many certificates already issued for exact set of domains: test.e2e.certificate-manager.test.cloud.ibm.com: see https://letsencrypt.org/docs/rate-limits/"
    

The pattern continues after that: after the first 429 response, every second newOrder request carries a bad nonce and gets a badNonce error.

Is it possible that the 429 response triggers a retry that reuses the stale nonce? I would recommend taking a look at this area of your code. Logging each nonce you receive and each nonce you send would probably help too.
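
To make that suggestion concrete, here is a minimal Python sketch of a retry wrapper, assuming a hypothetical `send(nonce)` callable that signs the JWS with the given nonce and POSTs it. It follows RFC 8555: the client takes the `Replay-Nonce` from every response (errors included), retries a 400 `badNonce` with the nonce from that error response, and surfaces a 429 `rateLimited` instead of retrying it. The names `Response`, `RateLimited` and `post_with_nonce_retry` are illustrative only, not part of any ACME library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# ACME problem types (RFC 8555, section 6.7)
BAD_NONCE = "urn:ietf:params:acme:error:badNonce"
RATE_LIMITED = "urn:ietf:params:acme:error:rateLimited"


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response (header names assumed normalized)."""
    status: int
    headers: Dict[str, str]
    body: dict = field(default_factory=dict)


class RateLimited(Exception):
    """Raised instead of retrying when the server answers 429 rateLimited."""

    def __init__(self, detail: str, next_nonce: Optional[str]):
        super().__init__(detail)
        self.next_nonce = next_nonce  # fresh nonce for the *next* request


def post_with_nonce_retry(
    send: Callable[[str], Response],
    nonce: str,
    max_bad_nonce_retries: int = 3,
    log: Optional[List[str]] = None,
) -> Tuple[Response, Optional[str]]:
    """POST via `send(nonce)`, retrying only on badNonce, each time with a fresh nonce.

    Returns the final response and the Replay-Nonce to use for the next request.
    """
    retries = 0
    while True:
        resp = send(nonce)
        # Keep the fresh nonce from every response, success or error.
        fresh = resp.headers.get("Replay-Nonce")
        if log is not None:
            log.append(f"sent nonce={nonce} status={resp.status} got nonce={fresh}")
        problem = resp.body.get("type")
        if (resp.status == 400 and problem == BAD_NONCE
                and fresh and retries < max_bad_nonce_retries):
            nonce = fresh  # never resend the nonce that was just rejected
            retries += 1
            continue
        if resp.status == 429 or problem == RATE_LIMITED:
            # Retrying will not help until the rate-limit window moves on.
            raise RateLimited(resp.body.get("detail", "rate limited"), fresh)
        return resp, fresh
```

Passing a list as `log` records every nonce sent and received, which is the kind of logging that makes a stale-nonce retry easy to spot. The nonce carried by `RateLimited` should still be usable for the next, unrelated request.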

Returning to the original question in your thread: it looks to me like the 429 you expect is being returned when the nonce is correct, and when the nonce is incorrect you get the badNonce error instead. Request nonces are checked before rate limits are evaluated.
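
The ordering can be illustrated with a toy model. This is a simplified sketch, not Boulder's actual code: the hypothetical `ToyAcmeServer` validates the anti-replay nonce first and only then checks the duplicate-certificate limit, so a request that reuses an already-consumed nonce gets `badNonce` even though the same order with a fresh nonce would get `rateLimited`.

```python
import itertools
from typing import Set, Tuple

Reply = Tuple[int, str, str]  # (HTTP status, error type, fresh Replay-Nonce)


class ToyAcmeServer:
    """Toy model of the order of checks on newOrder (not Boulder's real code)."""

    def __init__(self, issued_this_week: int, weekly_limit: int = 5):
        self._ids = itertools.count(1)
        self._outstanding: Set[str] = set()
        self.issued = issued_this_week  # duplicate certificates already issued
        self.limit = weekly_limit

    def new_nonce(self) -> str:
        nonce = f"nonce-{next(self._ids)}"
        self._outstanding.add(nonce)
        return nonce

    def new_order(self, nonce: str) -> Reply:
        fresh = self.new_nonce()  # every reply hands out a new nonce
        # 1. The anti-replay check runs first; a nonce is accepted at most once.
        if nonce not in self._outstanding:
            return 400, "badNonce", fresh
        self._outstanding.discard(nonce)
        # 2. The rate limit is only evaluated for requests with a good nonce.
        if self.issued >= self.limit:
            return 429, "rateLimited", fresh
        return 201, "created", fresh


if __name__ == "__main__":
    server = ToyAcmeServer(issued_this_week=5)  # weekly limit already reached
    first = server.new_nonce()
    status, error, fresh = server.new_order(first)
    print(status, error)  # 429 rateLimited
    status, error, _ = server.new_order(first)  # buggy retry reuses `first`
    print(status, error)  # 400 badNonce
    status, error, _ = server.new_order(fresh)  # fresh nonce from the 429 reply
    print(status, error)  # 429 rateLimited
```

The demo reproduces the alternating 429 / 400 pattern from the log breakdown: only the request that reuses a consumed nonce is rejected before the rate limit is ever consulted.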

Hope that helps!

OK, now it makes sense. We do have a retry mechanism… I will need to disable it, since it doesn't use the new nonce…

Thanks for the help!

If you want an easy way to test these conditions, our Pebble test ACME server can be configured to randomly reject good nonces as if they were bad. That might be helpful when you're looking at improving the retry mechanism.
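
For offline unit tests, the same idea can be mimicked without running a server. Pebble's README describes a `PEBBLE_WFE_NONCEREJECT` environment variable that sets the percentage of valid nonces it rejects (worth confirming against the current docs). The sketch below is a hypothetical in-memory fake, not Pebble itself: `FlakyNonceFake` rejects a configurable share of good nonces, and `run_trial` counts how often a client resends a nonce. `good_client` and `buggy_client` are illustrative; the buggy one reproduces the retry-with-stale-nonce behaviour from this thread.

```python
import random
from typing import Callable, List, Set, Tuple

Reply = Tuple[int, str, str]  # (HTTP status, ACME problem type or "", Replay-Nonce)


class FlakyNonceFake:
    """Hypothetical in-memory ACME endpoint for tests.

    Rejects stale or reused nonces, and also rejects `reject_percent` percent
    of *good* nonces with badNonce. Records every nonce it receives.
    """

    def __init__(self, reject_percent: int, seed: int = 0):
        self.reject_percent = reject_percent
        self.rng = random.Random(seed)
        self.outstanding: Set[str] = set()
        self.seen: List[str] = []
        self._count = 0

    def new_nonce(self) -> str:
        self._count += 1
        nonce = f"n{self._count}"
        self.outstanding.add(nonce)
        return nonce

    def post(self, nonce: str) -> Reply:
        self.seen.append(nonce)
        fresh = self.new_nonce()
        if nonce not in self.outstanding:
            return 400, "badNonce", fresh  # genuinely stale or reused
        self.outstanding.discard(nonce)
        if self.rng.randrange(100) < self.reject_percent:
            return 400, "badNonce", fresh  # injected rejection of a good nonce
        return 201, "", fresh


def good_client(post: Callable[[str], Reply], nonce: str, retries: int = 5) -> Reply:
    """Retry badNonce only, always switching to the newest Replay-Nonce."""
    reply = post(nonce)
    for _ in range(retries):
        status, problem, nonce = reply
        if not (status == 400 and problem == "badNonce"):
            break
        reply = post(nonce)
    return reply


def buggy_client(post: Callable[[str], Reply], nonce: str, retries: int = 5) -> Reply:
    """The bug from this thread: the retry resends the nonce it already used."""
    reply = post(nonce)
    for _ in range(retries):
        if not (reply[0] == 400 and reply[1] == "badNonce"):
            break
        reply = post(nonce)  # stale nonce, always rejected
    return reply


def run_trial(client, requests: int = 200, reject_percent: int = 50, seed: int = 1):
    """Send `requests` orders through `client`; return (reused nonces, successes)."""
    fake = FlakyNonceFake(reject_percent, seed)
    nonce = fake.new_nonce()
    successes = 0
    for _ in range(requests):
        status, _, nonce = client(fake.post, nonce)
        successes += status == 201
    reused = len(fake.seen) - len(set(fake.seen))
    return reused, successes
```

With rejection enabled, `run_trial(buggy_client)` reports reused nonces and fewer successful orders, while `run_trial(good_client)` should never report a reused nonce.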

Happy to help! Thanks for the question :slight_smile:
