Was something changed in how a new order is handled for a domain that has exceeded the weekly certificate limit?
Until now, when we ordered more than 5 certificates for the same domain we got a 429 error (which is fine, and we knew how to deal with it), but now we have started getting a 400 status code and a “bad reply nonce” error.
Any idea why? Was something changed?
I can't think of a change that would have the effect you're describing. Can you share some timestamped logs of your requests and the responses?
Do you know why your system is issuing 5+ duplicate certificates within a one-week period? That seems like a separate problem that might be worth addressing.
Hi, thanks for the help.
Our system isn't issuing 5 certificates; it's a test that we are running.
I'm from the IBM Cloud Certificate Manager service, and we use LE to issue certificates to our customers.
One of our customers tried to issue 5 certificates, and the error he got didn't reflect the real failure reason (more than 5 certs per week). We investigated and discovered that this was the reason, but Let's Encrypt replied with a 400 error and not a 429 as it should.
Great, that helps a lot. I was able to look at the logs and what I see is the correct 429 response being returned to your ACME client as a result of the POST to newOrder. Here's a breakdown:
10:15:34 UTC - the 5th finalize request arrives and succeeds.
10:16:36.672 UTC - a newOrder request arrives for the same set of names. We return the error:
"Error":"429 :: rateLimited :: Error creating new order :: too many certificates already issued for exact set of domains: test.e2e.certificate-manager.test.cloud.ibm.com: see https://letsencrypt.org/docs/rate-limits/"
10:16:36.898 UTC - another newOrder request arrives for the same set of names. It must have reused a nonce, or otherwise included the wrong nonce, because the error returned is:
"Error":"400 :: badNonce :: JWS has an invalid anti-replay nonce: \"0001-3PNPNZKIG4Epljz-eJRnTbgtyS2bI7M7gKZpRKQeBU\""
10:18:05 UTC - another newOrder request arrives for the same set of names. The nonce is correct this time and a 429 is returned:
"Error":"429 :: rateLimited :: Error creating new order :: too many certificates already issued for exact set of domains: test.e2e.certificate-manager.test.cloud.ibm.com: see https://letsencrypt.org/docs/rate-limits/"
The pattern continues after that. After the first 429 response, every second newOrder request carries a bad nonce and gets a badNonce error.
Is it possible that the 429 response invokes a retry that uses the stale nonce? I would recommend taking a look at this area of your code. Adding logging of received/used nonces would also probably help.
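To illustrate the kind of nonce handling and logging I mean, here is a minimal Python sketch. It is not taken from your client or from any real ACME library: `Response`, `NonceClient`, `send` and `fetch_nonce` are hypothetical names, and `send` stands in for whatever function signs and POSTs the JWS with a given nonce. The idea is that every response, including a 429, supplies the nonce for the next request in its Replay-Nonce header, that a nonce is never sent twice, and that a badNonce error is retried with the nonce from the error response.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("acme.nonce")

BAD_NONCE = "urn:ietf:params:acme:error:badNonce"


@dataclass
class Response:
    """Hypothetical, simplified view of an ACME HTTP response."""
    status: int
    replay_nonce: Optional[str]       # value of the Replay-Nonce header
    error_type: Optional[str] = None  # "type" field of the problem document


class NonceClient:
    """Tracks the next usable nonce and never sends the same one twice."""

    def __init__(self, send: Callable[[str], Response],
                 fetch_nonce: Callable[[], str]):
        self._send = send                # assumed: signs and POSTs with the nonce
        self._fetch_nonce = fetch_nonce  # assumed: asks the server for a new nonce
        self._nonce: Optional[str] = None

    def post(self, max_bad_nonce_retries: int = 3) -> Response:
        for attempt in range(max_bad_nonce_retries + 1):
            nonce = self._nonce or self._fetch_nonce()
            self._nonce = None  # a nonce is single-use, so forget it now
            log.info("attempt %d: using nonce %s", attempt + 1, nonce)

            resp = self._send(nonce)

            # Error responses (429, 400, ...) carry a fresh nonce too.
            self._nonce = resp.replay_nonce
            log.info("status %d, received nonce %s",
                     resp.status, resp.replay_nonce)

            if not (resp.status == 400 and resp.error_type == BAD_NONCE):
                return resp  # success or a non-nonce error such as 429
        return resp  # still badNonce after all retries
```

With something like this in place, a 429 reaches the caller unchanged, and the nonce log lines make it easy to spot a request that went out with a nonce that had already been used.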
Returning to the original question in your thread, it looks to me like the 429 you expect is being returned when the nonce is correct, and when the nonce is incorrect you get the badNonce error instead. Request nonces are processed before rate limits are checked.
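As a toy model of that ordering (this is not the server's actual code; `handle_new_order`, `valid_nonces` and `issued_this_week` are made-up names for illustration), a sketch in Python might look like this:

```python
from typing import Set, Tuple

# Assumed from the thread: 5 certificates per week for the exact same names.
DUPLICATE_CERT_LIMIT = 5


def handle_new_order(nonce: str, valid_nonces: Set[str],
                     issued_this_week: int) -> Tuple[int, str]:
    # Step 1: the anti-replay nonce is checked first.
    if nonce not in valid_nonces:
        return 400, "badNonce"
    valid_nonces.discard(nonce)  # a nonce can only be used once

    # Step 2: rate limits are evaluated only for requests with a good nonce.
    if issued_this_week >= DUPLICATE_CERT_LIMIT:
        return 429, "rateLimited"

    return 201, "created"
```

Sending the same nonce twice against this model while over the limit gives a 429 on the first request and a 400 on the second, which matches the alternating pattern in the logs above.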
If you want an easy way to test these conditions, our Pebble test ACME server can be configured to randomly reject good nonces as if they were bad. That might be helpful when you're working on improving the retry mechanism.