Recently, I have been getting an unusually large number of “badNonce” errors, like this one:
[Tue Jul 17 17:26:25 UTC 2018] original='{
"type": "urn:acme:error:badNonce",
"detail": "JWS has invalid anti-replay nonce EVcunbtEQC7dr8S_71Qzi2bCvQnQ32hxG4hLfaVwY-8",
"status": 400
}'
In acme.sh, we have logic that waits 5 seconds, gets a new nonce, and retries if we see this error.
But recently it has stopped working. When we see “badNonce”, we get a new nonce and retry the operation, but the retry fails with badNonce again, even with the new nonce. Then we get another new nonce and retry, and so on. We retry 5 times in total, 5 seconds apart.
But all of the retries fail with the “badNonce” error.
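For anyone following along, the retry behaviour described above can be sketched roughly like this. This is not acme.sh's actual code (acme.sh is a shell script); `post_with_nonce_retry`, `get_nonce` and `send` are hypothetical names, and the callbacks stand in for the real HTTP calls so the loop can be read on its own.

```python
import time
from typing import Callable, Dict

BAD_NONCE = "urn:acme:error:badNonce"


def post_with_nonce_retry(
    get_nonce: Callable[[], str],
    send: Callable[[str], Dict],
    retries: int = 5,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Send a request; on badNonce, wait, fetch a fresh nonce, and retry.

    get_nonce and send are placeholders (assumptions) for the real
    "fetch Replay-Nonce" and "POST signed request" steps.
    """
    response = send(get_nonce())
    for _ in range(retries):
        if response.get("type") != BAD_NONCE:
            break  # success, or an error unrelated to the nonce
        sleep(delay)  # wait before trying again
        response = send(get_nonce())  # always use a freshly fetched nonce
    return response
```

If every nonce is rejected, the function returns the last badNonce response after the initial attempt plus 5 retries, which matches the symptom reported here.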
https://github.com/travis-ci/travis-ci/issues/9555 could be related. If you can record the apparent outbound IP address in the test suite, that could confirm or rule out changing outbound NAT IP addresses as a cause.
Boulder does use the outbound IP to load-balance requests between Boulder instances.
Nonces are not shared between instances of Boulder; they are only valid for the Boulder instance that issued them.
So if two curl requests are load balanced to different Boulder instances, a seemingly valid nonce would fail.
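To illustrate why that fails, here is a toy model, not Boulder's real code. `BoulderInstance` and `route` are made-up names, and the routing function is only an assumed stand-in for whatever the real load balancer does with the client IP.

```python
import secrets
from typing import List


class BoulderInstance:
    """Toy stand-in for one backend that keeps its own private nonce store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._nonces = set()

    def issue_nonce(self) -> str:
        nonce = secrets.token_urlsafe(16)
        self._nonces.add(nonce)
        return nonce

    def redeem(self, nonce: str) -> bool:
        # A nonce is accepted only by the instance that issued it, and only once.
        if nonce in self._nonces:
            self._nonces.remove(nonce)
            return True
        return False


def route(instances: List[BoulderInstance], client_ip: str) -> BoulderInstance:
    # Hypothetical routing rule: pick a backend from the bytes of the client IP.
    return instances[sum(client_ip.encode()) % len(instances)]


instances = [BoulderInstance("a"), BoulderInstance("b")]
issuer = route(instances, "203.0.113.10")    # IP seen when fetching the nonce
verifier = route(instances, "203.0.113.11")  # NAT IP changed before the POST
nonce = issuer.issue_nonce()
accepted = verifier.redeem(nonce)  # False here: the other instance never saw it
```

In this model a fresh nonce does not help as long as the fetch and the POST keep landing on different backends, which would explain every retry failing.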
If you send this header in all of your HTTP requests:
Pragma: akamai-x-get-client-ip
you will be able to check the X-Akamai-Pragma-Client-IP response header to see whether your outbound IP changes between the request where you acquired the nonce and the request where you used it.
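A minimal sketch of that check, assuming the server echoes the header as described above. The function names are hypothetical, the URL you pass in is up to you (your ACME directory URL, for example), and the exact format of the header value is not specified in this thread, so the values are compared as plain strings.

```python
import urllib.request
from typing import Optional

CLIENT_IP_HEADER = "X-Akamai-Pragma-Client-IP"


def build_request(url: str, method: str = "HEAD") -> urllib.request.Request:
    # Ask the CDN to report the client IP it sees for this request.
    return urllib.request.Request(
        url, method=method, headers={"Pragma": "akamai-x-get-client-ip"}
    )


def observed_ip(url: str) -> Optional[str]:
    # Returns the reported client IP, or None if the header is absent.
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        return resp.headers.get(CLIENT_IP_HEADER)


def ip_changed(ip_at_nonce: Optional[str], ip_at_use: Optional[str]) -> bool:
    # True only when both values are known and they differ.
    return (
        ip_at_nonce is not None
        and ip_at_use is not None
        and ip_at_nonce != ip_at_use
    )
```

Logging `observed_ip(...)` once when the nonce is fetched and once when it is used, then comparing the two with `ip_changed`, should show whether the outbound IP moved in between.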