Surge in "JWS has an invalid anti-replay nonce" Errors

Hey there,

Over the past 20 minutes, we have observed a surge in certificate renewal failures on our servers, all of them accompanied by the following error message:

Status 400
{
  "type": "urn:ietf:params:acme:error:badNonce",
  "detail": "JWS has an invalid anti-replay nonce: \"xxxxxxxxx\"",
  "status": 400
}

Since there were no changes to our infrastructure or our scripts, we suspect that the Let's Encrypt (LE) server infrastructure may be involved.

Is anyone else seeing these failures?

thx, bye from Austria
Andreas


It's certainly possible that they're doing maintenance or otherwise having issues with their nonce infrastructure. In general, though, clients should simply retry when they get an invalid nonce error. What client are you using? Is it retrying several times and getting that error each time?
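For anyone running a custom client, here is a minimal sketch of that retry behaviour in Python. It relies on RFC 8555 section 6.5, which says a badNonce error response must carry a fresh Replay-Nonce header and that the client should retry with it. The names `post_with_nonce_retry`, `send` and `fetch_nonce` are hypothetical, not part of lescript or any real ACME library, and the sketch assumes headers arrive as a plain dict keyed exactly `Replay-Nonce` (real HTTP libraries usually treat header names case-insensitively).

```python
import json
from typing import Callable, Dict, Tuple

BAD_NONCE = "urn:ietf:params:acme:error:badNonce"

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Dict[str, str], str]


def is_bad_nonce(status: int, body: str) -> bool:
    """Return True if the response is an ACME badNonce problem document."""
    if status != 400:
        return False
    try:
        problem = json.loads(body)
    except ValueError:
        return False
    return isinstance(problem, dict) and problem.get("type") == BAD_NONCE


def post_with_nonce_retry(
    send: Callable[[str], Response],
    fetch_nonce: Callable[[], str],
    max_attempts: int = 3,
) -> Response:
    """Call send(nonce), retrying with a fresh nonce on badNonce errors.

    send() is assumed to build and sign a new JWS for every call, because
    the nonce sits in the protected header and changes on each attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    nonce = fetch_nonce()
    for _ in range(max_attempts):
        status, headers, body = send(nonce)
        if not is_bad_nonce(status, body):
            return status, headers, body
        # Prefer the fresh nonce from the error response; otherwise ask
        # the server for a new one.
        nonce = headers.get("Replay-Nonce") or fetch_nonce()
    # Still badNonce after max_attempts: hand the last error to the caller.
    return status, headers, body
```

Capping the number of attempts keeps a server-side problem from turning into a retry storm; once the cap is reached, the last badNonce response goes back to the caller, who can log it and try again on the next scheduled renewal run.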


We are using a heavily customized version of lescript.
The renewal errors do not affect the active certificates: the existing certificates remain valid, and the renewal will be retried later.
I raised this to make others aware in case there is a problem with the Let's Encrypt servers, since this behavior is unusual. (At least, I haven't seen such a large surge of these errors outside of maintenance windows in recent years.)
There are currently no scheduled maintenance activities on the Let's Encrypt side; refer to https://letsencrypt.status.io/.


Sorry for the trouble; the issue with cross-datacenter nonce redemption is resolved. I should have filed a status.io notice when I saw it, though the redemption failure rate was (in absolute terms) still really low: it only affected requests that ping-pong between multiple datacenters, and in fact only in one direction.

Anyway, again, sorry for not communicating it.


hey @jcjones,
thank you for the insight and the resolution for the matter. The failure rates have already shown signs of normalization on our side, and the renewal process is now achieving successful outcomes once more. :blush:
Thank you, and best regards from Austria.
Andy


I've added it to the status.io log for posterity, too: Let's Encrypt Status


[Graph: Datacenter 1]

[Graph: Datacenter 2]

The Y axis shows nonces per second being rejected as invalid, compared with the steady state of ~2000/sec being generated per datacenter.


Valid redemptions were generally still OK, which is my bad defense for not filing the status.io right away.

[Graph: Datacenter 1]

[Graph: Datacenter 2]


hehe, guess no defense needed :wink:
Those are really interesting insights, thank you for sharing them with us! :slight_smile:


@futureweb Kudos for having a system that detects these kinds of things!
