@mcpherrinm Thanks for getting back on this!
That's interesting! I was referring to these old comments earlier:
- Regular badNonce errors - #4 by cpu
- JWS anti-replay nonce error · Issue #1 · bruncsak/ght-acme.sh · GitHub
So is this no longer the case today? That would make this problem even more mysterious, though...
There’s only one case I know of where nonce redemption should fail, which is if a nonce server restarts.
Based on the fact that just disabling the connection pool made a significant difference, I doubt that's the case here. But that's good to know, thanks for sharing that. Appreciate your transparency.
Can you characterize what “intermittent” means any better? If we can show it happened outside of times we restarted nonce servers, then perhaps we need to investigate further.
As an example, requests to /acme/new-order were failing at a rate of ~30% (UPDATE: This number was not accurate; the rate for that single endpoint must have been lower. Because we call multiple ACME endpoints throughout the new-order process, we saw the badNonce error across endpoints, and those errors rolled up to a total of ~30% failures for the overall process), with no clear pattern that would indicate server restarts or anything similar. This went down to 0% after disabling the connection pool.
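To illustrate how a lower per-request error rate can roll up to ~30% for the whole process, here is a rough Python sketch. It assumes each request fails independently with the same probability, and the number of ACME calls per order (3, 5, 8) is a made-up placeholder, not something we measured. Both assumptions are probably too simple for a connection-pool issue, so treat the output as a ballpark only.

```python
def overall_failure_rate(per_request: float, calls: int) -> float:
    """Chance that at least one of `calls` requests fails.

    Assumes every request fails independently with probability `per_request`.
    """
    return 1.0 - (1.0 - per_request) ** calls


def per_request_failure_rate(overall: float, calls: int) -> float:
    """Invert the formula above: per-request rate implied by an overall rate."""
    return 1.0 - (1.0 - overall) ** (1.0 / calls)


if __name__ == "__main__":
    # Hypothetical call counts per order flow; the real count depends on the client.
    for calls in (3, 5, 8):
        rate = per_request_failure_rate(0.30, calls)
        print(f"{calls} calls per order -> ~{rate:.1%} per-request failure")
```

Under these assumptions, the more ACME calls one order flow makes, the lower the per-request badNonce rate needed to reach the same ~30% overall.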
We'd love to provide further data if we can discuss this privately over email or something.
Thanks!