Intermittent badNonce errors

An update on my dig thus far…

I think that in order to understand what’s going on here, it’s important to understand how the system I’m working on processes requests for certificates. Essentially, the entire thing is done asynchronously with message queues. The process has eight or nine queues; however, only three of them deliver messages to pollers that actually communicate with Let’s Encrypt. Each of those three pollers calls one Let’s Encrypt API endpoint: new-authz, authz verification, and new-cert.

The way AcmeClient works is that when it goes to send a request to Let’s Encrypt, it checks its in-memory store of nonces and uses one if it finds one. If there aren’t any nonces, it first makes a request to Let’s Encrypt essentially just to grab a nonce out of the response, which can then be used for the real request. Let’s Encrypt gives an appropriate response, and the nonce received in that response is stored.
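To make that flow concrete, here is a rough Python sketch of the behaviour as I understand it. This is not AcmeClient’s actual code: `NoncePool`, `send_signed_request`, and the two callables passed in are names I made up for illustration. The only real protocol detail is the `Replay-Nonce` response header.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

# Hypothetical shape of a response: (body, headers).
Response = Tuple[str, Dict[str, str]]


class NoncePool:
    """In-memory store of nonces (illustrative, not AcmeClient's real class)."""

    def __init__(self, fetch_fresh_nonce: Callable[[], str]) -> None:
        self._nonces: Deque[str] = deque()
        # Assumed helper: makes an extra request just to obtain a nonce.
        self._fetch_fresh_nonce = fetch_fresh_nonce

    def take(self) -> str:
        # Use a stored nonce if there is one, otherwise go and fetch one.
        if self._nonces:
            return self._nonces.popleft()
        return self._fetch_fresh_nonce()

    def store_from(self, headers: Dict[str, str]) -> None:
        # Every response carries a new nonce; keep it for the next request.
        nonce = headers.get("Replay-Nonce")
        if nonce:
            self._nonces.append(nonce)

    def __len__(self) -> int:
        return len(self._nonces)


def send_signed_request(pool: NoncePool, post: Callable[[str], Response]) -> str:
    """Send one request with a nonce from the pool, then store the reply's nonce."""
    nonce = pool.take()
    body, headers = post(nonce)
    pool.store_from(headers)
    return body
```

The point to notice is the last step: after every request, one nonce is left sitting in the pool until that same process sends its next request, however long that takes.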

Because the request happens a single time, and because a message is then passed to the next poller to handle the next step, the nonce that was received in the response from Let’s Encrypt does not get used until that poller receives a new message. [I hope I’m making sense here].

All of this led me to believe that the nonces I wrote about yesterday must have expired during the three hours between when they were received and when they were used. Then I saw a comment from jsha on another thread where he clearly states that nonces do not expire simply because time has passed. He says that nonces can go “bad” because they’ve already been used, or because “any time Boulder’s frontend gets restarted, existing nonces are no longer valid.”
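As a way of reasoning about those two rules, here is a toy Python model of a nonce service. It is definitely not how Boulder is implemented (I have no idea how it tracks nonces internally); `ToyNonceService` and its methods are invented names that only encode the two rules from jsha’s comment: a nonce works once, and a restart invalidates everything outstanding.

```python
import secrets


class ToyNonceService:
    """Toy model only: valid nonces live in memory and nowhere else."""

    def __init__(self) -> None:
        self._valid = set()

    def issue(self) -> str:
        # Hand out a random nonce and remember that it is still unused.
        nonce = secrets.token_urlsafe(16)
        self._valid.add(nonce)
        return nonce

    def redeem(self, nonce: str) -> bool:
        # A nonce is accepted once; a second attempt would be a badNonce.
        if nonce in self._valid:
            self._valid.remove(nonce)
            return True
        return False

    def restart(self) -> None:
        # Simulated restart: every outstanding nonce is forgotten.
        self._valid.clear()
```

In this model there is no clock at all, so a nonce that sits unused for three hours is still fine unless it was already redeemed or the service was restarted in between.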

So this leaves me wondering: were the nonces that I got errors for used in a previous request (it doesn’t look like it, based on the info you provided, cpu), or was the Boulder “frontend” (not sure what that refers to) restarted yesterday between 12:22 UTC and 15:30 UTC?