It produced this output: {"type":"urn:ietf:params:acme:error:badNonce","detail":"JWS has an invalid anti-replay nonce: ","status":400}
My web server is (include version): SailsJS v0.12.14
The operating system my web server runs on is (include version): AWS Linux
My hosting provider, if applicable, is: AWS
I can login to a root shell on my machine (yes or no, or I don’t know): yes
I’m using a control panel to manage my site (no, or provide the name and version of the control panel): No
The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): Greenlock v2.6.7 (nodejs)
I have been seeing badNonce errors on my service since we updated to the ACME v2 API. I figured it was due to the HTTP client not reusing the HTTPS connection (see the thread "JWS has an invalid anti-replay nonce when client behind NAT"). I thought I had fixed it by configuring the HTTP client to reuse connections, but that did not fix anything.
It would be helpful if I could get the logs related to this domain. They could point towards a solution.
PS - We use http-01 validation, in case that helps.
Node.js maintains several connections per server to make HTTP requests. This function allows one to transparently issue requests.
Does your server actually have multiple IP addresses (including IPv6)? Otherwise, multiple HTTP connections should never have this effect: you should always hit the same nonce service due to IP stickiness.
Even if you do have multiple IPv4 addresses, your default route should only use one, right?
That’s because I rolled back my service to a previous version which uses the v1 API.
I still need to figure out what happens when I start using the v2 API.
phs.getpostman.com is just a proxy. It proxies the challenge call to the service that is actually requesting the certificate. On that service, all calls for a single certificate are made by a single instance.
And the failure always happens at the same step:
1. Call the newNonce API
2. Call the newOrder API with the nonce from Step 1. <-- This step fails, but not always
Each instance has a single network interface with a single public IPv4 address (no IPv6), and all outgoing calls are made from the instance’s IP, with no NAT.
My suspicion is that either the TLS connection is not being reused by the client, or it is creating multiple connections and using them independently. The error logs from Let’s Encrypt would help in narrowing down what is actually happening.
@JuergenAuer that cannot happen as nonces are not shared between different servers or processes. Whenever we need to make a new request to Let’s Encrypt, we first get a new nonce, and then use that nonce for the rest of the lifecycle of the request.
TLS connection reuse shouldn’t matter – or at least, it shouldn’t matter much. I’m not certain if it has some effect.
The software is using a different nonce for each HTTP request, right? Getting a certificate or whatever involves a number of HTTP requests to the ACME API server, and each nonce can only be used for one.
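To make the one-nonce-per-request rule concrete, here is a minimal sketch of how a client might track nonces. It is not Greenlock’s actual code: the `NonceTracker` class and its method names are hypothetical, and it assumes the HTTP layer hands over response headers with lowercased names. It relies only on the ACME rule that responses carry a fresh nonce in the `Replay-Nonce` header.

```typescript
// Response headers as a plain object with lowercased names (an assumption of this sketch).
type Headers = Record<string, string>;

// Hypothetical helper: remembers the most recent nonce and hands it out only once.
class NonceTracker {
  private nonce: string | null = null;

  // Call this with the headers of every ACME response, including error responses.
  remember(headers: Headers): void {
    const fresh = headers["replay-nonce"];
    if (fresh) {
      this.nonce = fresh;
    }
  }

  // Returns the stored nonce and clears it, so the same nonce is never sent twice.
  // A null result means the caller has to fetch one from newNonce first.
  take(): string | null {
    const current = this.nonce;
    this.nonce = null;
    return current;
  }
}
```

With something like this, the nonce for the newOrder call comes from `take()`, and any later request has to use a nonce remembered from a later response rather than the one from Step 1.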
Edit:
Also, how many badNonce errors is it getting? It’s normal for them to happen occasionally, and ACME clients should retry using the new nonce.
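A rough sketch of that retry behaviour follows, assuming the client can re-sign and resend the same request with a different nonce. The `SendWithNonce` callback, the `AcmeResponse` shape, and the `maxAttempts` limit are all hypothetical and exist only for illustration. The sketch relies on the ACME rule that a badNonce error response itself carries a fresh `Replay-Nonce` header.

```typescript
// One HTTP response as this sketch needs it (header names lowercased by assumption).
type AcmeResponse = { status: number; headers: Record<string, string>; body: string };

// Hypothetical callback: signs the request with the given nonce and sends it.
type SendWithNonce = (nonce: string) => Promise<AcmeResponse>;

const BAD_NONCE = "urn:ietf:params:acme:error:badNonce";

// True only for a 400 response whose JSON body has the badNonce error type.
function isBadNonce(res: AcmeResponse): boolean {
  if (res.status !== 400) return false;
  try {
    return JSON.parse(res.body).type === BAD_NONCE;
  } catch {
    return false; // body was not JSON, so it is not an ACME problem document
  }
}

// Sends the request, and on badNonce retries with the nonce from the error response.
async function postWithNonceRetry(
  send: SendWithNonce,
  firstNonce: string,
  maxAttempts = 3, // arbitrary cap so a persistent failure cannot loop forever
): Promise<AcmeResponse> {
  let res = await send(firstNonce);
  for (let attempt = 1; attempt < maxAttempts && isBadNonce(res); attempt++) {
    const fresh = res.headers["replay-nonce"];
    if (!fresh) break; // nothing to retry with; the caller should call newNonce again
    res = await send(fresh);
  }
  return res;
}
```

If the error still comes back after the retries are used up, the last response is returned unchanged, so the caller can log how often badNonce actually occurs.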