Sustained badNonce Errors

elssar · February 26, 2019, 6:51am

My domain is: nonce2.prod.altrawcode.space

I ran this command: my client called the v2 newOrder API (https://acme-v02.api.letsencrypt.org/acme/new-order)

It produced this output: {"type":"urn:ietf:params:acme:error:badNonce","detail":"JWS has an invalid anti-replay nonce: ","status":400}

My web server is (include version): SailsJS v0.12.14

The operating system my web server runs on is (include version): AWS Linux

My hosting provider, if applicable, is: AWS

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): Greenlock v2.6.7 (nodejs)

I have been seeing badNonce errors on my service since we updated to the ACME v2 API. I figured it was due to the HTTP client not reusing the HTTPS connection (see the topic "JWS has an invalid anti-replay nonce when client behind NAT"). I thought I had fixed it by configuring the HTTP client to reuse connections, but that did not fix anything.
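For context, in Node.js "configuring the HTTP client to reuse connections" usually means routing every ACME request through one keep-alive `https.Agent`. This is a minimal sketch that assumes the plain `https` module (the thread does not say which HTTP library Greenlock v2.6.7 uses internally); `acmeAgent` and `acmeRequest` are hypothetical names. As later replies point out, connection reuse should not matter much for nonces, and this alone did not fix the errors here.

```javascript
const https = require("https");

// One shared agent for every ACME call (hypothetical name).
// keepAlive: true keeps the TLS connection open between requests so it
// can be reused instead of opening a new one each time.
// maxSockets: 1 allows at most one socket per host, so concurrent
// requests to the ACME server queue up on that single connection.
const acmeAgent = new https.Agent({ keepAlive: true, maxSockets: 1 });

// Hypothetical wrapper: sends a request through the shared agent and
// resolves to { status, headers, body }.
function acmeRequest(url, options = {}, body) {
  return new Promise((resolve, reject) => {
    const req = https.request(url, { ...options, agent: acmeAgent }, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () =>
        resolve({
          status: res.statusCode,
          headers: res.headers,
          body: Buffer.concat(chunks).toString("utf8"),
        })
      );
    });
    req.on("error", reject);
    if (body !== undefined) req.write(body);
    req.end();
  });
}
```

For example, `acmeRequest("https://acme-v02.api.letsencrypt.org/directory", { method: "GET" })` fetches the ACME directory over the shared connection.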

It would be helpful if I could get logs related to this domain; they could point towards a solution.

PS - We use http-01 validation, in case that helps.

_az · February 26, 2019, 7:32am

Cross-posted by the OP as #17, "Elevated `badNonce` and `malformed` errors when trying to issue certificates" (acme.js-ARCHIVED, CoolAJ86 on GIT).

From the Node.js `http.request()` documentation: "Node.js maintains several connections per server to make HTTP requests. This function allows one to transparently issue requests."

Does your server actually have multiple IP addresses (including IPv6)? Otherwise, multiple HTTP connections should never have this effect: you should always hit the same nonce service thanks to IP stickiness.

Even if you do have multiple IPv4 addresses, your default route should only use one, right?
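One quick way to answer that question from the Node.js side is to list the host's non-loopback addresses by family. This is a sketch using `os.networkInterfaces()`; `externalAddresses` is a hypothetical helper. It only shows addresses configured on the host itself, so on a cloud instance whose public IP is mapped by the provider, the public address may not appear in the list.

```javascript
const os = require("os");

// Hypothetical helper: returns non-loopback, non-link-local addresses
// grouped by family. `ifaces` has the shape returned by
// os.networkInterfaces(); it is a parameter so fake data can be passed in.
function externalAddresses(ifaces = os.networkInterfaces()) {
  const result = { IPv4: [], IPv6: [] };
  for (const addrs of Object.values(ifaces)) {
    for (const a of addrs || []) {
      if (a.internal) continue; // skip loopback (127.0.0.1, ::1)
      // Some Node 18 releases report family as a number (4 or 6).
      const fam = a.family === 4 || a.family === "IPv4" ? "IPv4" : "IPv6";
      // fe80:: link-local addresses cannot reach the internet.
      if (fam === "IPv6" && a.address.toLowerCase().startsWith("fe80")) continue;
      result[fam].push(a.address);
    }
  }
  return result;
}
```

Usage: `const { IPv4, IPv6 } = externalAddresses();`. More than one entry in total, particularly a global IPv6 address alongside IPv4, is the dual-stack situation discussed in this thread.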

JuergenAuer · February 26, 2019, 8:35am

Hi @elssar

now you have a new certificate:

CN=nonce2.prod.altrawcode.space
Valid from 26.02.2019 to 27.05.2019 (expires in 90 days)
nonce2.prod.altrawcode.space: 1 entry

That looks good.

elssar · February 26, 2019, 9:03am

Hi @JuergenAuer,

That’s because I rolled back my service to a previous version, which uses the v1 API.
I still need to figure out what happens when I start using the v2 API.

JuergenAuer · February 26, 2019, 9:21am

You have two IPv4 addresses ( https://check-your-website.server-daten.de/?q=nonce2.prod.altrawcode.space ):

Host	Type	Value	Authoritative	Queries	Timeouts
nonce2.prod.altrawcode.space	CNAME	phs.getpostman.com	yes	1	0
	A	18.210.130.175	yes
	A	54.226.165.87	yes

If your requests sometimes come from 18.210.x.x and sometimes from 54.226.x.x, that's the problem.

elssar · February 26, 2019, 9:59am

phs.getpostman.com is just a proxy. It proxies the challenge call to the service that is actually requesting the certificate. On that service, all calls for a single certificate are made by a single instance.

And the failure always happens at the same step:

1. Call the newNonce API.
2. Call the newOrder API with the nonce from Step 1. <-- This step fails, but not always.
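The two steps can be sketched in Node.js roughly as follows. This is a minimal sketch, not Greenlock's code: it assumes the plain `https` module, takes the newNonce URL from the ACME directory (https://acme-v02.api.letsencrypt.org/directory) as a parameter, and the helper names are hypothetical. Per RFC 8555, the returned nonce then goes into the `nonce` field of the JWS protected header of the signed newOrder POST; signing is left out here.

```javascript
const https = require("https");

// Node lower-cases incoming header names, so the Replay-Nonce header
// arrives as "replay-nonce".
function nonceFromHeaders(headers) {
  const nonce = headers["replay-nonce"];
  if (typeof nonce !== "string" || nonce === "") {
    throw new Error("response carried no Replay-Nonce header");
  }
  return nonce;
}

// Step 1 (hypothetical helper): HEAD the newNonce URL and resolve to the
// nonce. Passing the same `agent` used for step 2 keeps both requests on
// one connection when possible; undefined falls back to the global agent.
function fetchNonce(newNonceUrl, agent) {
  return new Promise((resolve, reject) => {
    const req = https.request(newNonceUrl, { method: "HEAD", agent }, (res) => {
      res.resume(); // HEAD responses have no body; drain the stream anyway
      try {
        resolve(nonceFromHeaders(res.headers));
      } catch (err) {
        reject(err);
      }
    });
    req.on("error", reject);
    req.end();
  });
}
```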

_az · February 26, 2019, 10:00am

Could you describe what your outgoing network looks like?

This failure is seen when the source IP in (1) and (2) differs.

Usually this is only triggered by dual-stack setups or weird NATs.

JuergenAuer · February 26, 2019, 10:19am

The nonces are pooled.

So if your 18.* address gets a nonce and the next request using that nonce comes from 54.*, the nonce is unknown there because it is in the wrong pool, and the request fails with an error.
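A toy model of the failure described above. It is a deliberate simplification, not Let's Encrypt's actual nonce implementation: it assumes each backend pool accepts only nonces it issued itself, each exactly once, and that the front end picks the pool by hashing the client's source IP. `NoncePool` and `Frontend` are hypothetical names.

```javascript
const crypto = require("crypto");

// Hypothetical nonce pool: accepts only nonces it issued, each once.
class NoncePool {
  constructor(name) {
    this.name = name;
    this.issued = new Set();
  }
  issue() {
    const nonce = `${this.name}-${crypto.randomBytes(8).toString("hex")}`;
    this.issued.add(nonce);
    return nonce;
  }
  redeem(nonce) {
    return this.issued.delete(nonce); // true only the first time
  }
}

// Hypothetical front end: routes each request to a pool chosen from a
// hash of the source IP (IP stickiness).
class Frontend {
  constructor(pools) {
    this.pools = pools;
  }
  poolFor(sourceIp) {
    const digest = crypto.createHash("sha256").update(sourceIp).digest();
    return this.pools[digest[0] % this.pools.length];
  }
  newNonce(sourceIp) {
    return this.poolFor(sourceIp).issue();
  }
  post(sourceIp, nonce) {
    return this.poolFor(sourceIp).redeem(nonce) ? "ok" : "badNonce";
  }
}
```

In this model a client with one source IP always redeems its nonce at the pool that issued it, while a newOrder POST arriving from an address that hashes to the other pool gets `badNonce`, which is the behaviour described in the two posts above.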

elssar · February 26, 2019, 11:12am

Each instance has a single network interface with a single public IPv4 address (no IPv6), and all outgoing calls come from the instance's IP, with no NAT.

My suspicion is that either the TLS connection is not being reused by the client, or it is creating multiple connections and using them independently. The error logs from Let's Encrypt would help in narrowing down what is actually happening.
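To test that suspicion directly, one option is to log, for both the newNonce and the newOrder call, whether Node reused the socket and which local address and port it used. This is a sketch; `describeConnection` is a hypothetical helper, and `request.reusedSocket` needs Node 12.16 / 13.0 or later. A changing local port means a new connection; a changing local address means a different source IP, which is the case that breaks nonces. The address the ACME server sees can still differ if something in the path translates addresses.

```javascript
// Hypothetical diagnostic: summarises which connection a request used.
// `req` is the http.ClientRequest, `socket` its underlying socket
// (for example res.socket inside the response callback).
function describeConnection(req, socket) {
  return {
    reused: Boolean(req.reusedSocket), // undefined on older Node => false
    localAddress: socket.localAddress, // source IP used for this request
    localPort: socket.localPort, // changes when a new connection is opened
    remoteAddress: socket.remoteAddress,
  };
}
```

Usage, inside an `https.request` response callback: `console.log(describeConnection(req, res.socket));`.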

elssar · February 26, 2019, 11:14am

@JuergenAuer that cannot happen, as nonces are not shared between different servers or processes. Whenever we need to make a new request to Let's Encrypt, we first get a new nonce, and then use that nonce for the rest of the lifecycle of the request.

mnordhoff · February 26, 2019, 11:53am

TLS connection reuse shouldn’t matter – or at least, it shouldn’t matter much. I’m not certain if it has some effect.

The software is using a different nonce for each HTTP request, right? Getting a certificate or whatever involves a number of HTTP requests to the ACME API server, and each nonce can only be used for one.
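The one-nonce-per-request rule can be sketched as a small store that remembers the Replay-Nonce header from every ACME response and hands each nonce out exactly once. This is a minimal sketch; `NonceStore` is a hypothetical name, not Greenlock's API. When `take()` returns null, the client has to fetch a fresh nonce from newNonce first.

```javascript
// Hypothetical helper: each nonce is valid for one request only, so
// take() hands a stored nonce out once and then forgets it.
class NonceStore {
  constructor() {
    this.nonces = [];
  }
  // Record the Replay-Nonce from any ACME response, success or error.
  // Node lower-cases header names, hence "replay-nonce".
  remember(headers) {
    const nonce = headers["replay-nonce"];
    if (nonce) this.nonces.push(nonce);
  }
  // Returns the most recently received unused nonce, or null if the
  // caller must HEAD the newNonce URL first.
  take() {
    return this.nonces.length > 0 ? this.nonces.pop() : null;
  }
}
```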

Edit:

Also, how many badNonce errors is it getting? It’s normal for them to happen occasionally, and ACME clients should retry using the new nonce.
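RFC 8555 (section 6.5) requires a badNonce error response to carry a fresh nonce in its Replay-Nonce header and says clients SHOULD retry the request with it. A sketch of that retry; `send` is a hypothetical caller-supplied function that signs and POSTs the request with the given nonce, and the limit of 3 retries is an arbitrary choice.

```javascript
const BAD_NONCE = "urn:ietf:params:acme:error:badNonce";

// True if an ACME response is a badNonce problem document.
function isBadNonce(res) {
  if (res.status !== 400) return false;
  try {
    return JSON.parse(res.body).type === BAD_NONCE;
  } catch {
    return false; // body was not JSON
  }
}

// Sends one ACME POST, retrying on badNonce with the fresh nonce from the
// error response. `send(nonce)` must re-sign the JWS with that nonce and
// resolve to { status, headers, body }.
async function postWithNonceRetry(send, nonce, maxRetries = 3) {
  let res = await send(nonce);
  for (let i = 0; i < maxRetries && isBadNonce(res); i++) {
    const fresh = res.headers["replay-nonce"];
    if (!fresh) break; // nothing to retry with
    res = await send(fresh);
  }
  return res;
}
```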

elssar · February 26, 2019, 12:33pm

@mnordhoff yes, it uses the nonce from the previous request in each subsequent request.

The number of badNonce errors is significant: a pretty high fraction of the certificate requests we make.

system · March 28, 2019, 12:33pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
