Sustained badNonce Errors


#1

My domain is: nonce2.prod.altrawcode.space

I ran this command: My client called the v2 newOrder API (https://acme-v02.api.letsencrypt.org/acme/new-order)

It produced this output: {"type":"urn:ietf:params:acme:error:badNonce","detail":"JWS has an invalid anti-replay nonce: ,"status":400}

My web server is (include version): SailsJS v0.12.14

The operating system my web server runs on is (include version): AWS Linux

My hosting provider, if applicable, is: AWS

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): No

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): Greenlock v2.6.7 (nodejs)

I have been seeing badNonce errors on my service since we updated to the ACME v2 API. I assumed the cause was the HTTP client not reusing the HTTPS connection (see “JWS has an invalid anti-replay nonce when client behind NAT”). I thought I had fixed it by configuring the HTTP client to reuse connections, but that did not fix anything.

It would be helpful if I could get logs related to this domain. They could point towards a solution.

PS - We use http-01 validation, in case that helps.


#2

x-post https://git.coolaj86.com/coolaj86/acme-v2.js/issues/17 by OP

Node.js maintains several connections per server to make HTTP requests. This function allows one to transparently issue requests.

Does your server actually have multiple IP addresses (including IPv6)? Multiple HTTP connections should otherwise never have this effect - you should always hit the same nonce service due to IP stickiness.

Even if you do have multiple IPv4 addresses, your default route should only use one, right?


#3

Hi @elssar

now you have a new certificate:

CN=nonce2.prod.altrawcode.space, valid from 26.02.2019 to 27.05.2019 (expires in 90 days); 1 entry: nonce2.prod.altrawcode.space

That looks good.


#4

Hi @JuergenAuer,

That’s because I rolled back my service to a previous version which uses the v1 api :slight_smile:
Still need to figure out what happens when I start using the v2 api.


#5

You have two IPv4 addresses ( https://check-your-website.server-daten.de/?q=nonce2.prod.altrawcode.space ):

Host                          T  IP-Address          is auth.  ∑ Queries  ∑ Timeout
nonce2.prod.altrawcode.space  C  phs.getpostman.com  yes       1          0
                              A  18.210.130.175      yes
                              A  54.226.165.87       yes

If the 18.210.* address answers sometimes and the 54.226.* address at other times, that’s the problem.


#6

phs.getpostman.com is just a proxy. It proxies the challenge call to the service that is actually requesting the certificate. On that service, all calls for a single certificate are made by a single instance.

And the failure always happens at the same step -

  1. Call the newNonce API
  2. Call the newOrder API with the nonce from Step 1. <-- This step fails, but not always
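
A minimal sketch of these two steps, assuming a hypothetical `request(method, url, nonce)` transport that resolves to `{ status, headers }` with lower-case header names (as Node's `http` module reports them). It is not Greenlock's or acme-v2.js's actual code, and JWS signing is left out; `directory` stands for the ACME directory object, whose `newNonce` and `newOrder` keys come from RFC 8555.

```javascript
// Hypothetical transport: request(method, url, nonce) -> Promise<{ status, headers }>
async function newOrderFlow(request, directory) {
  // Step 1: HEAD newNonce; the nonce comes back in the Replay-Nonce header.
  const nonceRes = await request('HEAD', directory.newNonce, null);
  const nonce = nonceRes.headers['replay-nonce'];
  if (!nonce) throw new Error('newNonce returned no Replay-Nonce header');

  // Step 2: POST newOrder, spending that nonce exactly once.
  const orderRes = await request('POST', directory.newOrder, nonce);

  // Every ACME response carries a fresh nonce that the next request can use.
  return { orderRes, nextNonce: orderRes.headers['replay-nonce'] || null };
}
```

Injecting the transport makes it easy to log which nonce, socket and source address each step used, which is the information needed to tell the competing explanations in this thread apart.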

#7

Could you describe what your outgoing network looks like?

This failure is seen when the source IP in (1) and (2) differs.

Usually this is only triggered by dual-stack setups or weird NATs.


#8

The nonces are pooled.

So if your 18.* gets a nonce and the next request comes from 54.* with this nonce, the nonce is invisible because it’s in the wrong pool, and the result is the error.
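
The failure mode described here can be illustrated with a toy model. Everything in it is invented for illustration: the backend names, the nonce format and the routing rule (last octet of the source IP) are assumptions, not how the Let's Encrypt nonce service is actually implemented. The addresses come from the 192.0.2.0/24 documentation range.

```javascript
// Toy model only: backend names, routing rule and nonce format are all invented.
class NonceBackend {
  constructor(name) {
    this.name = name;
    this.issued = new Set();
    this.counter = 0;
  }

  newNonce() {
    const nonce = `${this.name}-${++this.counter}`;
    this.issued.add(nonce);
    return nonce;
  }

  // A nonce is accepted once, and only by the backend that issued it.
  redeem(nonce) {
    return this.issued.delete(nonce);
  }
}

const backends = [new NonceBackend('a'), new NonceBackend('b')];

// Assumed routing rule: the last octet of the source IP decides the backend.
function backendFor(sourceIp) {
  const lastOctet = Number(sourceIp.split('.').pop());
  return backends[lastOctet % backends.length];
}

// A nonce fetched from one source address...
const nonce = backendFor('192.0.2.10').newNonce();
// ...is unknown to the backend that a different source address is routed to.
const acceptedElsewhere = backendFor('192.0.2.11').redeem(nonce);
// The issuing backend accepts it, but only once.
const acceptedAtHome = backendFor('192.0.2.10').redeem(nonce);
const acceptedTwice = backendFor('192.0.2.10').redeem(nonce);
```

In this model a client whose requests all leave from one source address never hits the wrong pool, so the model only explains the error if the source address really does change between the two calls.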


#9

Each instance has a single network interface with a single public IPv4 address (no IPv6), and all outgoing calls come from the instance’s IP, with no NAT.

My suspicion is that either the TLS connection is not being reused by the client, or it is creating multiple connections and using them independently. The error logs from Let’s Encrypt would help in narrowing down what is actually happening.
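
One way to test the connection-reuse theory, sketched under assumptions: route every ACME request through a single Node `https.Agent` with `keepAlive` enabled and `maxSockets` set to 1, so that newNonce and newOrder cannot be spread across parallel connections. The `acmeRequestOptions` helper is hypothetical, and whether Greenlock or acme-v2.js lets a caller supply a custom agent is not confirmed here.

```javascript
const https = require('https');

// One keep-alive agent shared by every ACME request, capped at a single socket
// per host, so requests queue instead of opening parallel connections.
const acmeAgent = new https.Agent({ keepAlive: true, maxSockets: 1 });

// Hypothetical helper: builds https.request options that always use the shared agent.
function acmeRequestOptions(url, method) {
  const { hostname, pathname } = new URL(url);
  return { hostname, path: pathname, method, agent: acmeAgent };
}
```

If the badNonce rate stays the same with this agent in place, connection reuse is probably not the cause.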


#10

@JuergenAuer that cannot happen as nonces are not shared between different servers or processes. Whenever we need to make a new request to Let’s Encrypt, we first get a new nonce, and then use that nonce for the rest of the lifecycle of the request.


#11

TLS connection reuse shouldn’t matter – or at least, it shouldn’t matter much. I’m not certain if it has some effect.

The software is using a different nonce for each HTTP request, right? Getting a certificate or whatever involves a number of HTTP requests to the ACME API server, and each nonce can only be used for one.

Edit:

Also, how many badNonce errors is it getting? It’s normal for them to happen occasionally, and ACME clients should retry using the new nonce.
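
A rough sketch of that retry behaviour, assuming a hypothetical `send(nonce)` callback that signs the request with the given nonce, posts it, and resolves to `{ status, headers, body }` with lower-case header names and a parsed JSON body. RFC 8555 requires a badNonce error response to carry a fresh nonce in its Replay-Nonce header; the function name and the attempt limit of 3 are illustrative choices.

```javascript
const BAD_NONCE = 'urn:ietf:params:acme:error:badNonce';

// Returns the fresh nonce from a badNonce error response, or null otherwise.
function nonceForRetry(res) {
  const isBadNonce = res.status === 400 && Boolean(res.body) && res.body.type === BAD_NONCE;
  return isBadNonce ? (res.headers['replay-nonce'] || null) : null;
}

// Hypothetical transport: send(nonce) must re-sign the JWS with the nonce it is given.
async function postWithNonceRetry(send, firstNonce, maxAttempts = 3) {
  let nonce = firstNonce;
  let res;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    res = await send(nonce);
    const fresh = nonceForRetry(res);
    if (!fresh) return res; // success, or an error that a new nonce will not fix
    nonce = fresh;          // retry with the nonce the error response supplied
  }
  return res; // still badNonce after maxAttempts; let the caller decide
}
```

With a retry like this in place, occasional badNonce responses are absorbed silently, so a failure rate that is still high afterwards points at something systematic.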


#12

@mnordhoff yes, it uses the nonce from the previous request in each subsequent request.

The number of badNonce errors is significant, a pretty high fraction of the certificate requests we make.


closed #13

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.