Acme.sh standalone fails multiple validation requests (staging multi-va)

This issue happens only on the staging server; the production server is fine.

Hi @cpu ,

I believe this is caused by the changes described in: Validating challenges from multiple network vantage points

When testing with acme.sh, I got the following log:

[Wed Aug 30 21:42:36 CST 2017] The new-authz request is ok.
[Wed Aug 30 21:42:36 CST 2017] Verifying:xxxxxx.com
[Wed Aug 30 21:42:36 CST 2017] Standalone mode server
GET /.well-known/acme-challenge/32dXBlBwSAAhioWWdXMcLCfiotV2I6-DCyNME0fjuwk HTTP/1.1
Host: xxxx.com
User-Agent: Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)
Accept: */*
Accept-Encoding: gzip
Connection: close

[Wed Aug 30 21:42:41 CST 2017] xxxxx.com:Verify error:Fetching http://xxxx.com/.well-known/acme-challenge/32dXBlBwSAAhioWWdXMcLCfiotV2I6-DCyNME0fjuwk: Connection reset by peer

From the log, we can see that the Let's Encrypt CA server accessed the standalone server, and that we returned the correct challenge response to it.

But the CA still fails with the error: “Connection reset by peer”.

Yes, I know the standalone server may close the connection forcefully (perhaps with a TCP RST instead of a graceful TCP FIN), but the challenge response has already been successfully sent back.

So the verification should succeed, just as it does in production.

Thanks.

Hi @Neilpang,

Thanks for the report. I’ll look into this.

@cpu
thank you in advance.

Hi @Neilpang,

I have checked some of our logs, and it appears that only one of the validation servers is able to fetch the challenge response when trying to issue with acme.sh in standalone mode against staging. From a quick look at acme.sh, it appears that netcat closes after sending a response to a single validation server.

With the “multi-va” approach we have added in staging, a client will receive multiple validation requests and be expected to respond to each.
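A minimal sketch of why this matters: under multi-va, the challenge responder must stay up and answer every validation request, not just the first one. The example below uses Python's stdlib HTTP server to serve the same key authorization to several sequential fetches, simulating multiple VAs. The token and key-authorization values are placeholders, not real ACME data, and this is an illustration of the requirement, not acme.sh's actual implementation.

```python
import http.server
import threading
import urllib.request

# Placeholder values for illustration only -- a real client derives these
# from the ACME challenge and the account key thumbprint.
TOKEN = "example-token"
KEY_AUTHZ = "example-token.example-thumbprint"

class ChallengeHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the key authorization for the expected challenge path.
        if self.path == "/.well-known/acme-challenge/" + TOKEN:
            body = KEY_AUTHZ.encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the example quiet

# The server keeps running and answers every request, unlike a one-shot
# netcat listener that exits after the first connection.
server = http.server.HTTPServer(("127.0.0.1", 0), ChallengeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Simulate three validation requests, as a multi-VA setup would make.
url = "http://127.0.0.1:%d/.well-known/acme-challenge/%s" % (port, TOKEN)
responses = [urllib.request.urlopen(url).read().decode() for _ in range(3)]
print(responses)
server.shutdown()
```

A single-shot `nc -l` listener would only satisfy the first of these three fetches; the fix in a client is to loop (or use a persistent listener) until the challenge is confirmed valid.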

Hopefully that is helpful.

Ah, I had worried this might surprise somebody doing something tricksy like this. At least Neilpang must be commended for testing his code against staging rather than waiting until some poor soul finds it with their production systems.

Is it practical to have a distinct error or error affix (e.g. “… for multi-va”) for the scenario where one check succeeds but the rest fail? The use of a quorum is a good idea, and obviously it’s useless unless it’s mandatory, but a particular error text would direct users who run into this (some of whose clients might have less conscientious authors than Neilpang) to the right answers on this forum, skipping diagnostics that aren’t relevant to their problem and so increasing the chance of a good outcome.

Thanks for the suggestion. We could potentially suffix the problem detail with something that indicates the error was observed by a remote VA rather than the traditional VA - I'll think about that a little. The way we've implemented this feature at present means the individual per-VA errors are not all retained; we return whichever error broke the quorum. That might make some of these ideas less feasible.
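To make the "whichever error broke the quorum" behaviour concrete, here is a hypothetical sketch of that kind of quorum check (not Boulder's actual code): each remote result is `None` on success or an error string on failure, and only the failure that pushes the count past the allowed maximum is surfaced.

```python
def check_quorum(results, max_failures):
    """Return None if the quorum passes, else the error that broke it.

    Hypothetical illustration: earlier failures are discarded, so the
    caller only ever sees the single quorum-breaking error.
    """
    failures = 0
    for err in results:
        if err is not None:
            failures += 1
            if failures > max_failures:
                return err  # this one broke the quorum
    return None

# With zero tolerated failures, one reset connection fails validation:
print(check_quorum([None, "connection reset by peer", None], 0))
```

This shape explains why a suffix on the surfaced error is feasible, while reporting all individual errors would require retaining more state per validation.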

From the little testing I have done, it appears that it is not uncommon for one of the remote-va nodes to be the first to fetch the challenge. I saw this in at least one case.

@andygabby @cpu @tialaramex

Thank you all, guys.

