HTTP-01 doesn't get a response from all endpoints in a round-robin DNS record

rnz · August 18, 2021, 9:54am

When I try to renew the certificate, the challenge response is checked on only one node from the list in the round-robin DNS record (balanced). If the node under test does not have the response file in .well-known/acme-challenge/, it returns 404 Not Found, and as a result obtaining the certificate fails.

I have a workaround: I stop keepalived on all proxies so that all IPs migrate to a single host, and then renew the certificate again. But this is an ugly way, and it requires manual action.

Expected: all IPs in the round-robin DNS record are checked for the acme-challenge response before an error is returned for a missing response.

My domain is: robofinist.ru

I ran this command: /usr/local/bin/dehydrated -c

It produced this output:

root@p01-proxy-corp:~# dehydrated -c
# INFO: Using main config file /etc/dehydrated/config
Processing robofinist.ru with alternative names: demo.robofinist.ru alpha.robofinist.ru
 + Checking domain name(s) of existing cert... unchanged.
 + Checking expire date of existing cert...
 + Valid till Aug 18 05:11:19 2021 GMT (Less than 30 days). Renewing!
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
 + Received 3 authorizations URLs from the CA
 + Handling authorization for alpha.robofinist.ru
 + Handling authorization for demo.robofinist.ru
 + Handling authorization for robofinist.ru
 + 3 pending challenge(s)
 + Deploying challenge tokens...
 + Responding to challenge for alpha.robofinist.ru authorization...
 + Cleaning challenge tokens...
 + Challenge validation has failed :(
ERROR: Challenge is invalid! (returned: invalid) (result: ["type"]	"http-01"
["status"]	"invalid"
["error","type"]	"urn:ietf:params:acme:error:unauthorized"
["error","detail"]	"Invalid response from http://alpha.robofinist.ru/.well-known/acme-challenge/JOwv7Cw7rYCQdxuDirvtcK7F1xYG7rYfNjuC9hP2KIk [185.129.96.83]: \"\u003chtml\u003e\\r\\n\u003chead\u003e\u003ctitle\u003e404 Not Found\u003c/title\u003e\u003c/head\u003e\\r\\n\u003cbody bgcolor=\\\"white\\\"\u003e\\r\\n\u003ccenter\u003e\u003ch1\u003e404 Not Found\u003c/h1\u003e\u003c/center\u003e\\r\\n\u003chr\u003e\u003ccenter\u003e\""
["error","status"]	403
["error"]	{"type":"urn:ietf:params:acme:error:unauthorized","detail":"Invalid response from http://alpha.robofinist.ru/.well-known/acme-challenge/JOwv7Cw7rYCQdxuDirvtcK7F1xYG7rYfNjuC9hP2KIk [185.129.96.83]: \"\u003chtml\u003e\\r\\n\u003chead\u003e\u003ctitle\u003e404 Not Found\u003c/title\u003e\u003c/head\u003e\\r\\n\u003cbody bgcolor=\\\"white\\\"\u003e\\r\\n\u003ccenter\u003e\u003ch1\u003e404 Not Found\u003c/h1\u003e\u003c/center\u003e\\r\\n\u003chr\u003e\u003ccenter\u003e\"","status":403}
["url"]	"https://acme-v02.api.letsencrypt.org/acme/chall-v3/22906790310/iFCR1w"
["token"]	"JOwv7Cw7rYCQdxuDirvtcK7F1xYG7rYfNjuC9hP2KIk"
["validationRecord",0,"url"]	"http://alpha.robofinist.ru/.well-known/acme-challenge/JOwv7Cw7rYCQdxuDirvtcK7F1xYG7rYfNjuC9hP2KIk"
["validationRecord",0,"hostname"]	"alpha.robofinist.ru"
["validationRecord",0,"port"]	"80"
["validationRecord",0,"addressesResolved",0]	"185.129.96.83"
["validationRecord",0,"addressesResolved",1]	"185.129.96.84"
["validationRecord",0,"addressesResolved",2]	"185.129.96.82"
["validationRecord",0,"addressesResolved"]	["185.129.96.83","185.129.96.84","185.129.96.82"]
["validationRecord",0,"addressUsed"]	"185.129.96.83"
["validationRecord",0]	{"url":"http://alpha.robofinist.ru/.well-known/acme-challenge/JOwv7Cw7rYCQdxuDirvtcK7F1xYG7rYfNjuC9hP2KIk","hostname":"alpha.robofinist.ru","port":"80","addressesResolved":["185.129.96.83","185.129.96.84","185.129.96.82"],"addressUsed":"185.129.96.83"}
["validationRecord"]	[{"url":"http://alpha.robofinist.ru/.well-known/acme-challenge/JOwv7Cw7rYCQdxuDirvtcK7F1xYG7rYfNjuC9hP2KIk","hostname":"alpha.robofinist.ru","port":"80","addressesResolved":["185.129.96.83","185.129.96.84","185.129.96.82"],"addressUsed":"185.129.96.83"}]
["validated"]	"2021-08-18T09:14:07Z")

My web server is (include version): nginx/1.14.1

The operating system my web server runs on is (include version): Debian GNU/Linux 9.13 (stretch)

I can login to a root shell on my machine (yes or no, or I don't know): yes

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot):

# dehydrated -v
# INFO: Using main config file /etc/dehydrated/config
Dehydrated by Lukas Schauer
https://dehydrated.io

Dehydrated version: 0.7.1
GIT-Revision: unknown

OS: Debian GNU/Linux 9 (stretch)
Used software:
 bash: 4.4.12(1)-release
 curl: 7.52.1
 awk: mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
 sed: sed (GNU sed) 4.4
 mktemp: mktemp (GNU coreutils) 8.26
 grep: grep (GNU grep) 2.27
 diff: diff (GNU diffutils) 3.5
 openssl: OpenSSL 1.1.0l  10 Sep 2019

rg305 · August 18, 2021, 10:38am

I see.

Maybe you could insert an included file that contains the acme challenge location in each system.
You could have multiple includes and rotate them based on time of day or day of week, etc.
Each file could redirect to a predefined FQDN that resolves to only one specific IP from that list.
And one file could handle the requests locally.

I hope I've explained myself clearly.
The idea is to have all the systems "point" to one specific IP at a given point in time.
And then rotate which IP gets all the "attention" as required - even triggered based on cert issuance.

Hopefully that will be less "ugly" LOL

rnz · August 18, 2021, 11:03am

Maybe you could insert an included file that contains the acme challenge location in each system.

This is also a bad option, because it requires giving one node access to all the others. Currently the logic is reversed: one node updates the certificates, and the rest of the nodes take the certs from it.

You could have multiple includes and rotate them based on time of day or day of week, etc.
Each file could redirect to a predefined FQDN that resolves to only one specific IP from that list.

Looks like over-complication for the sake of complication. Besides, I am talking about http-01, not dns-01. Otherwise, I would modify the DNS records on the nameserver.

rg305 · August 18, 2021, 4:18pm

Then you only need to add redirects on the other nodes and they will all forward their requests to the one single node.

Perhaps; but I guess I didn't fully understand your setup and actual goal.

So am I.
The redirection happens in HTTP (not in DNS).
And the authentication remains in HTTP-01.

rnz · August 18, 2021, 5:30pm

I understand you, but your proposal requires separate settings for each proxy host, and maintaining that separation with external configuration tools (Ansible or similar), for example when adding more proxy nodes. Whereas currently all proxy host settings are identical.

In my opinion it is a mistake to shift the server's task to the client. If the program on the server side receives a round-robin DNS record, it is wrong for it to fail without checking all endpoints from that record for the response.

rg305 · August 18, 2021, 5:43pm

Do you?
I'm proposing that you simply use the exact same redirection include location in all of the systems.
Without having to add any additional systems (like more proxy nodes).

I'm not sure that is even debatable.
There are rules set forth by the CA/B forum and LE follows them to the T.

jvanasco · August 18, 2021, 5:46pm

If I understand your setup correctly, I have many systems that run this scenario.

I handle this by having all replica nodes proxy the /.well-known/acme-challenge directory onto the main node. If they are in different data-centers/networks, you can also do an HTTP redirect to a hostname assigned to the main node - however a proxy will still work (it will just be slower). The ACME protocol supports following redirects, and LetsEncrypt will do so for validation.
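
The redirect variant described above can be sketched roughly as follows. This is only an illustration in Python of the logic a replica node would apply, not anyone's actual setup (in practice this would normally be a few lines of web server configuration). The hostname `acme-main.example.com`, the port, and the names `redirect_target`, `ReplicaHandler` and `serve` are assumptions made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"
# Hypothetical hostname that resolves ONLY to the node running the ACME client.
MAIN_NODE = "acme-main.example.com"


def redirect_target(path: str, main_node: str = MAIN_NODE) -> Optional[str]:
    """Return the URL on the main node for a challenge request, else None."""
    if path.startswith(CHALLENGE_PREFIX) and len(path) > len(CHALLENGE_PREFIX):
        return f"http://{main_node}{path}"
    return None


class ReplicaHandler(BaseHTTPRequestHandler):
    """Runs identically on every replica: challenge requests go to the main node."""

    def do_GET(self) -> None:
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # The validator follows this redirect and fetches the token from the main node.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the sketch server (the port is an arbitrary choice for local testing)."""
    HTTPServer(("", port), ReplicaHandler).serve_forever()
```

Because every replica carries the same rule, the node configurations stay identical; only the DNS entry for the hypothetical main-node hostname decides which machine answers the challenge.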

jvanasco · August 18, 2021, 5:54pm

I believe ACME/LetsEncrypt will only test one IP if multiple are available. Testing is now done from multiple, unknown, vantages - so I do not believe you can expect the same IP to be used by each validation service.

If ALL IPs were to be tested in a round-robin scenario, the expected result for many (most?) users would be that every IP must pass validation, and the first failing IP will fail the entire validation attempt.

rnz · August 18, 2021, 6:04pm

Yes, I do. The additional proxy nodes are needed for other tasks, not for this; that was an example.
Your proposal is right if I add an additional host only for cert generation and redirect to it from all proxies. It can be done, but it introduces additional points of failure.

rnz · August 18, 2021, 6:05pm

Now I don't understand you. )
What does "T" mean?

rg305 · August 18, 2021, 6:08pm

No additional anything required.
You can use one of the proxies to obtain the cert(s).

It's just an expression; it means "exactly".

rnz · August 18, 2021, 6:10pm

It would be correct if the check were for an IP, but the check is performed for a domain name that has more than one IP.

rg305 · August 18, 2021, 6:14pm

Your logic is: Check all IPs and stop once any one of them passes.

The applied logic is: Check "the first IP" and pass or fail on that one test.
And then that same "test" is applied from multiple separate points on the Internet.
And ALL the tests must pass.

Presuming four IPs are included in the round-robin scenario and only one of them serves the token.
You have a 1/4 chance to pass any one of the tests.
But since all must be passed, the chances multiply; with three test points that is (1/4 * 1/4 * 1/4):
1/64 chance of success.
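
The arithmetic above can be reproduced with a short Python sketch. It assumes that each validation point picks one IP uniformly at random and independently of the others, which is a simplifying assumption and not a documented behaviour of the CA; the function name `success_chance` and its parameters are made up for the illustration.

```python
from fractions import Fraction


def success_chance(total_ips: int, ips_with_token: int, vantage_points: int) -> Fraction:
    """Chance that every vantage point happens to hit an IP that serves the token.

    Assumes uniform, independent selection of one IP per vantage point.
    """
    if total_ips <= 0 or not 0 <= ips_with_token <= total_ips:
        raise ValueError("need 0 <= ips_with_token <= total_ips and total_ips > 0")
    per_check = Fraction(ips_with_token, total_ips)
    # All checks must pass, so the individual probabilities multiply.
    return per_check ** vantage_points


# Four IPs, one serving the token, three vantage points: the 1/64 from the post.
print(success_chance(4, 1, 3))
```

If every IP can answer the challenge (for example via a proxy or redirect to one node), `ips_with_token` equals `total_ips` and the chance becomes 1.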

rnz · August 18, 2021, 6:21pm

But this proxy host will have a configuration that differs from the other proxies, and it will need an additional domain name with a single IP in its DNS record. That will also be a potential problem with CORS.

rg305 · August 18, 2021, 6:25pm

Not necessarily. Each proxy can be configured identically.
Yes, there needs to be an additional FQDN that resolves only to the proxy IP that will be handling the certificate validation. But that IP can be changed in DNS and all proxies should be able to obtain a cert.
I don't see how CORS would be involved unless there is an added load-balancer that somehow respreads the load without following the destination IP.

rnz · August 18, 2021, 6:45pm

How many separate points?

3 points? Does each of them check separately? Why do the passes need to be multiplied? Why must all of them pass?

rnz · August 18, 2021, 6:52pm

I will try. But I think the configuration will become more dependent on a specific FQDN instead of the unified approach used now.

The pool of proxies already has more than one TLD FQDN and has a load-balancer too.

jvanasco · August 18, 2021, 6:55pm

When a round-robin DNS scenario is involved, the long-standing practice is to use an HTTP redirect to a single point of authorization, or to proxy traffic to a single shared node.

Reiterating what @rg305 said, your belief is that "any single element may pass", but the current standard across the ACME spec with most 'multiple validation' sections – and with most internet security protocols – is that "each and every element must pass".

This actually makes me wonder if LetsEncrypt/ACME is handling this securely. My expectation has always been that each and every disclosed IP in a round-robin scenario MUST be tested, but I checked the RFC and it states only that "at least one", selected at the server's discretion, must be tested (in round-robin scenarios). Only testing one looks like a bit of a potential security issue by design to me; this would allow attackers to potentially exploit a single compromised machine in the cluster – such as a replica with lower security measures in place.

In any event, my point is the OP's expected behavior does not authoritatively prove control of a domain in a round robin scenario. If the persons who control the domain specified multiple round-robin IPs for that domain, all IPs should successfully validate the challenge.

rmbolger · August 18, 2021, 9:45pm

I'm guessing this is mostly mitigated when combined with multi-perspective validation where each validator chooses a random one from the round-robin list to test. I don't know for sure if that's how the validators work, but it seems like the easiest implementation option.

jvanasco · August 18, 2021, 10:06pm

"somewhat" would be a better word there. I've seen a handful of implementations that exceed the number of multi-perspective validations in use right now. This likely isn't a concern for domains that only have 2 entries though.
