Attempting the HTTP challenge behind AWS CloudFront and an AWS Network Load Balancer

jaaasshh · August 19, 2022, 9:11pm

I ran this command:
Using Traefik's integration with Let's Encrypt. I ran no command directly.

My web server is (include version):
Traefik 2.8.3

The operating system my web server runs on is (include version):
Amazon Linux 2

I can login to a root shell on my machine (yes or no, or I don't know): yes

I'm using a control panel to manage my site (no, or provide the name and version of the control panel):

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you're using Certbot): N/A, using Traefik

Environment:
AWS CloudFront -> AWS Network Load Balancer -> Traefik (traefik:v2.8.3 Docker containers)

I'm getting this error:

Cannot retrieve the ACME challenge for ninja-external-6c54d4df686a3499.elb.us-west-2.amazonaws.com (token \"_VUVT-UK84_sTe5Jh6TjMeFZ_8ajaMN7OTTgg8_r96I\"): cannot find challenge for token \"_VUVT-UK84_sTe5Jh6TjMeFZ_8ajaMN7OTTgg8_r96I\" (ninja-external-6c54d4df686a3499.elb.us-west-2.amazonaws.com)

I think it's because LE is making a request to ctm-dr.com (CloudFront), which makes a request to the origin (the amazonaws.com domain), and Traefik sees the request as if it were for that domain and not ctm-dr.com, so it throws the error above.

At least I think that's what's going on...?
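A minimal Python sketch of the suspected failure mode, assuming (as the error message suggests, since it names the amazonaws.com host) that the responder looks up a pending challenge by the hostname in the request's Host header plus the token. ChallengeStore, request_host and the trust_forwarded flag are hypothetical names for illustration, not Traefik's API; whether Traefik honors X-Forwarded-Host for this lookup is not established in this thread.

```python
from typing import Dict, Mapping, Optional


def request_host(headers: Mapping[str, str], trust_forwarded: bool = False) -> str:
    """Return the hostname a request claims to be for, lowercased, port stripped."""
    # Hypothetical option: prefer X-Forwarded-Host when a trusted proxy sets it.
    if trust_forwarded and headers.get("X-Forwarded-Host"):
        raw = headers["X-Forwarded-Host"].split(",")[0]
    else:
        raw = headers.get("Host", "")
    # Naive port stripping (ignores IPv6 literals); real headers are also
    # case-insensitive, which a plain dict does not model.
    return raw.strip().split(":")[0].lower()


class ChallengeStore:
    """Hypothetical table of pending HTTP-01 challenges: domain -> token -> key auth."""

    def __init__(self, pending: Dict[str, Dict[str, str]]):
        self._pending = pending

    def lookup(self, headers: Mapping[str, str], token: str,
               trust_forwarded: bool = False) -> Optional[str]:
        domain = request_host(headers, trust_forwarded)
        return self._pending.get(domain, {}).get(token)


store = ChallengeStore({"www.ctm-dr.com": {"tok123": "tok123.thumbprint"}})

# Host rewritten to the load balancer name on the way to the origin: no match.
print(store.lookup({"Host": "ninja-external-6c54d4df686a3499.elb.us-west-2.amazonaws.com"}, "tok123"))  # -> None
# Host preserved end to end: the same token is found.
print(store.lookup({"Host": "www.ctm-dr.com"}, "tok123"))  # -> tok123.thumbprint
```

The point of keying by domain is that the same token string could, in principle, be pending for different names; the trade-off is that any proxy hop that rewrites Host breaks the lookup.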

So, question: is it even possible to use the HTTP challenge with the above setup? Is the domain translation jacking everything up?

I've looked high and low for some configuration that might "forward" the originating domain through the load balancer, but I've yet to see anything like that. If there is one, I'd love to try it out.

Any ideas? Thank you!

PS. I double-posted this "issue" to the Traefik forums here. There is a bit more Traefik-specific information there, but I think this is more an AWS/Let's Encrypt issue than a Traefik issue.

Bruce5051 · August 19, 2022, 9:24pm

Hello @jaaasshh, welcome to the Let's Encrypt community.

And here are the SSL Labs results for one of your IP addresses:
https://www.ssllabs.com/ssltest/analyze.html?d=www.ctm-dr.com&s=2600%3A9000%3A2202%3Abc00%3A4%3Aaa23%3A1f80%3A93a1&latest

I find no IP address for your domain name.

ping ctm-dr.com
ping: ctm-dr.com: No address associated with hostname

But I do find an IP address for www.ctm-dr.com:


ping www.ctm-dr.com
PING d4sj8ha9fg48u.cloudfront.net (99.84.66.57) 56(84) bytes of data.
64 bytes from server-99-84-66-57.hio50.r.cloudfront.net (99.84.66.57): icmp_seq=1 ttl=245 time=15.1 ms
64 bytes from server-99-84-66-57.hio50.r.cloudfront.net (99.84.66.57): icmp_seq=2 ttl=245 time=13.4 ms

MikeMcQ · August 19, 2022, 9:25pm

Walking out the door, so sorry this is not fully thought out, but:

Can you use a DNS challenge from traefik (instead of http)?
Do you need to use HTTPS rather than HTTP between the AWS LB and Traefik?
Could Traefik be "tricked" by sending your domain in the Host request header? I'm not sure what options the AWS Network LB offers, but I know CloudFront can do things with the Host header.

Bruce5051 · August 19, 2022, 9:43pm

You are getting Response Header: HTTP/1.1 502 Bad Gateway

Using Redirect Checker (301 vs 302 status code check), I see:

And from the Qualys SSL Labs SSL Server Test for www.ctm-dr.com:

rg305 · August 19, 2022, 11:24pm

As mentioned: There is no IP for ctm-dr.com

CloudFront has a valid certificate for (only) www.ctm-dr.com

As verified by:
openssl s_client -connect www.ctm-dr.com:443 -servername ctm-dr.com
openssl s_client -connect www.ctm-dr.com:443 -servername www.ctm-dr.com

Do you need a cert within the docker container (within their system)?

jaaasshh · August 22, 2022, 10:50am

This is expected. I only have a CNAME set up pointing www.ctm-dr.com to CloudFront.

For a bit of context as to why: I'm explicitly testing "whitelabeling" our application, where a customer points a subdomain at our infrastructure in order to resell our product under their brand. "www.ctm-dr.com" is the subdomain I am testing.

jaaasshh · August 22, 2022, 10:58am

Thanks for the reply!

Not without some added difficulty. As stated in my previous post, I won't have full DNS control of the customer's domain here, just the single subdomain. I think the HTTP challenge is the simplest logistically because it only requires our customers to set a single CNAME record; then we can do the rest. At least, that's the plan.

I don't think so? Traefik is set up to respond on both 443 and 80, but the LB is a Network LB, so there's no HTTP/HTTPS configuration there. It just passes the traffic through as configured (I have it forwarding 80 to 80 and 443 to 443, nothing fancy).
Also, I think the HTTP challenge requires port 80 to be open all the time?

This... I dunno. I think CloudFront sets X-Forwarded-Host, and you'd think Traefik would respect that if present. I'll see if I can get CF to also set Host; it can't hurt.

jaaasshh · August 22, 2022, 11:00am

This is expected. I'm trying to issue a cert only for www.ctm-dr.com.

I'm not 100% sure I follow you. I think yes? I'm attempting to issue the cert in Traefik because that's where I want to do SSL termination.

jaaasshh · August 22, 2022, 11:48am

UPDATE:
I had caching enabled in CloudFront and have now disabled it. I've also set the "Origin Request Policy" to "All Viewer", which I believe is supposed to just forward everything.

I no longer see any reference to the LB domain in my logs, but I still see lots of Error getting challenge for token retrying in 2.583049201s

So, possible step in the right direction, but not quite there yet?

AWS Console:

MikeMcQ · August 22, 2022, 12:31pm

No, that's not what those options mean. You should review the CloudFront docs.

You have a very complex (even convoluted) architecture: a CDN in front of a Network Load Balancer in front of Traefik. Getting that to work is well beyond the scope of this forum. I think just getting a cert using the HTTP challenge will be very difficult for you. A DNS challenge for Route53 takes some care too, but is probably easier than the HTTP challenge for your chosen architecture.

I will explain a simple CloudFront case to clarify how the certs work.

A client, say a browser, requests your domain name www.ctm-dr.com. It may try with HTTP or HTTPS and CF can optionally auto-redirect to HTTPS. In fact, you have CloudFront redirecting HTTP to HTTPS right now.

So, CF terminates the original HTTPS request and needs its own cert for this purpose. You can see the cert AWS has for this using a tool like this.

CF then connects to your Origin Server(s). It can use HTTP or HTTPS, or even match the method the browser used. If using HTTPS, this is a second HTTPS connection, which uses its own cert on your Origin Server. Your Origin Server does not need to have the same domain name as the inbound request, and often it does not, for example when using S3 as an origin.

Right now your CF redirects HTTP to HTTPS, but the connection to the origin fails and returns a 502 HTTP error. See AWS 502 debug info

(many headers omitted for clarity)
curl -I www.ctm-dr.com
HTTP/1.1 301 Moved Permanently
Server: CloudFront
Location: https://www.ctm-dr.com/

(following redirect manually for test)
curl -I https://www.ctm-dr.com
HTTP/2 502
server: CloudFront
x-cache: Error from cloudfront

jaaasshh · August 22, 2022, 12:37pm

I understand everything you said, however, I think you missed that CF doesn't redirect http://www.ctm-dr.com/.well-known/* to https. I have an explicit behavior configured for this path in CF.

It forwards that request to the LB and then to traefik as http all the way through. This should allow the http challenge to work. No?

MikeMcQ · August 22, 2022, 12:48pm

Yes, fair, I did not try the ACME challenge URL. But I can't reach anything using a test ACME challenge URL, and neither can the Let's Debug test site.

curl -I -m 10 http://www.ctm-dr.com/.well-known/acme-challenge/Mike123
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received

So, it looks like your CF Behavior works right but something behind that is not.
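The curl -I -m 10 check above can be reproduced with a small Python standard-library probe that separates "nothing answered" (the timeout seen here) from "a web server answered" (any HTTP status, including the 404 expected for a made-up token). probe_challenge_url is a hypothetical helper name. Two assumptions worth stating: it bypasses any environment proxy so it connects directly, as the validation server would, and unlike curl -I, urllib follows redirects by default.

```python
import socket
import urllib.error
import urllib.request

# Opener with no proxy handler entries, so the request goes straight to the host.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_challenge_url(url: str, timeout: float = 10.0) -> str:
    """HEAD the URL and summarize the outcome, roughly like `curl -I -m <timeout>`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with _DIRECT.open(request, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # A 4xx/5xx still proves that something on the path answered.
        return f"HTTP {err.code}"
    except urllib.error.URLError as err:
        # Connection-level failures (refused, DNS, connect timeout) land here.
        if isinstance(err.reason, socket.timeout):
            return "timeout"
        return f"error: {err.reason}"
    except socket.timeout:
        # A read timeout after the connection is established may surface directly.
        return "timeout"
```

Used as probe_challenge_url("http://www.ctm-dr.com/.well-known/acme-challenge/Mike123"), a healthy chain should come back quickly with "HTTP 404" for a made-up token, while "timeout" points at something between CloudFront and the web server.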

jaaasshh · August 22, 2022, 12:50pm

So I noticed that. And I think that's expected...?? That seems to happen whenever I do what you did and curl an arbitrary acme-challenge token like Mike123:

time="2022-08-22T12:42:40Z" level=error msg="Cannot retrieve the ACME challenge for token Mike123: cannot find challenge for token Mike123" providerName=acme
10.20.10.50 - - [22/Aug/2022:12:41:47 +0000] "HEAD /.well-known/acme-challenge/Mike123 HTTP/1.1" 404 0 "-" "-" 2028 "acme-http@internal" "-" 53382ms

... I mean, that's just certbot/Let's Encrypt dorking around for 53 seconds trying to find a "Mike123" challenge token, which definitely doesn't exist.

MikeMcQ · August 22, 2022, 12:52pm

No. The HTTP challenge relies on a running webserver to reply to the URL. It's no different than a browser asking for an html page or whatever.

So, given Mike123 does not exist, a very fast http 404 is expected.
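To make the "running webserver" point concrete, here is a minimal Python sketch of an HTTP-01 responder using only the standard library: it answers /.well-known/acme-challenge/<token> from an in-memory table and returns an immediate 404 for unknown tokens. start_responder and fetch_token are hypothetical helpers for a local demo; a real ACME client such as Traefik or Certbot fills in the token table itself.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"
# Bypass any environment proxy so the demo talks to 127.0.0.1 directly.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ChallengeHandler(BaseHTTPRequestHandler):
    def _reply(self, include_body: bool) -> None:
        token = self.path[len(CHALLENGE_PREFIX):] if self.path.startswith(CHALLENGE_PREFIX) else ""
        key_auth = self.server.tokens.get(token) if token else None
        if key_auth is None:
            # Unknown token: answer right away with an empty 404.
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = key_auth.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self) -> None:
        self._reply(include_body=True)

    def do_HEAD(self) -> None:
        self._reply(include_body=False)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo quiet


def start_responder(tokens, port=0):
    """Serve a {token: key_authorization} table on 127.0.0.1 in a background thread."""
    server = HTTPServer(("127.0.0.1", port), ChallengeHandler)
    server.tokens = tokens
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_token(server, token, timeout=5.0):
    """GET the challenge URL from the local responder; return (status, body)."""
    host, port = server.server_address[:2]
    url = f"http://{host}:{port}{CHALLENGE_PREFIX}{token}"
    try:
        with _DIRECT.open(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()
```

For example, fetch_token(start_responder({"abc": "abc.thumbprint"}), "Mike123") should return a 404 with an empty body almost instantly, because nothing in this responder waits for a token to appear.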

jaaasshh · August 22, 2022, 1:02pm

Then what's all this? Is this certbot output (still assuming Traefik uses certbot under the hood), or is it more likely Traefik's own logs?

time="2022-08-22T12:56:30Z" level=error msg="Error getting challenge for token retrying in 10.04149151s" providerName=acme
time="2022-08-22T12:56:34Z" level=error msg="Error getting challenge for token retrying in 11.826312625s" providerName=acme
time="2022-08-22T12:56:39Z" level=error msg="Error getting challenge for token retrying in 302.57813ms" providerName=acme
time="2022-08-22T12:56:39Z" level=error msg="Error getting challenge for token retrying in 779.217737ms" providerName=acme
time="2022-08-22T12:56:40Z" level=error msg="Error getting challenge for token retrying in 18.232927934s" providerName=acme
time="2022-08-22T12:56:40Z" level=error msg="Error getting challenge for token retrying in 1.202932949s" providerName=acme
time="2022-08-22T12:56:41Z" level=error msg="Error getting challenge for token retrying in 2.500183347s" providerName=acme
time="2022-08-22T12:56:44Z" level=error msg="Error getting challenge for token retrying in 2.952985805s" providerName=acme
time="2022-08-22T12:56:46Z" level=error msg="Error getting challenge for token retrying in 19.876821786s" providerName=acme
time="2022-08-22T12:56:47Z" level=error msg="Error getting challenge for token retrying in 2.010148969s" providerName=acme

That looks to me like certbot is attempting to fetch something from lets encrypt, failing and retrying.

These logs happen after making a request to /.well-known/acme-challenge/foo
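The changing delays in those log lines look like a randomized (jittered) exponential backoff, and the mix of short and long delays suggests several such loops running at once. Below is a minimal Python sketch of that retry pattern, assuming the responder keeps re-checking its token store while it waits. The parameter values are illustrative assumptions, not Traefik's actual settings, and retry_with_backoff and make_flaky_lookup are hypothetical names.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=10, initial=0.5, multiplier=1.5,
                       jitter=0.5, max_interval=60.0, sleep=time.sleep, rng=random.random):
    """Call operation() until it stops raising LookupError.

    Returns (result, delays), where delays lists every wait that was taken.
    Each wait is the current interval scaled by a random factor in
    [1 - jitter, 1 + jitter); the interval then grows by `multiplier`,
    capped at `max_interval`. Re-raises after `max_attempts` failures.
    """
    delays = []
    interval = initial
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(), delays
        except LookupError:
            if attempt == max_attempts:
                raise
            delay = interval * (1 + jitter * (2 * rng() - 1))
            print(f"Error getting challenge for token retrying in {delay:.6f}s")
            delays.append(delay)
            sleep(delay)
            interval = min(interval * multiplier, max_interval)


def make_flaky_lookup(failures, value):
    """Hypothetical token lookup that misses `failures` times, then finds `value`."""
    remaining = [failures]

    def lookup():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise LookupError("token not found yet")
        return value

    return lookup
```

With sleep and rng stubbed out, retry_with_backoff(make_flaky_lookup(2, "tok.thumb"), sleep=lambda s: None, rng=lambda: 0.5) returns ("tok.thumb", [0.5, 0.75]). If the token never shows up, a loop like this keeps a request open for the whole retry budget before giving up, rather than returning a fast 404.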

jaaasshh · August 22, 2022, 1:04pm

I'm starting to think that Traefik isn't correctly sharing, among its instances (there are 3), whatever token it generates when it initiates a certificate request.
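A toy Python simulation of that suspicion, assuming the Network Load Balancer sends the validation request to any one of the three instances and that each instance keeps pending tokens only in its own memory. With a private store per instance, only the instance that started the order can answer, so roughly one validation in three succeeds; with a shared store, every instance can answer. simulate_validations and its parameters are hypothetical.

```python
import random


def simulate_validations(n_instances, shared_store, trials, seed=0):
    """Return the fraction of simulated HTTP-01 validations that succeed."""
    rng = random.Random(seed)
    common = {}
    # Each instance is represented only by its token table (shared or private).
    stores = [common if shared_store else {} for _ in range(n_instances)]
    successes = 0
    for i in range(trials):
        token = f"token-{i}"
        key_auth = f"{token}.thumbprint"
        # The instance that requests the certificate records the pending token.
        rng.choice(stores)[token] = key_auth
        # The load balancer forwards the validation request to any instance.
        if rng.choice(stores).get(token) == key_auth:
            successes += 1
    return successes / trials
```

simulate_validations(3, shared_store=True, trials=500) returns 1.0, while the unshared case hovers around 0.33. The design choice it illustrates is the usual one for multi-instance setups: either share challenge state between instances or route every challenge request to the instance that created it.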

MikeMcQ · August 22, 2022, 1:13pm

I thought Traefik had its own ACME client. Not sure certbot is involved at all. That log looks like a Traefik config problem. Have you tried asking on their forum?

An actual HTTP challenge starts with an ACME client creating a file (*) accessible to your webserver. The client then asks the Let's Encrypt server to verify the domain, and the LE server sends a request for that file. The webserver should respond with the correct contents, and the cert is issued. (Certbot is just one of many ACME clients.)

When we do test curls, we just use the same format as the challenge URL. We expect them to fail with a 404. There is nothing magical about the path in the URL.

(*) I say "file" and this is often the case. But, it does not have to be a file and can be a static value the webserver knows about.
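For reference, the "static value" an HTTP-01 responder must return is the key authorization defined in RFC 8555 section 8.1: the token, a dot, and the unpadded base64url SHA-256 JWK thumbprint (RFC 7638) of the ACME account key. A minimal Python sketch; the JWK values below are placeholders for illustration, not a real account key.

```python
import base64
import hashlib
import json

# RFC 7638 (and RFC 8037 for OKP): only the required members of each key type
# go into the thumbprint; anything else (kid, use, ...) is ignored.
REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    members = {name: jwk[name] for name in REQUIRED_MEMBERS[jwk["kty"]]}
    # Canonical JSON: keys in lexicographic order, no whitespace.
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """token || '.' || base64url(SHA-256 thumbprint of the account key)."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


# Placeholder EC public key fields (hypothetical, not a real account key).
example_jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y", "kid": "ignored"}
print(key_authorization("Mike123", example_jwk))
```

Because the value depends only on the token and the account key, a responder can serve it from memory or shared storage instead of a file on disk.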

MikeMcQ · August 22, 2022, 1:16pm

Ah, yes. Likely a problem (maybe not the only one). I don't know how Traefik manages that. Their forum is a better source.

There are many threads in this forum that discuss multi-instance setups. They won't directly pertain to Traefik but may help conceptualize. One such thread:

rg305 · August 22, 2022, 3:49pm

That definitely sounds like an issue.
You should ask in the traefik help channels OR switch to DNS authentication.

jaaasshh · August 24, 2022, 7:21pm

Quick thanks to everyone who helped out here!

I think originally I had several issues mostly pertaining to routing port 80 through my infrastructure to traefik.

The "big" confusing issue was the result of how Traefik stores certificates when you configure more than one resolver (DNS and HTTP). The fix was to direct Traefik to store those certificates in different files, since they seem to conflict otherwise.

Traefik config below - note that storage is different in the two resolvers.

[certificatesResolvers.letsencrypt.acme]
  email = "foo@email.com"
  storage = "/acme/acme.json"
  [certificatesResolvers.letsencrypt.acme.dnsChallenge]
    provider = "route53"
    delayBeforeCheck = 0
    resolvers = ["1.1.1.1:53", "8.8.8.8:53"]

[certificatesResolvers.letsencrypt_http.acme]
  email = "foo@email.com"
  storage = "/acme/acme-http.json"
  [certificatesResolvers.letsencrypt_http.acme.httpChallenge]
    entryPoint = "http"