Issues with http-01 validation

Hello All,

I'm at a loss as to why my challenge validation is not working. I'm running on Oracle Gov Cloud and have port 80 open to the internet. I have tried Certbot with both --nginx and --standalone. When I run with --debug-challenges and leave it waiting, I can open the full path (http://.well-known/acme-challenge/vW_tHZm6VkoMiYaX3AbZ7wMawYjGDhMXWCKO7MOVtDs) on my cell phone, which has no connection to my cloud network, and it prompts me to download the challenge file.

I thought I recalled that some new ACME protocol or something also needed to be allowed since around last summer, but I don't see anything of that type in the Oracle Cloud security list. Also, does anyone know whether the Let's Encrypt IPs used for challenges are outside the US? It's possible gov clouds block non-US IPs before they even reach the tenancy.

Any help would be appreciated. This task should have taken me 20 minutes, and I'm hours in, banging my head against the wall.

Thanks,
Doug

Welcome to the community @dlaplant

Can you show the error when trying Certbot? Is it the same for both --nginx and --standalone?

There were recent changes to CAA security options, but if you don't have DNS CAA records they would not matter.

When trying --nginx, do you see any challenge requests in the nginx access log? You should (probably) see 3 for each request.
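
One quick way to check is to filter the access log for challenge paths. A minimal Python sketch, assuming nginx's default "combined" log format; `challenge_hits` and `scan_log` are hypothetical helpers, not part of Certbot or nginx:

```python
import re
from typing import Iterable, List, Tuple

# Client address, request path and status code from nginx's default "combined"
# log format, e.g.
# 203.0.113.5 - - [10/Jan/2024:02:47:22 +0000] "GET /x HTTP/1.1" 404 153 "-" "curl/8.0"
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def challenge_hits(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (client, path, status) for every request to an ACME challenge path."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").startswith(CHALLENGE_PREFIX):
            hits.append((m.group("client"), m.group("path"), int(m.group("status"))))
    return hits


def scan_log(path: str) -> List[Tuple[str, str, int]]:
    """Read an access log file and return its challenge requests."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return challenge_hits(log)
```

For example, `scan_log("/var/log/nginx/access.log")` (an assumed default location; check the access_log directive in your config). An empty result while Certbot is waiting means the validation requests never reached nginx at all.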

Is there perhaps a Palo Alto firewall involved?

Yeah, the network where we had the ACME protocol issue was behind a PA firewall. This one is native Oracle Gov Cloud; I'm not sure what firewall is actually under the hood there.

Palo Alto assigned a separate application, acme-protocol, to /.well-known/acme-challenge, so you need to allow that separately.

Yeah, same error either way. It resolves the IP and ends up at:

Fetching /.well-known/acme-challenge/fMbNq-mY7-hwnFPP0FbanIBR-AeVxacXArGGbnMN4YE: Timeout during connect (likely firewall problem)

Hint: The Certificate Authority failed to download the temporary challenge files created by Certbot. Ensure that the listed domains serve their content from the provided --webroot-path/-w and that files created there can be downloaded from the internet.

The nginx error and access logs do not get written to when running Certbot.

I'd normally think, OK, what did I screw up in the security list? But I know the port is open, because if I go to the URL on my phone it prompts me to download. I also tested the domain at letsdebug.net, and it gives failed-to-connect errors. It really looks like Let's Encrypt legitimately can't reach it; I just can't imagine why.
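
"Timeout during connect" means the TCP handshake itself never completed, which is a different failure from a refused connection or an HTTP error status. A minimal Python sketch that makes that distinction from whatever machine runs it; `probe_tcp` is a hypothetical helper:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a single TCP connection attempt to host:port.

    "connected": the three-way handshake completed.
    "refused":   the host answered with a reset (reachable, nothing listening).
    "timeout":   no usable answer at all, which is what a silently dropping
                 firewall, or a reply that never finds its way back, looks like.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        # Anything else (no route to host, DNS failure, ...) is passed through as text.
        return f"error: {exc}"
```

Running something like `probe_tcp("your.domain.example", 80)` (placeholder host) from several unrelated networks, such as a phone hotspot and a fresh cloud instance, shows whether the port is unreachable for everyone or only for some source networks.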

My best guess is it is doing a similar thing to the Palo Alto issue we've seen here.
If you care to provide the FQDN (or send it to me via PM), we can better test the URL and determine what may be causing this problem.

I'll send it via PM if I can figure out how :smiley:

Start by clicking on my flag.
Then:
I think I might be too new; there's no message button.

Let me send you one; then just reply to it.

If I have guessed your domain hosted on OCI correctly (IP 139.x.x.x), I can't connect to it from anywhere at all.

Yeah, it's 139.87.x.x.

Hmm, I've tried a few off-LAN networks and they each connect.

What about with a tool like this site uptime tester (link here)?

Try your "home" page and a URL formatted like the acme challenge
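
A rough standard-library Python sketch of that check; `probe_http` and `probe_site` are hypothetical helpers, and the challenge-style path is a made-up test name, not a real token. Note that `urlopen` follows redirects, so an HTTP-to-HTTPS redirect can surface a port 443 problem here as well:

```python
import urllib.error
import urllib.request


def probe_http(url: str, timeout: float = 10.0) -> str:
    """Fetch url once and summarize what happened as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status; it is still reachable.
        return f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Connection-level failures (refused, timed out, DNS) arrive wrapped here.
        return f"failed: {exc.reason}"
    except OSError as exc:
        # Failures after the request was sent (timeout or reset while waiting
        # for the response) are raised without the URLError wrapper.
        return f"failed: {exc!r}"


def probe_site(base: str) -> dict:
    """Probe the home page and a path shaped like an ACME challenge."""
    paths = ["/", "/.well-known/acme-challenge/Test_File-1234"]
    return {path: probe_http(base.rstrip("/") + path) for path in paths}
```

For example, `probe_site("http://your.domain.example")` with a placeholder host. A 404 on the challenge-style path is fine for this purpose, since it proves the server answered; a "failed: ... timed out" result is the same symptom Let's Encrypt reported.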

Yeah that's the one.

I tried to connect from fresh instances in Digital Ocean NYC and AWS EC2 us-east-1, and both of them time out on curl attempts :man_shrugging:. The error from Let's Encrypt seems accurate.

Yeah, the ManageEngine site can't connect either. Maybe I'll start by trying to figure out why it is letting my few WAN networks connect and go from there. It's got to be something goofy I have configured wrong. Thanks for pointing me in the right direction and for some good tools to validate the access.

Something is there...
And it can be connected to...
But it doesn't speak HTTP/HTML

This is a basic tcpdump from my IP:

listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
02:47:22.143689 IP [my.IP].53408 > [SERVER.IP].80: Flags [S], seq 2634383343, win 64240, options [mss 1460,sackOK,TS val 3761205060 ecr 0,nop,wscale 7], length 0
02:47:22.144801 IP [SERVER.IP].80 > [my.IP].53408: Flags [S.], seq 464623315, ack 2634383344, win 14480, options [mss 1460,sackOK,TS val 229134649 ecr 3761205060,nop,wscale 7], length 0
02:47:22.144848 IP [my.IP].53408 > [SERVER.IP].80: Flags [.], ack 1, win 502, options [nop,nop,TS val 3761205061 ecr 229134649], length 0
02:47:22.144927 IP [my.IP].53408 > [SERVER.IP].80: Flags [P.], seq 1:210, ack 1, win 502, options [nop,nop,TS val 3761205062 ecr 229134649], length 209: HTTP: GET /.well-known/acme-challenge/Test_File-1234 HTTP/1.1
02:47:22.146288 IP [SERVER.IP].80 > [my.IP].53408: Flags [.], ack 210, win 122, options [nop,nop,TS val 229134649 ecr 3761205062], length 0
02:47:32.213850 IP [SERVER.IP].80 > [my.IP].53408: Flags [R.], seq 1, ack 210, win 122, options [nop,nop,TS val 229135656 ecr 3761205062], length 0

From this we can see replies on port 80, but none on port 443:

02:50:47.703228 IP [my.IP].52952 > [SERVER.IP].443: Flags [S], seq 2900942312, win 64240, options [mss 1460,sackOK,TS val 3761410620 ecr 0,nop,wscale 7], length 0
02:50:48.732954 IP [my.IP].52952 > [SERVER.IP].443: Flags [S], seq 2900942312, win 64240, options [mss 1460,sackOK,TS val 3761411650 ecr 0,nop,wscale 7], length 0
02:50:50.748951 IP [my.IP].52952 > [SERVER.IP].443: Flags [S], seq 2900942312, win 64240, options [mss 1460,sackOK,TS val 3761413666 ecr 0,nop,wscale 7], length 0
02:50:55.004946 IP [my.IP].52952 > [SERVER.IP].443: Flags [S], seq 2900942312, win 64240, options [mss 1460,sackOK,TS val 3761417922 ecr 0,nop,wscale 7], length 0

[as expected with a true DROP firewall rule]

Maybe some :bird: :eyes: can spot something from those...
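
Reading such captures by hand is error-prone, so here is a rough Python sketch that summarizes one connection from tcpdump's one-line output. It assumes the `IP src.port > dst.port: Flags [...]` format used in the captures above and looks only at the TCP flags; `classify` is a hypothetical helper, not part of tcpdump:

```python
import re
from typing import Iterable

# Source, destination and TCP flags from a tcpdump summary line such as
# 02:47:22.143689 IP 10.0.0.1.53408 > 10.0.0.2.80: Flags [S], seq 1, ...
PACKET = re.compile(
    r"IP (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]*)\]"
)


def classify(lines: Iterable[str], server_port: int) -> str:
    """Summarize what the server side did for connections to server_port."""
    syns = syn_acks = resets = 0
    for line in lines:
        m = PACKET.search(line)
        if not m:
            continue
        flags = m.group("flags")
        is_syn = flags.startswith("S")
        if int(m.group("dport")) == server_port and is_syn and "." not in flags:
            syns += 1  # client -> server SYN (retransmissions counted too)
        elif int(m.group("sport")) == server_port:
            if is_syn and "." in flags:
                syn_acks += 1  # server -> client SYN-ACK
            elif "R" in flags:
                resets += 1  # server -> client reset
    if syn_acks == 0:
        return f"no reply: {syns} SYN(s) sent, none answered"
    if resets:
        return "handshake completed, then the server side reset the connection"
    return "handshake completed"
```

Fed the two captures above, port 80 comes out as "handshake completed, then the server side reset the connection" and port 443 as "no reply: 4 SYN(s) sent, none answered".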

OK, so the issue was in the return routing for the subnet in Oracle Cloud. For internet traffic, replies were being routed back over a NAT gateway instead of an internet gateway, so connections could come in but the responses to those requests could not return. My test networks worked because I must have run into this in the past and added routes for them individually at some point.
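
Why only some networks worked follows from how a route is chosen: assuming the usual "most specific rule wins" (longest-prefix match) behaviour, an individually added route beats the default route for its own range only. A toy Python model; the CIDRs are RFC 5737 documentation ranges and the target names are hypothetical, not real OCI identifiers:

```python
import ipaddress
from typing import List, Tuple

# (destination CIDR, route target) pairs for the server's subnet. Hypothetical
# values modelled on the situation above: a default route pointing at a NAT
# gateway plus one individually added route for a "test network".
ROUTE_RULES: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "nat-gateway"),
    ("198.51.100.0/24", "internet-gateway"),
]


def next_hop(rules: List[Tuple[str, str]], destination: str) -> str:
    """Return the target of the most specific rule covering destination."""
    addr = ipaddress.ip_address(destination)
    best_net, best_target = None, None
    for cidr, target in rules:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_target = net, target
    if best_target is None:
        raise LookupError(f"no route for {destination}")
    return best_target
```

In this model, replies to 198.51.100.20 leave through the internet gateway, while replies to any other client, including Let's Encrypt's validation servers, are handed to the NAT gateway and never make it back as a valid response.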

I really appreciate everyone's help here! I'm sure that without it I'd still be banging my head against the wall, believing it was "open" on the internet.
