Timeout during connect (likely firewall problem)

Certificate renewal worked perfectly until recently (2019-07-18), but now fails with a timeout during connect. I can see the requests in the web server logs just fine. Certificates for other domains on the same web server were renewed successfully on 2019-08-25.

2600:1f14:a8b:501:68db:425b:e14d:d96c - - [04/Sep/2019:22:53:20 +0200] "GET /.well-known/acme-challenge/_eBXhqeoBkcay96ixMknapcrBA15-ggq5c5CJdHc-ww HTTP/1.1" 301 298
2600:3000:2710:300::1e - - [04/Sep/2019:22:53:20 +0200] "GET /.well-known/acme-challenge/_eBXhqeoBkcay96ixMknapcrBA15-ggq5c5CJdHc-ww HTTP/1.1" 301 298
2600:3000:2710:300::1e - - [04/Sep/2019:22:53:20 +0200] "GET /.well-known/acme-challenge/0DYGUICtcVOhwra7WUNM1pW0cV0LRxXy430U3p28ck8 HTTP/1.1" 301 298
2600:1f14:a8b:501:68db:425b:e14d:d96c - - [04/Sep/2019:22:53:20 +0200] "GET /.well-known/acme-challenge/0DYGUICtcVOhwra7WUNM1pW0cV0LRxXy430U3p28ck8 HTTP/1.1" 301 298
2600:1f14:a8b:501:68db:425b:e14d:d96c - - [04/Sep/2019:22:53:20 +0200] "GET /.well-known/acme-challenge/_eBXhqeoBkcay96ixMknapcrBA15-ggq5c5CJdHc-ww HTTP/1.1" 200 87
2600:3000:2710:300::1e - - [04/Sep/2019:22:53:20 +0200] "GET /.well-known/acme-challenge/_eBXhqeoBkcay96ixMknapcrBA15-ggq5c5CJdHc-ww HTTP/1.1" 200 87
2600:3000:2710:300::1e - - [04/Sep/2019:22:53:21 +0200] "GET /.well-known/acme-challenge/0DYGUICtcVOhwra7WUNM1pW0cV0LRxXy430U3p28ck8 HTTP/1.1" 200 87
2600:1f14:a8b:501:68db:425b:e14d:d96c - - [04/Sep/2019:22:53:21 +0200] "GET /.well-known/acme-challenge/0DYGUICtcVOhwra7WUNM1pW0cV0LRxXy430U3p28ck8 HTTP/1.1" 200 87
3.122.105.36 - - [04/Sep/2019:22:53:30 +0200] "GET /.well-known/acme-challenge/_eBXhqeoBkcay96ixMknapcrBA15-ggq5c5CJdHc-ww HTTP/1.1" 301 298
18.224.20.83 - - [04/Sep/2019:22:53:30 +0200] "GET /.well-known/acme-challenge/_eBXhqeoBkcay96ixMknapcrBA15-ggq5c5CJdHc-ww HTTP/1.1" 301 298
3.122.105.36 - - [04/Sep/2019:22:53:30 +0200] "GET /.well-known/acme-challenge/0DYGUICtcVOhwra7WUNM1pW0cV0LRxXy430U3p28ck8 HTTP/1.1" 301 298
18.224.20.83 - - [04/Sep/2019:22:53:30 +0200] "GET /.well-known/acme-challenge/0DYGUICtcVOhwra7WUNM1pW0cV0LRxXy430U3p28ck8 HTTP/1.1" 301 298

I am a bit stuck on how to troubleshoot this - any pointers or ideas would be most welcome!

Best regards,
Anders

Please fill out the fields below so we can help you better. Note: you must provide your domain name to get help. Domain names for issued certificates are all made public in Certificate Transparency logs (e.g. https://crt.sh/?q=example.com), so withholding your domain name here does not increase secrecy, but only makes it harder for us to provide help.

My domain is: bystrup.net

I ran this command: certbot --dry-run renew

It produced this output:


Processing /etc/letsencrypt/renewal/bystrup.net.conf


Cert not due for renewal, but simulating renewal for dry run
Plugins selected: Authenticator webroot, Installer None
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for bystrup.net
http-01 challenge for www.bystrup.net
Waiting for verification…
Cleaning up challenges
Attempting to renew cert (bystrup.net) from /etc/letsencrypt/renewal/bystrup.net.conf produced an unexpected error: Failed authorization procedure. bystrup.net (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching https://bystrup.net/.well-known/acme-challenge/_eBXhqeoBkcay96ixMknapcrBA15-ggq5c5CJdHc-ww: Timeout during connect (likely firewall problem), www.bystrup.net (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching https://bystrup.net/.well-known/acme-challenge/0DYGUICtcVOhwra7WUNM1pW0cV0LRxXy430U3p28ck8: Timeout during connect (likely firewall problem). Skipping.

My web server is (include version): Apache 2.4.25

The operating system my web server runs on is (include version): Debian Stretch

My hosting provider, if applicable, is:

I can login to a root shell on my machine (yes or no, or I don’t know): yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel): no

The version of my client is (e.g. output of certbot --version or certbot-auto --version if you’re using Certbot): 0.28.0

It’s suspicious that some of the requests used IPv4 – they would have used IPv6 unless IPv6 failed.

There should have been, um, 16 requests:

  • Four locations
  • Two names
  • Doubled because your HTTP redirects to HTTPS

But your log includes only 12 requests.

For the HTTP requests, AWS us-east-2 and eu-west-1 both fell back to IPv4.

And then for the HTTPS requests, they both timed out.
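The expected count above is 4 locations × 2 names × 2 (HTTP plus the HTTPS follow-up) = 16. As a rough way to see where the missing requests fall, here is a minimal Python sketch that tallies challenge requests by IP family and status code. It assumes the Apache log format shown in the first post; the function name `summarize_challenges` is made up for illustration.

```python
import ipaddress
import re
from collections import Counter

# Matches lines shaped like the access log excerpt in the first post:
# <client> - - [<timestamp>] "GET <path> HTTP/1.1" <status> <bytes>
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) [^"]*" (\d{3})\b')
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def summarize_challenges(lines):
    """Count ACME challenge requests per (IP family, HTTP status)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not an access log line in the expected format
        client, path, status = match.groups()
        if not path.startswith(CHALLENGE_PREFIX):
            continue  # unrelated request
        try:
            version = ipaddress.ip_address(client).version
        except ValueError:
            continue  # client field is a hostname, not an IP address
        counts[(f"IPv{version}", int(status))] += 1
    return counts
```

Fed the 12 log lines from the first post, this gives 4 IPv6 requests with status 301, 4 IPv6 requests with status 200, and 4 IPv4 requests with status 301. The IPv4 clients never reach a 200, which lines up with the 4 missing HTTPS follow-up requests.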

Could there be routing issues? Geo blocking? Some IPs or netblocks blocked in your firewall?

Routing issues: I don’t know. I’ve successfully tried curl’ing /.well-known via IPv4 and IPv6 from other hosts I control.

Geo blocking: Not that I know of, and at least nothing I’m controlling.

IPs or netblocks blocked: No.

Is the timeout message due to the missing 4 requests?

BR,
Anders

Yes.


I don’t know about eu-west-1, but I cannot access your IPv6 IP over HTTP or HTTPS – or with ping6 – from AWS us-east-2. It times out.

I can access the IPv4 IP over HTTP and HTTPS, though. (But not with IPv4 ping. Looks like you have ping blocked.)
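To repeat this kind of check from any host, a small Python sketch can attempt a TCP connection over one address family at a time. This is only an illustration, not what the validation servers do; the function name `can_connect` and the 5-second default timeout are assumptions.

```python
import socket


def can_connect(host, port, family, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds over `family`.

    `family` is socket.AF_INET (IPv4) or socket.AF_INET6 (IPv6).
    """
    try:
        candidates = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the name has no address of this family
    for fam, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # timeout, refused, or unreachable: try the next address
    return False
```

For example, comparing `can_connect("bystrup.net", 443, socket.AF_INET6)` with `can_connect("bystrup.net", 443, socket.AF_INET)` from several networks would show whether only the IPv6 path times out. Note that a success from one network says nothing about reachability from another, which is exactly the situation in this thread.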

If I try traceroute6 from rush.mattnordhoff.net, it gets this far:

$ date && traceroute6 bystrup.net
Wed Sep  4 21:37:49 UTC 2019
traceroute to bystrup.net (2a00:7660:8ae:0:20c:29ff:feec:1eeb) from 2600:1f16:ec6:ec6c:f602:da4:b6c9:584a, 30 hops max, 24 byte packets
 1  2620:107:4000:4702:8000:0:6440:1380 (2620:107:4000:4702:8000:0:6440:1380)  1.192 ms  2.905 ms  10.637 ms
 2  2620:107:4000:4702::6440:c47c (2620:107:4000:4702::6440:c47c)  2.106 ms  5.517 ms  10.504 ms
 3  2620:107:4000:4705:8000:0:6442:32c (2620:107:4000:4705:8000:0:6442:32c)  1.181 ms  5.315 ms  10.975 ms
 4  2620:107:4000:4705:8000:0:6442:72d (2620:107:4000:4705:8000:0:6442:72d)  10.782 ms  4.032 ms  10.93 ms
 5  2620:107:4000:4705:8000:0:6442:469 (2620:107:4000:4705:8000:0:6442:469)  3.247 ms  2.495 ms  5.064 ms
 6  2620:107:4000:4705:8000:0:6441:9e1 (2620:107:4000:4705:8000:0:6441:9e1)  0.314 ms  0.449 ms  0.323 ms
 7  2620:107:4000:9::2b (2620:107:4000:9::2b)  0.772 ms  0.757 ms  0.782 ms
 8  2620:107:4000:9::35 (2620:107:4000:9::35)  8.691 ms  10.411 ms  10.289 ms
 9  2620:107:4000:9::2e (2620:107:4000:9::2e)  8.273 ms  8.217 ms  8.26 ms
10  * * *
11  * * *
12  2620:107:4000:ff::59 (2620:107:4000:ff::59)  9.076 ms  8.221 ms  8.243 ms
13  * * *
14  2620:107:4000:ff::5b (2620:107:4000:ff::5b)  8.266 ms  8.34 ms  8.287 ms
15  10gigabitethernet4-1.core1.chi1.he.net (2001:504:0:4::6939:1)  8.084 ms  8.129 ms  8.236 ms
16  100ge16-1.core1.nyc4.he.net (2001:470:0:298::2)  20.412 ms  20.415 ms  20.369 ms
17  100ge16-2.core1.lon2.he.net (2001:470:0:2cf::1)  87.241 ms  87.287 ms  87.753 ms
18  * * *
19  10ge8-1.core1.ham1.he.net (2001:470:0:30d::2)  110.094 ms  100.992 ms  101.141 ms
20  10ge1-7.core1.cph1.he.net (2001:470:0:30e::1)  107.273 ms  124.957 ms  108.266 ms
21  2001:470:1:657::2 (2001:470:1:657::2)  108.934 ms  108.815 ms  108.809 ms
22  customer-2a00-7660-0100-0036-0000-0000-0000-0001.ip6.gigabit.dk (2a00:7660:100:36::1)  113.003 ms  110.313 ms  109.465 ms
23  * * *

In near-desperation I changed the redirect to HTTPS to exclude /.well-known, and then the dry run succeeds. I’ll need to figure out why the redirect causes timeouts with LE.
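One way to confirm that the exclusion is in effect is to look at the first response for a challenge path over plain HTTP without following redirects: a 200 (or 404 for a token that does not exist) means the path is served directly, while a 301 with a Location header means it is still being redirected. A minimal Python sketch, with the hypothetical helper name `first_response`:

```python
import http.client


def first_response(host, path, port=80, timeout=5.0):
    """Send one plain-HTTP GET and return (status, Location header or None).

    http.client does not follow redirects, so this shows exactly what the
    server answers to the initial request.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling it once with a path under /.well-known/acme-challenge/ and once with any other path on the same host should show the difference between the excluded and the redirected case.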

Regarding your first answer, I’m seeing two IPv4 and two IPv6 requests from LE / AWS (2600:3000:2710:300::1d, 2600:3000:2710:300::1e, 2600:1f14:a8b:501:68db:425b:e14d:d96c, 18.224.20.83, 3.122.105.36) per FQDN, not all IPv6 as you implied? Should I worry about that too?

Anyways, thank you very much for taking the time to help - it’s much appreciated!

BR,
Anders

It’s still concerning. The other two validation servers are trying to use IPv6 for the HTTP request, it’s failing, and they’re successfully falling back to IPv4.

It’s still not good that IPv6 is failing.

And if Let’s Encrypt’s fallback logic changes in the future, it might stop covering for the issue.

The plot thickens: AWS’ IPv6 reachability page (http://ipv6.ec2-reachability.amazonaws.com/) shows a number of errors.

And using RIPE’s Atlas tool, it’s clear that at least five other probes from the same AS as me have similar reachability issues against US West Oregon ::ec2, but not “the other US West Oregon (::36c8:1)”.

That’s kind of weird, but I guess I’ll have to get in touch with my ISP to further troubleshoot this one.

/Anders/

FWIW, going by the IP addresses, I think the first “:ec2” IP in each region is an EC2 instance or something, and the second IP is an ELB.

No idea why there would be routing differences, though…

Alas, it turns out that the root cause is a hardware fault in the ISP’s core router. They’re waiting for Juniper to fix the issue… I hope they manage before Let’s Encrypt changes the fallback logic :)
