Repeated timeout errors when attempting to issue certificate


#1

My domain is:
michaelmarley.com, matthewtmarley.com

I ran this command:
sudo certbot certonly --webroot -w /var/www/ -d michaelmarley.com -d www.michaelmarley.com -w /var/www-matthew/ -d matthewtmarley.com -d www.matthewtmarley.com --csr all.rsa.csr

It produced this output:

My web server is (include version):
nginx 1.13.10

The operating system my web server runs on is (include version):
Ubuntu Server 18.04

My hosting provider, if applicable, is:
A box in my closet

I can login to a root shell on my machine (yes or no, or I don’t know):
Yes

I’m using a control panel to manage my site (no, or provide the name and version of the control panel):
No

I am having repeated “Timeout” issues when trying to get a certificate issued. As can be seen from the command above, I am trying to issue a certificate for the root domain and the www. subdomain of two domains with an existing CSR and key. At first, all the domains were giving the timeout error. I tried multiple times as I was messing around with firewall and DNS settings and eventually some of the domains started working, though I had not actually made any configuration changes. The last one (“www.matthewtmarley.com”) is still timing out in the production environment, though I was eventually able to get a certificate issued in the staging environment. Now when I rerun the command, I cannot see any requests arriving in nginx at all. I suspect that since the authorization requests come from multiple sources, one of the sources is getting blocked somewhere upstream of me, but all I get is the useless “Timeout” message, so there isn’t anything else I can think of to troubleshoot.


#2

Hi,

You have two chained CNAMEs on your server for the www hostnames. Since each www CNAME points to its root, which then points to the other domain, why not just point the www CNAME straight at the other domain?

Thank you


#3

I changed all the CNAMEs to point directly to the root domain name, but I don’t expect that to make a difference. The DNS server was already smart enough to follow the CNAMEs and return the IP address in response to the first request.
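
To illustrate why shortening the chain should not change the answer, here is a minimal sketch of following CNAMEs over an in-memory record table. The names, the record layout, and the address (taken from the 2001:db8::/32 documentation range) are all hypothetical; they are not my actual zone.

```python
def resolve(name, records, max_depth=8):
    """Follow CNAME records until an AAAA record is found.

    `records` is a hypothetical in-memory table mapping a name to a
    (record_type, value) pair; it stands in for a real DNS server.
    """
    chain = [name]
    for _ in range(max_depth):
        rtype, value = records[name]
        if rtype == "AAAA":
            return chain, value
        name = value  # CNAME: continue with the target name
        chain.append(name)
    raise RuntimeError("CNAME chain too long")


# Hypothetical layouts: a two-step chain and a direct CNAME.
chained = {
    "www.example.org": ("CNAME", "example.org"),
    "example.org": ("CNAME", "example.com"),
    "example.com": ("AAAA", "2001:db8::1"),
}
direct = {
    "www.example.org": ("CNAME", "example.com"),
    "example.com": ("AAAA", "2001:db8::1"),
}
```

Both tables resolve `www.example.org` to the same address; only the length of the chain differs.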


#4

It smells like a DNS problem.

My reasoning is that I pointed a domain at your IPv6 address and tried doing an authorization against it.

What happened is that it succeeded on the port 80 request (using my domain), and then timed out on the subsequent redirect to the same server on port 443, but using your domain.

You can see the details here: https://acme-staging-v02.api.letsencrypt.org/acme/authz/FtIYZFIB7u1OGww03_U5ioKqyyXR_qYQy3EEGAW3S6w

Now, when we do an authorization using just your domain, it fails immediately on the port 80 request, even though it used the exact same IPv6 address as in the former authorization attempt.

Details for that here: https://acme-staging-v02.api.letsencrypt.org/acme/challenge/ACiafKsQnYPfpiHIogDLwOQA7HMXdq_kgCw6i1xBsIs/111568295

Another explanation could be that the connectivity is randomly failing (and the DNS angle is just a red herring).

I think the Boulder logs should settle this if we can’t find a reason, but that’ll need a staff member to take a look.


#5

But both of those results show that the (correct) IP address(es) was/were resolved, so I don’t understand how DNS could be causing it.


#6

This morning it seems to have started working properly. Maybe it was random server connectivity issues. I will keep an eye on it and close this if it does keep working.


#7

It does seem to be working consistently now, but when I issue a certificate, I am no longer seeing any requests arrive in the nginx access log. This concerns me because I am afraid that a previous successful result has been cached somewhere on LE’s side and that once the cache expires, it will go back to failing most of the time, as it did before. However, I don’t think there is anything I can do to test this on my side. Does anyone know how I might go about testing this, or know for sure whether there were server issues that may have caused the original problem on March 24?


#8

I tried testing by adding extra temporary CNAMEs for which I could request certificates. I once again get the repeated timeouts with the new CNAMEs, but this time I was more careful about collecting the logs. Here’s what I see, grouped by response code (301s first, then 200s), then requesting IP address, and finally sorted by requested URL:

2600:1f16:185:3210:fa10:3caa:9df7:9ce9 - - [25/Mar/2018:13:04:50 -0400] "GET /.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ HTTP/1.1" 301 186 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2600:1f16:185:3210:fa10:3caa:9df7:9ce9 - - [25/Mar/2018:13:04:50 -0400] "GET /.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798 HTTP/1.1" 301 186 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2600:1f14:ac6:4f10:505a:1249:9e33:edae - - [25/Mar/2018:13:04:50 -0400] "GET /.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ HTTP/1.1" 301 186 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2600:1f14:ac6:4f10:505a:1249:9e33:edae - - [25/Mar/2018:13:04:50 -0400] "GET /.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798 HTTP/1.1" 301 186 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2a05:d014:fbe:3d10:162f:8fce:fc0a:a96c - - [25/Mar/2018:13:04:51 -0400] "GET /.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ HTTP/1.1" 301 186 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2a05:d014:fbe:3d10:162f:8fce:fc0a:a96c - - [25/Mar/2018:13:04:50 -0400] "GET /.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798 HTTP/1.1" 301 186 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2600:3000:2710:300::1d                 - - [25/Mar/2018:13:04:50 -0400] "GET /.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798 HTTP/1.1" 301 186 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

2600:1f16:185:3210:fa10:3caa:9df7:9ce9 - - [25/Mar/2018:13:04:51 -0400] "GET /.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ HTTP/1.1" 200 87 "http://test.michaelmarley.com/.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2600:1f16:185:3210:fa10:3caa:9df7:9ce9 - - [25/Mar/2018:13:04:50 -0400] "GET /.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798 HTTP/1.1" 200 87 "http://test.matthewtmarley.com/.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2600:1f14:ac6:4f10:505a:1249:9e33:edae - - [25/Mar/2018:13:04:51 -0400] "GET /.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ HTTP/1.1" 200 87 "http://test.michaelmarley.com/.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2600:1f14:ac6:4f10:505a:1249:9e33:edae - - [25/Mar/2018:13:04:51 -0400] "GET /.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798 HTTP/1.1" 200 87 "http://test.matthewtmarley.com/.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2a05:d014:fbe:3d10:162f:8fce:fc0a:a96c - - [25/Mar/2018:13:04:51 -0400] "GET /.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ HTTP/1.1" 200 87 "http://test.michaelmarley.com/.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2a05:d014:fbe:3d10:162f:8fce:fc0a:a96c - - [25/Mar/2018:13:04:51 -0400] "GET /.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798 HTTP/1.1" 200 87 "http://test.matthewtmarley.com/.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

It appears that the server at 2600:3000:2710:300::1d has connectivity issues to my server, as you can see from the fact that it received only one 301 response (there should have been two, one for each domain) and no 200 responses at all. That is also consistent with the error message I received, which said that one domain timed out on http://test.michaelmarley.com/.well-known/acme-challenge/COjR7vfg5IKWd2mozoYzZhdbogIUwzyPo_I93cBFslQ (the unredirected URL) and the other timed out on https://matthewtmarley.com/.well-known/acme-challenge/bdjJSHuvUrPHHEUpeOKlwvVb_t8qakMXpYBSbXe9798 (the URL from the single 301 redirect sent to 2600:3000:2710:300::1d above). I attempted to diagnose this further using ping and traceroute, but was unable to make any progress since apparently none of the servers respond to pings.

TL;DR: The server at 2600:3000:2710:300::1d seems to have connectivity issues to my server (at 2606:a000:4447:9802:baae:edff:fe73:314a). Can someone please investigate this? Thanks!
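
For anyone who wants to repeat the per-IP comparison, here is a rough sketch of how the grouping can be automated. It assumes the access log uses nginx's default "combined" format, as in the lines above; the function names `tally_by_client` and `incomplete_clients` are my own and not part of any tool.

```python
import re
from collections import defaultdict

# Client address, request path, and status code of a "combined" format line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+)\s+\S+\s+\S+\s+\[[^\]]+\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)[^"]*"\s+(?P<status>\d{3})\s'
)
PREFIX = "/.well-known/acme-challenge/"


def tally_by_client(lines):
    """Return {client_ip: {status_code: set of challenge tokens}}."""
    tally = defaultdict(lambda: defaultdict(set))
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m.group("path").startswith(PREFIX):
            continue  # not an ACME challenge request
        token = m.group("path")[len(PREFIX):]
        tally[m.group("ip")][int(m.group("status"))].add(token)
    return tally


def incomplete_clients(tally, expected_tokens):
    """List clients that did not get both a 301 and a 200 for every token."""
    bad = []
    for ip, by_status in tally.items():
        got_301 = by_status.get(301, set())
        got_200 = by_status.get(200, set())
        if got_301 != expected_tokens or got_200 != expected_tokens:
            bad.append(ip)
    return sorted(bad)
```

Fed the log lines above, with the two challenge tokens as `expected_tokens`, this should flag only 2600:3000:2710:300::1d. It only makes sense for a setup like mine, where every challenge request is first redirected (301) and then served (200).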


#9

It provides less than half of the complete picture, but an mtr or traceroute6 to the Let’s Encrypt IP should get almost all the way to the server. I don’t know at what point they start getting filtered, but it should be informative enough.


#10

Here’s the output for traceroute6 on letsencrypt.org:

michael@michaelmarley:~$ traceroute6 letsencrypt.org
traceroute to letsencrypt.org (2600:1408:9000:1ba::ce0) from 2606:a000:4447:9802:baae:edff:fe73:314a, port 33434, from port 40513, 30 hops max, 60 bytes packets
 1  cpe-2606-A000-4447-9802-0-0-0-1.dyn6.twc.com (2606:a000:4447:9802::1)  0.229 ms  0.209 ms  0.274 ms 
 2  * * *         
 3  cpe-2606-A000-0-4-0-0-8-354.dyn6.twc.com (2606:a000:0:4::8:354)  13.771 ms  15.286 ms  17.477 ms 
 4  cpe-2606-A000-0-4-0-0-2-56.dyn6.twc.com (2606:a000:0:4::2:56)  20.284 ms  16.387 ms  16.364 ms 
 5  cpe-2606-A000-0-4-0-0-0-4E.dyn6.twc.com (2606:a000:0:4::4e)  23.407 ms  22.490 ms  17.161 ms 
 6  2001:1998:0:8::14 (2001:1998:0:8::14)  28.598 ms  23.052 ms  16.369 ms 
 7  * * *         
 8  g2600-1408-9000-0000-0000-0000-172d-b58c.deploy.static.akamaitechnologies.com (2600:1408:9000::172d:b58c)  19.386 ms  24.622 ms  21.316 ms

I also pinged that same IP for several minutes and got no packet loss, though I’m not sure if this is an accurate test because this IP isn’t even in the same prefix.


#11

Yeah, the CDN frontends are a different service on a different ISP.


#12

I decided to do a traceroute6 on 2600:3000:2710:300::1d too. It obviously didn’t get all the way since 2600:3000:2710:300::1d doesn’t respond to pings, but here’s what I got:

michael@michaelmarley:~$ traceroute6 2600:3000:2710:300::1d
traceroute to 2600:3000:2710:300::1d (2600:3000:2710:300::1d) from 2606:a000:4447:9802:baae:edff:fe73:314a, port 33434, from port 38975, 30 hops max, 60 bytes packets
 1  cpe-2606-A000-4447-9802-0-0-0-1.dyn6.twc.com (2606:a000:4447:9802::1)  0.280 ms  0.254 ms  0.315 ms 
 2  * * *         
 3  cpe-2606-A000-0-4-0-0-8-356.dyn6.twc.com (2606:a000:0:4::8:356)  20.805 ms  17.070 ms  13.203 ms 
 4  cpe-2606-A000-0-4-0-0-2-58.dyn6.twc.com (2606:a000:0:4::2:58)  15.015 ms  12.721 ms  13.404 ms 
 5  cpe-2606-A000-0-4-0-0-0-52.dyn6.twc.com (2606:a000:0:4::52)  23.928 ms  22.854 ms  26.523 ms 
 6  2001:1998:0:8::16 (2001:1998:0:8::16)  26.938 ms  30.790 ms  22.897 ms 
 7  2001:1998::66:109:6:171 (2001:1998::66:109:6:171)  24.319 ms  18.901 ms  21.583 ms 
 8  * * 2001:1998:0:8::1ba (2001:1998:0:8::1ba)  4990.888 ms 
 9  lo-0-v6.ear2.Denver1.Level3.net (2001:1900::3:19e)  55.269 ms  53.537 ms  55.176 ms 
10  VIAWEST-INT.edge3.Denver1.Level3.net (2001:1900:2100::373a)  53.161 ms  55.108 ms  58.079 ms 
11  2600:3000:2:330::1 (2600:3000:2:330::1)  55.544 ms  57.736 ms  49.415 ms 
12  2600:3000:0:2::85 (2600:3000:0:2::85)  58.002 ms  55.659 ms  51.295 ms 
13  * * *         
14  2600:3000:3:38::2 (2600:3000:3:38::2)  73.983 ms  76.484 ms  79.562 ms 
15  * * *         
16  2600:3000:2700:1073::4 (2600:3000:2700:1073::4)  77.562 ms  77.829 ms  75.403 ms 
17  * * *         
18  * * *         
19  * * *         
20  * * *         
21  * * *         
22  * * *         
23  * * *         
24  * * *         
25  * * *         
26  * * *         
27  * * *         
28  * * *         
29  * * *         
30  * * *

There also doesn’t seem to be any packet loss while pinging 2600:3000:2700:1073::4.
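
In case it helps anyone comparing several traces, this is a small sketch for pulling the last responding hop out of traceroute6 output. It assumes the output format shown above, with hop numbers at the start of the line and addresses in parentheses; `last_responding_hop` is just a name I made up.

```python
import re

# A hop line starts with the hop number; hops that answered include an
# address in parentheses, while silent hops show only "* * *".
HOP = re.compile(r'^\s*(?P<hop>\d+)\s+(?P<rest>.*)$')
ADDR = re.compile(r'\(([0-9a-fA-F:.]+)\)')


def last_responding_hop(output):
    """Return (hop_number, address) of the last hop that answered, or None."""
    last = None
    for line in output.splitlines():
        m = HOP.match(line)
        if not m:
            continue  # header or shell prompt line
        addr = ADDR.search(m.group("rest"))
        if addr:
            last = (int(m.group("hop")), addr.group(1))
    return last
```

For the trace above it should return hop 16, 2600:3000:2700:1073::4, which is the address I pinged.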

