Certbot has been timing out frequently for the past few weeks

We could probably fix this by changing our network so the VA is not NATed. For now I don't think we should put in the work: what I've been reading suggests that tcp_tw_recycle is an anti-pattern, and we shouldn't make special efforts to accommodate it. That said, I'm open to changing my mind, especially if it turns out that a lot of people have this problem.

BTW, I read more of that article, and it explains why you found that the TCP Timestamps correlated with errors:

Linux will drop any segment from the remote host whose timestamp is not strictly bigger than the latest recorded timestamp, unless the TIME-WAIT state would have expired.
When the remote host is in fact a NAT device, the condition on timestamps will forbid all the hosts except one behind the NAT device to connect during one minute because they do not share the same timestamp clock.

Starting from Linux 4.10 (commit 95a22caee396), Linux will randomize timestamp offsets for each connection, making this option completely broken, with or without NAT. It has been completely removed from Linux 4.12.
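
To make the quoted behaviour concrete, here is a rough toy model in Python of how I understand it. It is a sketch of the description above, not the kernel's actual logic: the class and function names, the IP address, and the timestamp values are all made up for illustration, and the 60-second window is taken from the "one minute" in the quote.

```python
from dataclasses import dataclass, field

# Assumption: the per-host check applies for about one minute, per the quote.
TIME_WAIT_SECONDS = 60


@dataclass
class RecycleServer:
    """Toy model of a server with tcp_tw_recycle enabled (hypothetical, simplified)."""

    # source IP -> (last TCP timestamp value seen, time it was recorded)
    last_seen: dict = field(default_factory=dict)

    def accept(self, src_ip: str, tsval: int, now: float) -> bool:
        """Return True if the segment is accepted, False if it is dropped."""
        record = self.last_seen.get(src_ip)
        if record is not None:
            last_tsval, recorded_at = record
            still_in_time_wait = now - recorded_at < TIME_WAIT_SECONDS
            # Drop anything whose timestamp is not strictly bigger than the
            # last one recorded for this source IP, while the window is open.
            if still_in_time_wait and tsval <= last_tsval:
                return False
        self.last_seen[src_ip] = (tsval, now)
        return True


def simulate() -> list:
    """Two hypothetical hosts behind one NAT IP, each with its own timestamp clock."""
    server = RecycleServer()
    nat_ip = "203.0.113.7"  # documentation address, stands in for the NAT device
    return [
        server.accept(nat_ip, tsval=5_000_000, now=0.0),  # host A connects first
        server.accept(nat_ip, tsval=1_000, now=1.0),      # host B, unrelated lower clock
        server.accept(nat_ip, tsval=5_000_100, now=2.0),  # host A again
        server.accept(nat_ip, tsval=1_200, now=70.0),     # host B, after the window
    ]


if __name__ == "__main__":
    print(simulate())
```

In this model host B is dropped while host A's entry is still fresh, and gets through only once the window has passed. That would match timeouts that come and go depending on which host behind the NAT reached the server most recently.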
