Need fallback to IPv4 when host is unreachable over IPv6

Renewal of Let’s Encrypt certificates on a number of dual-stacked hosts that I manage has, until recently, just worked.

Within the last week or ten days, a routing problem in my hosting provider’s network has made these hosts unreachable over IPv6. Perhaps because of obstacles arising from the COVID-19 pandemic, my hosting provider has not yet been able to resolve it.

Certificates on one of these hosts fell due for automatic renewal two days ago. Renewal failed, and certbot reported that the Let’s Encrypt server was not receiving a response from the host.

A manual retry also failed. A subsequent retry, after withdrawing the relevant AAAA records from the DNS, succeeded.

I anticipate a ripple effect as other hosts in turn need certificates renewed.

It would be helpful if, on failing to reach the host over IPv6, the Let’s Encrypt servers automatically fell back to trying IPv4. If this feature has already been implemented, it seems that it is not always working correctly (or is not deployed uniformly on all Let’s Encrypt server instances) and needs some debugging; in that case, I shall be happy to help in any way that is useful.

There is a fallback to IPv4. It applies as long as:

1. The network operation that failed was the dialing of the socket. Anything bad that happens after a TCP socket is successfully opened to an IPv6 address will not trigger a fallback.

2. The request is not the result of an HTTP redirect. If the initial request to the acme-challenge resource results in a redirect by the server, any subsequent requests will not be protected by IPv4 fallback. At that point it’s a roll of the dice as to which address (IPv6 or IPv4) the validation server uses.

Outside of these two exceptions, the IPv4 fallback should be reliable. If you know of some other way that it’s not working, could you post an order URL from such an instance?

Thank you for such a prompt reply. I shall have to re-introduce AAAA records and force a renewal before posting as you request.

I hope that exception 2 (HTTP redirect not protected by fallback) is a temporary expedient.

Redirecting from HTTP to HTTPS is a fairly widespread practice, as is the use of a number of aliased server names covered by the same certificate. Imposing a “roll of the dice” when only redirection is involved reduces the chance of success to 50% for a dual-homed host whose IPv6 connectivity is accidentally unavailable. Each alias added to the mix reduces the residual chance of success by a further half.

For example, a mail server in this situation with a canonical name vps.example.com and aliases {mail,imap,smtp}.example.com seems to have a 6.25% chance of successful validation.
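The arithmetic behind that figure, written out under the same assumptions as the post: each name is validated through a redirect, and each redirected request independently lands on IPv6 or IPv4 with equal probability (the 50/50 split is the poster’s assumption, not a documented property of the validation servers):

```latex
P(\text{success}) = \left(\tfrac{1}{2}\right)^{n},
\qquad n = 4 \ (\texttt{vps}, \texttt{mail}, \texttt{imap}, \texttt{smtp})
\;\Rightarrow\; P = \tfrac{1}{16} = 6.25\%
```

If the address family were not chosen uniformly, the factor of one half per name would be replaced by the probability of picking IPv4.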

I hope that the implementation of the fallback mechanism will be reviewed in view of this brittleness. If the general case is too difficult, I think it should be easy to accommodate the cases where only a change of scheme is involved, or (somewhat more generally) when the host name is not changed by the redirection.
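A minimal Go sketch of the narrower check proposed here, namely deciding whether a redirect keeps the same host name. `sameHostRedirect` is a hypothetical helper, not part of Boulder, and the `token` path segment is a placeholder. It ignores changes of scheme and port and resolves relative `Location` values against the original URL. In a Go HTTP client, a check like this would naturally sit in the `http.Client` `CheckRedirect` hook; how it would fit Boulder’s validation code is not something this thread establishes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sameHostRedirect reports whether a redirect from `from` to `to` keeps the
// same host name. Scheme (http -> https), port and path changes are ignored;
// relative redirect targets are resolved against the original URL first.
func sameHostRedirect(from, to string) (bool, error) {
	orig, err := url.Parse(from)
	if err != nil {
		return false, err
	}
	next, err := url.Parse(to)
	if err != nil {
		return false, err
	}
	next = orig.ResolveReference(next) // handles Location: /path
	return strings.EqualFold(orig.Hostname(), next.Hostname()), nil
}

func main() {
	base := "http://example.com/.well-known/acme-challenge/token"
	for _, next := range []string{
		"https://example.com/.well-known/acme-challenge/token",     // scheme change only
		"https://www.example.com/.well-known/acme-challenge/token", // different host
	} {
		ok, err := sameHostRedirect(base, next)
		if err != nil {
			panic(err)
		}
		fmt.Println(next, ok)
	}
}
```

Redirects that pass this check could, under the proposal, reuse whichever address family succeeded for the original request instead of starting a fresh roll of the dice.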

Last time this issue was discussed, one of the Boulder developers remarked:

> Doing otherwise with our existing HTTP-01 validation code and the Go HTTP stdlib isn't trivial. Overall we view this happy eyeballs behaviour as a convenience since the root cause (an AAAA record with incorrect configuration for the HTTP-01 challenge) can be addressed by end users where it will fix brokenness above and beyond the issuance process.
