Googlebot and SNI issues?

I’m a customer of a site that uses an old-fashioned multiuser system model. They have ~1800 shell account users, and a number of them have basic websites. These basic websites can have custom domains, each of which is mapped with and without www.

For example, I have an Apache userdir with a toy of mine:

  http://www.panix.com/~eli/anagramhunt/

It’s also available with my vanity domain:

  http://qaz.wtf/anagramhunt/

And the www version of that:

  http://www.qaz.wtf/anagramhunt/

(Some of my stuff will react to HTTP_HOST to change content or do redirects, but that old project does not.)

I would like to have https for my site there, and have asked the staff about Let’s Encrypt. Apparently they gave it a brief whirl, but ran into some issues that are not Let’s Encrypt’s fault.

Because so many hostnames are served from the same Apache, a single certificate listing them all as SANs (Subject Alternative Names) is unworkable, and SNI is required. But it seems that Googlebot is broken with respect to SNI, and the brief experiment caused a lot of damage in Google’s index.

Here are some discussion links about the Googlebot issue:

https://productforums.google.com/forum/#!topic/webmasters/yWR2na-piqw;context-place=forum/webmasters

https://productforums.google.com/forum/#!topic/webmasters/G1Ea8-Qkgs8;context-place=topicsearchin/webmasters/SNI

I didn’t see anything here about Google SNI issues. Has anyone been putting pressure on Google to fix this on their end?

The discussions you linked are about two different issues.

The first one claims that Googlebot is essentially indexing content under the wrong domain when SNI is used. This is highly unlikely. As an example, every site that uses Cloudflare (with the exception of some of their pricier paid plans) requires SNI, and I’m not aware of any reports of this nature. We’re talking about millions of domains here, so someone would surely have noticed by now. I’m personally running quite a number of domains on Cloudflare, and none of them shows this behaviour.

My best guess is that those sites were either indexed years ago, when Google might actually have had SNI issues, or that this was simply a server misconfiguration and they were serving the same content on both domains. That would not be an HTTPS issue at all: if you serve the same content on two domains over plain HTTP, the result is the same. If you want to make sure you only ever see your vanity domain in the search results for your website content, you’ll need to implement redirects for all other domains; this holds true for both HTTP and HTTPS.
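For example, each non-canonical hostname can answer with nothing but a redirect. A minimal Apache sketch (hostnames are placeholders):

<VirtualHost *:80>
    ServerName www.vanity.example.com
    # Non-canonical names only redirect, so search engines only ever
    # index one domain for the content.
    Redirect permanent / http://vanity.example.com/
</VirtualHost>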

The second discussion is about a notification Google sends to webmasters when it detects that a site requires SNI. The message itself states: “This means that your website is not perceived as secure by some browsers.” Note the “some browsers”. If you care about, say, Internet Explorer on Windows XP, you might want to make sure your site works without SNI; otherwise, you can safely ignore this warning. Wikipedia has more details on SNI client support.


The first link (…/yWR2na-piqw) is the one I got when I asked what the problem was, so it is what Panix thinks the problem is. I understand the HTTP/HTTPS stack very well, but I don’t know the specifics of how either Panix or Cloudflare configures their systems. I do know generalities about how Panix is configured:

There are two Squid servers, squid1 and squid2 (.nyc.access.net), that reverse proxy various internal web servers (at least two, possibly more for the higher-paying service tiers). The internal web servers are all Apache. On my “cheap-web” tier, there are userdirs which always show up under the panix.com domain, and virtual hosts for various customers with small Apache configs like:

<VirtualHost *:80>
    ServerName vanity.example.com
    ServerAlias www.vanity.example.com

    DocumentRoot /htdocs/userdirs/vanity
</VirtualHost>

(I.e., doing hardly anything, but well separated per user.) I don’t know what the config changes for SSL looked like, and I don’t know what the Squid configs look like.
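For what it’s worth, my understanding is that a name-based SNI setup in Apache is just more virtual hosts on port 443, something like this (a guess at the shape, with placeholder paths, not Panix’s actual config):

<VirtualHost *:443>
    ServerName vanity.example.com
    ServerAlias www.vanity.example.com

    DocumentRoot /htdocs/userdirs/vanity

    SSLEngine on
    # Certificate paths are placeholders for illustration only.
    SSLCertificateFile    /etc/ssl/vanity.example.com/cert.pem
    SSLCertificateKeyFile /etc/ssl/vanity.example.com/privkey.pem
</VirtualHost>

Apache picks the vhost by the SNI hostname; clients that don’t send SNI just get the first *:443 vhost’s certificate.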

So, while it is entirely possible that Cloudflare has done something that gets SNI working, posts like the one linked above, and the experience that has been described to me, do not make this seem as trivial as your response implies.

This isn’t related to SNI, or even HTTPS in general.

If they’re serving the content that’s available on your vanity domain (qaz.wtf) on a path under panix.com, and that path isn’t blocked from Googlebot in some way (e.g., via robots.txt) and doesn’t redirect back to your vanity domain, then Google might index it if it finds a reference to it, and search results might include panix.com.

If, on the other hand, they either block your path under panix.com using robots.txt or have a redirect to your vanity domain in place, then there should be no search results for panix.com.
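As a hypothetical illustration, using the paths from your example, the robots.txt variant would be a file served at http://www.panix.com/robots.txt along these lines:

# Hypothetical robots.txt on the shared domain; a narrower
# Disallow than the whole userdir would work too.
User-agent: *
Disallow: /~eli/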

You can reduce all of this to HTTP. If things currently work under HTTP, and the described behaviour will remain the same under HTTPS, then you will not introduce any issues. Things will continue to work the way they currently do.

If the behaviour changes under HTTPS, that’s an issue with their server configuration, but not one that’s related to SNI.

All of the sites currently work under HTTP and apparently are not having this problem. My vanity site takes care of redirects if visited under the “wrong” hostname (for important subdirectories), but that’s my personal configuration via .htaccess and CGI.
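For what that’s worth, the rules are essentially of this shape (a simplified sketch assuming mod_rewrite, with an illustrative directory name, not my exact configuration):

# .htaccess in one of the redirecting subdirectories: any hostname
# other than qaz.wtf gets sent to the canonical URL.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^qaz\.wtf$ [NC]
RewriteRule ^(.*)$ http://qaz.wtf/somedir/$1 [R=301,L]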

But lacking the configuration details, it seems I cannot describe the problem well enough to pin down where the mistake was.

Generally speaking, if your site serves a redirect on the shared domain, and does so under both HTTP and HTTPS, you should not run into the issue described in the first discussion linked from your OP.

Some sites have reported a short, temporary drop in visitors coming from Google when they first migrated to HTTPS (and started serving redirects from HTTP to HTTPS). Here’s an example from Wired. Is it possible that this is what happened? There’s not a lot of insight into how these things work on Google’s end, but the general consensus seems to be that the numbers return to their original level after about a month. If you have a way to test this for only a subset of your sites, I would encourage you to try it for a month or two and see if the numbers recover.
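If you do, the HTTP-side change for such a migration is typically just a blanket redirect, e.g. (placeholder hostname):

<VirtualHost *:80>
    ServerName vanity.example.com
    # After the migration, port 80 exists only to send visitors
    # (and Googlebot) to the HTTPS site.
    Redirect permanent / https://vanity.example.com/
</VirtualHost>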

