Cross-site links indexed in Google after installing SSL


#1

We installed Let’s Encrypt on our server because we need to access some links securely. We did not intend to get our website indexed in Google, and we assumed that if there are no links to the https:// pages, no https:// duplicate content would be indexed.

Our servers have several IPs. We installed the SSL certificate on one domain (domain1). domain2 and domain3 are also hosted on the same IP.

After 9 days we found some strange pages indexed in Google that look like the originals but have a different link:
– the original link: http:// domain1-name / link-domain1
– the indexed strange link : https:// domain2-name / link-domain1

After we removed the SSL from the domain, the strange link answers with 404, but while the SSL was on, it served the content from a different website, and Google indexed it.

Can anyone figure out (maybe it is obvious to you :frowning:) why this strange thing happened, and what we should do to be able to use SSL but keep the sites normal? And how did Google get to those links?

Thank you for trying to clarify the X-Files for me.


#2

Google’s web crawling robot prefers HTTPS versions of sites, since it wishes to point end users to the secure web site if one exists. The robot will inspect sites it discovers (more about that below) to see if they appear largely identical to another site it already knows about. If the new site is HTTPS but the old identical site was not, it will begin replacing links to the old site with the new one. If, say, http://example.com/ serves a blog and https://example.com/ just has a login-protected admin service for the blog, the robot will realise they’re different and not try to send people to the HTTPS site to visit the blog. But if you make the HTTPS site work for the blog, and then a week later change your mind, it will take some time for Google’s robot to notice that you broke that and put back the HTTP-only links.

If you really don’t want robots looking at a site, use the Robots Exclusion Protocol (robots.txt) to tell them to stop. This is a voluntary protocol, but all major search engines obey it.
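As a rough illustration of the exclusion rule described above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper and the `domain2.example` URL are assumptions made up for the example, not taken from the poster's setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URL only; a well-behaved crawler would skip it under the rules above.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://domain2.example/link-domain1"))
```

Keep in mind that crawlers fetch robots.txt separately for each scheme and host name, so the file has to be reachable at the HTTPS address you want excluded.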

As to how Google found your HTTPS site: any mention anywhere, even if “by accident”, looks the same to the crawler. Getting a Let’s Encrypt certificate publishes the existence of the DNS names for which the certificate was issued (in public Certificate Transparency logs). So it’s even possible that the crawler found it directly from your creating the certificate. But often someone puts it in a public pastebin, or sends a link on a public chat board; really almost anything.

The reason the “wrong” domain name worked for a while is most likely a configuration error on your part, which would be specific to the web server you’re using.


#3

Thank you for your quick answer.

Everything you said makes sense, but we are interested in understanding what happened there.

We only installed the certificate on domain1-name.ro and www.domain1-name.ro. So even if it was published by Let’s Encrypt, it is very strange how Google formed or found the links https:// domain2-name / link-domain1. (The path from domain 1 worked perfectly on domain 2, but only with https://; with http:// it answered 404.)

The problem is that you can access a file from domain1 (with SSL) through domain2 (without SSL, but on the same IP on the server):

https: //domain1.com/file.php (SSL ok) -> here is the file
https: //domain2.com/file.php (no SSL) - 200 OK header - it can be accessed here
http: //domain2.com/file.php - 404 header - it can only be accessed over https://

Has anyone had a similar experience, or an explanation?

Agent Mulder on a bad day :slight_smile:


#4

This can happen when domain 2 has an HTTP vhost but not an HTTPS one. Requests over HTTPS for that domain end up going to the default vhost (domain 1 in this case).
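The fallback can be sketched as a toy model in Python. This is not how any particular web server is implemented; it only assumes the common behaviour where the first vhost defined for a port acts as the default. The host names, document roots and the `select_docroot` helper are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical vhost table: (port, host name) -> document root.
# Insertion order matters: the first entry for a port is treated as its default.
VHOSTS: Dict[Tuple[int, str], str] = {
    (80, "domain1.example"): "/var/www/domain1",
    (80, "domain2.example"): "/var/www/domain2",
    (443, "domain1.example"): "/var/www/domain1",  # the only HTTPS vhost
}


def select_docroot(port: int, host: str) -> str:
    """Pick the document root for a request, falling back to the port's default vhost."""
    exact = VHOSTS.get((port, host))
    if exact is not None:
        return exact
    # No vhost matches this host name on this port: use the first one defined for the port.
    for (vhost_port, _name), docroot in VHOSTS.items():
        if vhost_port == port:
            return docroot
    raise LookupError(f"nothing is listening on port {port}")


if __name__ == "__main__":
    print(select_docroot(80, "domain2.example"))
    print(select_docroot(443, "domain2.example"))
```

In this model an HTTPS request for domain 2 is served from domain 1's files, which would match the 200 over https:// and the 404 over http:// reported above. Adding an HTTPS vhost for every hosted name, or a catch-all HTTPS default that serves nothing, is the usual way to avoid it.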

This is also why the http-01 challenge can’t be done over HTTPS except when following a redirect, as otherwise whoever controls the default would be able to get certs for all the other domains that don’t already have one.

