Last modified: 2013-01-14 17:57:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T30242, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 28242 - Pages should not be indexed by search engines through interwiki links from other wikis
Status: RESOLVED DUPLICATE of bug 26115
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://www.google.com.br/search?q=%22...
Depends on:
Blocks:
Reported: 2011-03-26 00:27 UTC by Helder
Modified: 2013-01-14 17:57 UTC

Description Helder 2011-03-26 00:27:36 UTC
I've noticed that some pages are being indexed by Google as if they were part of the English Wikipedia. The URL above shows the result of a query like
 "B:Pt:Página_principal"
which has the following URL as the only result:
 http://en.wikipedia.org/wiki/B:Pt:Página_principal

For another example, see the result for "Teoria de números/Números primos":
http://www.google.com.br/search?q=%22Teoria+de+n%C3%BAmeros%2FN%C3%BAmeros+primos%22

This shouldn't happen.
Comment 1 Mark A. Hershberger 2011-03-26 18:46:52 UTC
(In reply to comment #0)
> This shouldn't happen.

And we can't fix Google.

This may be due to the sort of redirect that is in place there. When I tried to investigate, though, I got a 403 Forbidden error from http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal, which I think was because of my User-Agent.
Comment 2 Mark A. Hershberger 2011-03-26 20:05:17 UTC
See [[meta:nofollow]]
Comment 3 Bawolff (Brian Wolff) 2011-03-26 20:08:59 UTC
We send 302 Moved Temporarily status codes (should they be 301 Moved Permanently instead?):

bawolff@Bawolff-L:/var/www/w/$ HEAD -S -H 'User-agent: test' \
http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal

HEAD http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal --> 302 Moved Temporarily
HEAD http://en.wikibooks.org/wiki/Pt:P%C3%A1gina_principal --> 302 Moved Temporarily
HEAD http://pt.wikibooks.org/wiki/P%C3%A1gina_principal --> 200 OK
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Connection: close
Date: Sat, 26 Mar 2011 18:07:16 GMT
Age: 7188
Server: Apache
Vary: Accept-Encoding,Cookie
Content-Language: pt
Content-Length: 73670
Content-Type: text/html; charset=UTF-8
Last-Modified: Thu, 03 Mar 2011 19:23:27 GMT
Client-Date: Sat, 26 Mar 2011 20:07:05 GMT
Client-Peer: 208.80.152.2:80
Client-Response-Num: 1
X-Cache: HIT from sq40.wikimedia.org
X-Cache: MISS from sq36.wikimedia.org
X-Cache-Lookup: HIT from sq40.wikimedia.org:3128
X-Cache-Lookup: MISS from sq36.wikimedia.org:80
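
For anyone without the `HEAD` tool at hand, the same hop-by-hop check can be approximated in Python. This is only a minimal sketch: the function names `head_once` and `trace_redirects` are made up for illustration, and the `test` User-Agent simply mirrors the command above (the default User-Agent may get a 403, as noted in comment 1).

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)

def head_once(url, user_agent="test"):
    """Send a single HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def trace_redirects(url, fetch=head_once, max_hops=10):
    """Follow redirects one hop at a time; return [(url, status), ...]."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops

if __name__ == "__main__":
    start = "http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal"
    for hop_url, status in trace_redirects(start):
        print(status, hop_url)
```

The `fetch` parameter exists only so the chain-walking logic can be exercised with a stub instead of live requests.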
Comment 4 MZMcBride 2011-03-26 20:13:26 UTC
Please always use sentence case when changing bug summaries (initial capital letter, lowercase everything else, except words that are always capitalized like proper nouns and variables, no trailing punctuation).

In any case, this seems like a duplicate of bug 8753 if we're to believe the new bug summary ("interwiki links should have the nofollow attribute"). However, it's unclear whether this new bug summary is accurate. The old bug summary and the opening comment are about a particular symptom ("Pages should not be indexed through interwiki links from other wikis") while the updated bug summary is about a specific solution. This discrepancy needs to be addressed.
Comment 5 Mark A. Hershberger 2011-03-26 20:21:08 UTC
(In reply to comment #4)
> However, it's unclear whether this new bug summary is accurate. The old bug
> summary and the opening comment are about a particular symptom ("Pages should
> not be indexed through interwiki links from other wikis") while the updated bug
> summary is about a specific solution. This discrepancy needs to be addressed.

Specific solutions to particular symptoms are appropriate since we can only implement specific solutions.

Putting the proposed solution in the summary is appropriate, since it focuses the bug.  If the solution is implemented, the bug can be closed if it addresses the particular symptoms.

If, at some later time, people are dissatisfied with the solution, a new bug should be opened with their specific issues.
Comment 6 MZMcBride 2011-03-26 20:25:57 UTC
(In reply to comment #5)
> If, at some later time, people are dissatisfied with the solution, a new bug
> should be opened with their specific issues.

That doesn't seem particularly fair to the person who took the time to file a bug about their specific problem. You're changing the nature of their request and then telling them that if they don't like how the new request is implemented, they can file another bug? That seems completely backward and wrong.

If you think the issue of interwiki links not having the "nofollow" attribute needs attention, reopen bug 8753. I'm mostly reverting the bug summary here for now.
Comment 7 Bawolff (Brian Wolff) 2011-03-26 21:13:52 UTC
Fixed in r84820 by sending 301 (permanent) redirects instead of 302 (temporary) redirects.

Based on some searching, Google reports the redirect target as the page's URL when it follows a 301, but reports the original URL when it follows a 302. Furthermore, interwiki redirects of that form really are permanent, so they should use a 301 redirect.

I only changed what happens when you go to a URL of the form http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal . Pages containing #Redirect[[B:Some page on wikibooks]] will still send 302s, since those are arguably non-permanent. (With chained interwikis, though, the page that actually contains the #Redirect will send a 302, while the rest of the chain will send 301s.)
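
To make the rule above concrete, here is a small Python sketch of which status each hop would get. It is not the code from r84820; `redirect_status` and `chain_statuses` are hypothetical names that just restate the behaviour described in this comment.

```python
def redirect_status(from_redirect_page):
    """Status for one hop: 302 if it comes from an on-wiki #Redirect page,
    301 if it comes from a plain interwiki-prefixed URL."""
    return 302 if from_redirect_page else 301

def chain_statuses(starts_on_redirect_page, hops):
    """Statuses for a chain of `hops` interwiki redirects.

    Only the first hop can originate from a #Redirect page; every later
    hop is a plain interwiki-prefixed URL and is therefore permanent.
    """
    return [
        redirect_status(starts_on_redirect_page and i == 0)
        for i in range(hops)
    ]
```

Under these assumptions, the B:Pt: URL above (two hops, no #Redirect page) gives `chain_statuses(False, 2) == [301, 301]`, while a chain that starts on a #Redirect page gives `chain_statuses(True, 2) == [302, 301]`.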

Marking as fixed. Unfortunately, I don't have any real way to test this, since I don't control Google.
Comment 8 Krinkle 2013-01-14 17:57:26 UTC
Looks like this bug is back. See bug 26115.

*** This bug has been marked as a duplicate of bug 26115 ***
