Last modified: 2013-01-14 17:57:26 UTC
I've noticed that some pages have been indexed by Google as if they were from the English Wikipedia. The URL above shows the result of a query for "B:Pt:Página_principal" (the Portuguese Wikibooks main page), whose only result is the following URL: http://en.wikipedia.org/wiki/B:Pt:Página_principal For another example, see the results for "Teoria de números/Números primos": http://www.google.com.br/search?q=%22Teoria+de+n%C3%BAmeros%2FN%C3%BAmeros+primos%22 This shouldn't happen.
(In reply to comment #0)
> This shouldn't happen.
We can't fix Google, but this may be due to the kind of redirect that is in place there. However, when I tried to investigate, I got a 403 Forbidden error from http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal, which I think was caused by my User-Agent.
See [[meta:nofollow]]
We send 302 Moved Temporarily status codes (should they be 301 Moved Permanently instead?):
bawolff@Bawolff-L:/var/www/w/$ HEAD -S -H 'User-agent: test' \
    http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal
HEAD http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal --> 302 Moved Temporarily
HEAD http://en.wikibooks.org/wiki/Pt:P%C3%A1gina_principal --> 302 Moved Temporarily
HEAD http://pt.wikibooks.org/wiki/P%C3%A1gina_principal --> 200 OK
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Connection: close
Date: Sat, 26 Mar 2011 18:07:16 GMT
Age: 7188
Server: Apache
Vary: Accept-Encoding,Cookie
Content-Language: pt
Content-Length: 73670
Content-Type: text/html; charset=UTF-8
Last-Modified: Thu, 03 Mar 2011 19:23:27 GMT
Client-Date: Sat, 26 Mar 2011 20:07:05 GMT
Client-Peer: 208.80.152.2:80
Client-Response-Num: 1
X-Cache: HIT from sq40.wikimedia.org
X-Cache: MISS from sq36.wikimedia.org
X-Cache-Lookup: HIT from sq40.wikimedia.org:3128
X-Cache-Lookup: MISS from sq36.wikimedia.org:80
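To make the redirect chain above easier to follow, here is a minimal Python sketch of how the nested prefixes in "B:Pt:Página_principal" unwind one hop at a time. The prefix tables and the functions `next_hop` and `redirect_chain` are hypothetical and heavily simplified; MediaWiki's real interwiki map is much larger and is not implemented this way. The sketch only reproduces the two hops that HEAD reported before the final 200 OK.

```python
from urllib.parse import quote

# Hypothetical, heavily simplified prefix tables (NOT MediaWiki's real interwiki map).
PROJECT_PREFIXES = {"b": "wikibooks", "w": "wikipedia"}
LANGUAGE_PREFIXES = {"en", "pt"}


def next_hop(host: str, title: str):
    """Strip one interwiki prefix from title and return (new_host, rest), or None."""
    prefix, sep, rest = title.partition(":")
    if not sep:
        return None  # no prefix left: this is the final page
    lang, project = host.split(".")[:2]  # e.g. "en.wikipedia.org" -> "en", "wikipedia"
    key = prefix.lower()
    if key in PROJECT_PREFIXES:
        project = PROJECT_PREFIXES[key]  # "B:" switches project, keeps language
    elif key in LANGUAGE_PREFIXES:
        lang = key  # "Pt:" switches language, keeps project
    else:
        return None  # unknown prefix: treat it as part of an ordinary title
    return f"{lang}.{project}.org", rest


def redirect_chain(host: str, title: str) -> list[str]:
    """Return the URL of every redirect target, in order, for a prefixed title."""
    chain = []
    hop = next_hop(host, title)
    while hop is not None:  # terminates: each hop removes one prefix
        host, title = hop
        # Percent-encode the title as UTF-8, leaving ":" readable as in the HEAD output.
        chain.append(f"http://{host}/wiki/{quote(title, safe='/:')}")
        hop = next_hop(host, title)
    return chain
```

For example, `redirect_chain("en.wikipedia.org", "B:Pt:Página_principal")` returns the en.wikibooks.org and pt.wikibooks.org URLs, matching the two redirect targets in the trace.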
Please always use sentence case when changing bug summaries (initial capital letter, lowercase everything else, except words that are always capitalized like proper nouns and variables, no trailing punctuation). In any case, this seems like a duplicate of bug 8753 if we're to believe the new bug summary ("interwiki links should have the nofollow attribute"). However, it's unclear whether this new bug summary is accurate. The old bug summary and the opening comment are about a particular symptom ("Pages should not be indexed through interwiki links from other wikis") while the updated bug summary is about a specific solution. This discrepancy needs to be addressed.
(In reply to comment #4)
> However, it's unclear whether this new bug summary is accurate. The old bug summary and the opening comment are about a particular symptom ("Pages should not be indexed through interwiki links from other wikis") while the updated bug summary is about a specific solution. This discrepancy needs to be addressed.
Specific solutions to particular symptoms are appropriate, since we can only implement specific solutions. Putting the proposed solution in the summary is appropriate because it focuses the bug. If the solution is implemented and it addresses the particular symptoms, the bug can be closed. If, at some later time, people are dissatisfied with the solution, a new bug should be opened with their specific issues.
(In reply to comment #5)
> If, at some later time, people are dissatisfied with the solution, a new bug should be opened with their specific issues.
That doesn't seem particularly fair to the person who took the time to file a bug about their specific problem. You're changing the nature of their request and then telling them that if they don't like how the new request is implemented, they can file another bug? That seems completely backward and wrong. If you think the issue of interwiki links not having the "nofollow" attribute needs attention, reopen bug 8753. For now, I'm mostly reverting the bug summary here.
Fixed in r84820 by making it send 301 (Moved Permanently) redirects instead of 302 (Moved Temporarily) redirects. Based on some googling, Google reports the target as the page's URL when following a 301, but reports the original URL when following a 302. Furthermore, interwiki redirects of this form really are permanent, so a 301 is the correct status. I only changed what happens when you visit a URL of the form http://en.wikipedia.org/wiki/B:Pt:P%C3%A1gina_principal. Pages containing #Redirect [[B:Some page on wikibooks]] will still send 302s, since they are arguably not permanent. (If such a redirect leads into a chain of interwiki prefixes, the hop from the page containing #Redirect will be a 302, but the rest of the chain will be 301s.) Marking as fixed. Unfortunately, I don't have any real way to test this, since I don't control Google.
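The status-code rule described above can be summarized in a short Python sketch. The function `hop_statuses` and its parameters are hypothetical and chosen only for illustration; the actual change in r84820 is MediaWiki PHP code, which this does not reproduce. It encodes just the stated behaviour: a hop that starts from a page containing #Redirect gets a 302, and every hop produced by an interwiki prefix in the URL gets a 301.

```python
from http import HTTPStatus


def hop_statuses(hops: int, from_redirect_page: bool) -> list[HTTPStatus]:
    """Return the status code sent for each hop of an interwiki redirect chain.

    hops: number of redirects in the chain.
    from_redirect_page: True if the first hop comes from a page whose content
        is an interwiki #Redirect (editable later, so arguably temporary).
    """
    statuses = []
    for i in range(hops):
        if i == 0 and from_redirect_page:
            # The #Redirect page can be edited later, so keep a temporary redirect.
            statuses.append(HTTPStatus.FOUND)  # 302
        else:
            # A prefix in the URL always resolves the same way, so it is permanent.
            statuses.append(HTTPStatus.MOVED_PERMANENTLY)  # 301
    return statuses
```

For the URL in the original report (two prefix hops) this gives [301, 301]; for a page containing #Redirect [[B:Pt:Página_principal]] (one redirect-page hop followed by one prefix hop) it gives [302, 301].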
Looks like this bug is back. See bug 26115.
*** This bug has been marked as a duplicate of bug 26115 ***