Last modified: 2008-04-29 21:01:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T15864, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13864 - Change default robot indexing behaviour for several namespaces
Status: RESOLVED INVALID
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 3 votes
Target Milestone: ---
Assigned To: Nobody

Reported: 2008-04-28 01:13 UTC by Larry Pieniazek
Modified: 2008-04-29 21:01 UTC
CC List: 4 users

Description Larry Pieniazek 2008-04-28 01:13:49 UTC
Change the default for the following namespaces:

User: User_talk: Wikipedia: WP: (1) Wikipedia_talk: WT: (2)

1 - or more generally, Project:, and if there is a shortcut namespace, the default for that too
2 - or more generally, Project_talk:, and if there is a shortcut namespace, the default for that too

to not be indexed by bots (effectively, I believe, include <meta name="robots" content="noindex, nofollow"> in the rendered page), across all WMF projects

The rationale is to reduce the amount of project-specific material, unlikely to be of general interest to the readership, that is exposed publicly (dirty laundry, if you like). As our own internal search engines improve, the need for external search lessens.

I believe that the following bugs may be related, but I was unable to find this specific one. If it's a dup, my apologies.

8068 Magic word to add noindex to a page's header
9415 option to protect pages from being indexed by search engines
10052 Add class="robots-nocontent" in footer to avoid search engine to index it
11720 Google (and others) is indexing data dumps

(8068 and 9415 allow variability by page; this bug asks to change the default for certain namespaces. If it were implemented along with 8068 or 9415, one could override the default on a page-by-page basis.)
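
To make the request above concrete, here is a minimal Python sketch of the behaviour described: a per-namespace default robots policy, with an optional per-page override of the kind bugs 8068 and 9415 ask for. It is only an illustration, not MediaWiki code. The names `NAMESPACE_POLICIES`, `DEFAULT_POLICY`, `robots_policy` and `robots_meta_tag` are hypothetical, and the sketch assumes a page's namespace is whatever precedes the first colon in its title.

```python
from html import escape
from typing import Optional

# Hypothetical per-namespace defaults (assumption for illustration only).
# Namespaces not listed here fall back to DEFAULT_POLICY.
NAMESPACE_POLICIES = {
    "User": "noindex,nofollow",
    "User_talk": "noindex,nofollow",
    "Project": "noindex,nofollow",
    "Project_talk": "noindex,nofollow",
}
DEFAULT_POLICY = "index,follow"


def robots_policy(title: str, page_override: Optional[str] = None) -> str:
    """Return the robots policy for a page title such as 'User:Example'."""
    # A per-page setting, if present, wins over the namespace default.
    if page_override is not None:
        return page_override
    namespace, sep, _ = title.partition(":")
    if not sep:
        # No prefix at all: treat the page as main-namespace content.
        return DEFAULT_POLICY
    # Unknown prefixes (e.g. an article title containing a colon) also
    # fall back to the default policy.
    return NAMESPACE_POLICIES.get(namespace.replace(" ", "_"), DEFAULT_POLICY)


def robots_meta_tag(title: str, page_override: Optional[str] = None) -> str:
    """Render the <meta name="robots"> tag for the page's policy."""
    policy = robots_policy(title, page_override)
    return f'<meta name="robots" content="{escape(policy, quote=True)}">'
```

With these assumed defaults, `robots_meta_tag("User talk:Example")` yields a `noindex,nofollow` tag, while passing `page_override="index,follow"` restores indexing for that one page.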
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-04-28 01:21:44 UTC
I guess this is for the English Wikipedia?  In that case a change to robots.txt would be simplest.  If by "default" you mean default for all wikis unless opting out, that would require a different approach, probably, unless we want to include every single localization of all of the above (1000 lines? more? kept up-to-date how?) in robots.txt.

The major objection to this is that Wikipedians use Google to find project-related pages on Wikipedia.  That they also show up in searches by the general public is arguably not great, but in practice it seems like the former usage outweighs the importance of the latter.  Although as you say, Wikipedia's built-in search is steadily improving.
Comment 2 Larry Pieniazek 2008-04-28 01:35:29 UTC
If you want to discuss this further somewhere else, please suggest a venue and we can take it there... but I'm actually thinking of asking for this to be across all WMF wikis, not just en:wp. I think en:wp is where the problem (of dirty laundry visibility) is manifesting itself first, but in the long run, yes, all wikis unless opted out. I don't know enough about implementation details to suggest how to implement it cleanly. Perhaps something that generates the appropriate file by examining project configurations? (Presumably the Project: -> Wikipedia: and User: -> Usuario: etc. mappings that result from namespace localisations are kept in a mapping table somewhere in the wiki? Else how does the SOFTWARE know that Usuario: (or Benutzer: or whatever) is User:?)
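
The idea of generating the file from project configurations could look roughly like the Python sketch below. It is an illustration under stated assumptions, not how Wikimedia builds its robots.txt: the `LOCALIZED_NAMESPACES` table is a hypothetical stand-in for wherever each wiki's namespace localisations actually live, the localized names in it are examples, and the `/wiki/` path prefix is assumed.

```python
from urllib.parse import quote

# Hypothetical mapping from canonical namespace to each wiki's localized
# name (assumption for illustration; real data would come from each
# wiki's configuration).
LOCALIZED_NAMESPACES = {
    "en": {"User": "User", "User_talk": "User_talk"},
    "de": {"User": "Benutzer", "User_talk": "Benutzer_Diskussion"},
    "es": {"User": "Usuario", "User_talk": "Usuario_discusión"},
}


def robots_txt_lines(lang, excluded, path_prefix="/wiki/"):
    """Build robots.txt lines that disallow the given canonical namespaces.

    `lang` selects the wiki, `excluded` lists canonical namespace names,
    and `path_prefix` is the assumed article path of the wiki.
    """
    names = LOCALIZED_NAMESPACES[lang]
    lines = ["User-agent: *"]
    for canonical in excluded:
        # Percent-encode non-ASCII characters; the trailing colon is added
        # afterwards so it stays literal in the rule.
        local = quote(names[canonical])
        lines.append(f"Disallow: {path_prefix}{local}:")
    return lines


if __name__ == "__main__":
    print("\n".join(robots_txt_lines("de", ["User", "User_talk"])))
```

Each rule matches by URL prefix, so it covers only pages reached through the assumed path form. Other URL forms for the same page would need their own rules, which is part of why a single file covering every localization grows so quickly.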

I agree that Wikipedians do use Google to find project-related pages on Wikipedia. And maybe a project debate is required to secure community approval for the change, but this bug addresses the mechanics/underpinnings... I'll save making the BLP and "do no harm" arguments for that discussion, I guess. I'm just hopeful that as the internal search continues to improve, that becomes less of a factor. And it does seem to be getting better by leaps and bounds of late... I actually started using it again, it's that good.
Comment 3 JeLuF 2008-04-28 19:34:32 UTC
Discussion about changes concerning all projects should take place on meta. Please come back when there's consensus for this decision in the community at large.
Comment 4 Larry Pieniazek 2008-04-28 20:05:09 UTC
Please separate evaluation of the technical aspects of doing this, which merit discussion (see Simetrical's comments: there are apparently several different possible approaches, as well as interrelationships with the other bugs I mentioned in the opening description), from evaluation of community consensus. If the facility to do this is difficult, it doesn't matter what consensus is or isn't. If the facility to do this is easy, it could well be enabled, regardless of which wikis do or don't decide to use it.
Comment 5 JeLuF 2008-04-28 21:38:50 UTC
Removed the "shell" keyword and the "site request" component since this is not a request for a specific change but a general discussion topic.
Comment 6 Andrew 2008-04-29 19:35:20 UTC
Can this be implemented so that each wiki can choose a subset of namespaces they don't want indexed? Maybe Special:Noindex could host a list like:
Talk
User talk
Wikipedia talk
and so forth, naming the namespaces whose pages are noindex'd. Projects could then decide for themselves which pages to noindex, and if meta later comes to some overall policy, it could be discussed then. Localising the discussion makes discussion possible.
Comment 7 Brion Vibber 2008-04-29 21:01:47 UTC
Yes, namespaces can indeed be disabled from indexing per-wiki by customizing $wgNamespaceRobotPolicies. This has existed for some time, and no technical changes are required for implementation of a community decision.

I'm INVALIDing this pending general community decision.
