Last modified: 2008-04-29 21:01:47 UTC

Bug 13864 - Change default robot indexing behaviour for several namespaces
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Hardware: All All
Importance: Normal enhancement with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2008-04-28 01:13 UTC by Larry Pieniazek
Modified: 2008-04-29 21:01 UTC (History)

Description Larry Pieniazek 2008-04-28 01:13:49 UTC
Change the default for the following namespaces:

User: User_talk: Wikipedia: WP: (1) Wikipedia_talk: WT: (2)

1 - or more generally, Project:, and if there is a shortcut namespace, the default for that too
2 - or more generally, Project_talk:, and if there is a shortcut namespace, the default for that too

to not be indexed by bots (effectively, I believe, by including <meta name="robots" content="noindex, nofollow"> in the rendered page), across all WMF projects.

The rationale is to reduce the amount of project-specific material, unlikely to be of general interest to readers, that is exposed publicly (dirty laundry, if you like). As our own internal search engines improve, the need for external search is lessened.

I believe that the following bugs may be related, but I was unable to find this specific one. If it's a dup, my apologies.

8068 Magic word to add noindex to a page's header
9415 option to protect pages from being indexed by search engines
10052 Add class="robots-nocontent" in footer to avoid search engine to index it
11720 Google (and others) is indexing data dumps

(8068 and 9415 allow variability by page. This bug asks to change the default for certain namespaces, but if it were implemented along with 8068 or 9415, one could override the default on a page-by-page basis.)
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-04-28 01:21:44 UTC
I guess this is for the English Wikipedia?  In that case a change to robots.txt would be simplest.  If by "default" you mean default for all wikis unless opting out, that would require a different approach, probably, unless we want to include every single localization of all of the above (1000 lines? more? kept up-to-date how?) in robots.txt.

The major objection to this is that Wikipedians use Google to find project-related pages on Wikipedia.  That they also show up in searches by the general public is arguably not great, but in practice it seems like the former usage outweighs the importance of the latter.  Although as you say, Wikipedia's built-in search is steadily improving.
Comment 2 Larry Pieniazek 2008-04-28 01:35:29 UTC
If you want to discuss this further somewhere else, please suggest somewhere and we can take it there... but I'm actually thinking of asking for this to be across all WMF wikis, not just en:wp. I think en:wp is where the problem (of dirty laundry visibility) is manifesting itself first, but in the long run, yes, all wikis unless opted out. I don't know enough about implementation details to suggest how to implement it cleanly. Perhaps something that generates the appropriate file by examining project configurations? (Presumably the Project: -> Wikipedia: and User: -> Usuario: etc. mappings that result from namespace localisations are kept in a mapping table somewhere in the wiki? Else how does the software know that Usuario: (or Benutzer: or whatever) is User:?)

I agree that Wikipedians do use Google to find project-related pages on Wikipedia. And maybe a project debate is required to secure community approval for the change, but this bug addresses the mechanics/underpinnings... I'll save making the BLP and "do no harm" arguments for that discussion, I guess. I'm just hopeful that as the internal search continues to improve, that becomes less of a factor. And it does seem to be getting better by leaps and bounds of late... I actually started using it again, it's that good.
Comment 3 JeLuF 2008-04-28 19:34:32 UTC
Discussion about changes concerning all projects should take place on meta. Please come back when there's consensus for this decision in the community at large.
Comment 4 Larry Pieniazek 2008-04-28 20:05:09 UTC
Please separate evaluation of the technical aspects of doing this, which merit discussion, from evaluation of community consensus. (See Simetrical's comments: there are apparently several different possible approaches, as well as interrelationships with the other bugs I mentioned in the opening description.) If the facility to do this is difficult to build, it doesn't matter what consensus is or isn't. If the facility is easy to build, it could well be enabled, regardless of which wikis do or don't decide to use it.
Comment 5 JeLuF 2008-04-28 21:38:50 UTC
Removed the "shell" keyword and the "site request" component since this is not a request for a specific change but a general discussion topic.
Comment 6 Andrew 2008-04-29 19:35:20 UTC
Can this be implemented so that each wiki can choose a subset of namespaces they don't want indexed? Maybe Special:Noindex could host a list like:
User talk
Wikipedia talk
and so forth, naming the namespaces whose pages are noindex'd. Projects could then decide for themselves which pages to noindex, and if meta later comes to some overall policy, it could be discussed then. Localising the discussion makes discussion possible.
Comment 7 Brion Vibber 2008-04-29 21:01:47 UTC
Yes, namespaces can indeed be disabled from indexing per-wiki by customizing $wgNamespaceRobotPolicies. This has existed for some time, and no technical changes are required for implementation of a community decision.
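
For illustration, a wiki applying the policy requested in this bug might put something like the following in its LocalSettings.php. This is only a sketch: the choice of namespaces and the 'noindex,nofollow' value are taken from the original request, not from any actual Wikimedia configuration. NS_USER, NS_USER_TALK, NS_PROJECT and NS_PROJECT_TALK are MediaWiki's built-in namespace constants, so localized names and aliases (Benutzer:, WP: and so on) should be covered without listing them individually.

```php
// Sketch for LocalSettings.php: per-namespace robot policy for one wiki.
// The namespace selection mirrors the request in this bug and is an
// assumption, not an actual Wikimedia setting.
$wgNamespaceRobotPolicies = array(
    NS_USER         => 'noindex,nofollow', // User:
    NS_USER_TALK    => 'noindex,nofollow', // User_talk:
    NS_PROJECT      => 'noindex,nofollow', // Project: (e.g. Wikipedia:, WP:)
    NS_PROJECT_TALK => 'noindex,nofollow', // Project_talk: (e.g. Wikipedia_talk:, WT:)
);
```

Namespaces not listed keep the wiki's default policy, so a project that only wants its talk pages excluded could list just those entries.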

I'm INVALIDing this pending general community decision.
