Last modified: 2008-08-14 02:31:09 UTC

Bug 14076 - Exclude bot-generated spam reports on meta from indexing via robots.txt
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Hardware: All All
Importance: Normal enhancement with 1 vote
Assigned To: Nobody
Keywords: shell
Blocks: robots.txt
Reported: 2008-05-10 20:09 UTC by Mike.lifeguard
Modified: 2008-08-14 02:31 UTC

Description Mike.lifeguard 2008-05-10 20:09:38 UTC
Please exclude the following from being indexed by modifying robots.txt:
*[[m:Talk:Spam blacklist]] and subpages
*[[m:User:COIBot/LinkReports]] and subpages
*[[m:User:COIBot/COIReports]] and subpages
*[[m:User:COIBot/UserReports]] and subpages
*[[m:User:SpamReportBot/cw]] and subpages
*And the talk pages for all of the above
Comment 1 A. B. 2008-05-12 13:51:44 UTC
I agree with excluding crawlers from the bot report pages. I disagree with removing crawlers from [[m:Talk:Spam blacklist]] and its subpages. See my reasoning at
Comment 2 Mike.lifeguard 2008-05-21 02:46:48 UTC
OK, per discussion, please do not exclude [[m:Talk:Spam blacklist]] or subpages. The rest listed above is fine to add.
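
For reference, the request as revised here could be expressed roughly as the robots.txt rules generated below. This is only a sketch, not the rule set that was actually deployed: it assumes pages on meta are served under the `/wiki/` path, that a single `User-agent: *` group is wanted, and the helper names (`talk_title`, `disallow_rules`) are made up for illustration.

```python
# Page trees agreed on in comment 2 (Talk:Spam blacklist is deliberately left out).
REPORT_PAGES = [
    "User:COIBot/LinkReports",
    "User:COIBot/COIReports",
    "User:COIBot/UserReports",
    "User:SpamReportBot/cw",
]


def talk_title(title: str) -> str:
    """Map a title such as 'User:Foo/Bar' to its talk page, 'User_talk:Foo/Bar'."""
    namespace, _, rest = title.partition(":")
    return f"{namespace}_talk:{rest}"


def disallow_rules(titles, prefix="/wiki/"):
    """Build one Disallow line per page tree and one per matching talk page tree.

    Disallow values are matched as path prefixes, so each rule also covers
    the subpages below that title. The '/wiki/' prefix is an assumption.
    """
    rules = []
    for title in titles:
        for page in (title, talk_title(title)):
            rules.append(f"Disallow: {prefix}{page}")
    return rules


print("User-agent: *")
print("\n".join(disallow_rules(REPORT_PAGES)))
```

Some crawlers may request the percent-encoded form of the colon (`%3A`), so a real rule set might need a second variant of each line.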
Comment 3 Mike.lifeguard 2008-08-06 15:16:54 UTC
We are currently planning to have a bot edit some 47000 pages to add __NOINDEX__. It would be much easier to keep the bot reports from being indexed with this addition to robots.txt.

We are still deciding whether to have the talk page and/or archives indexed or not, but we can manage that with the magic word; no action is required from the sysadmins.
Comment 4 Mike.lifeguard 2008-08-06 15:18:35 UTC
Actually, it is ~25000 pages (enwiki was included in the 47000 figure), but the point remains.
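
To illustrate the per-page work the bot run would involve, here is a minimal sketch of the text change for a single page. The function name `ensure_noindex` is hypothetical, the magic word is simply placed on the first line, and fetching and saving pages through the wiki is left out entirely.

```python
NOINDEX = "__NOINDEX__"


def ensure_noindex(wikitext: str) -> str:
    """Return the page text with __NOINDEX__ added on top, unless it is already there."""
    if NOINDEX in wikitext:
        # Already marked; return unchanged so repeated runs make no further edits.
        return wikitext
    return NOINDEX + "\n" + wikitext
```

Running this over every report page means one edit per page, which is the cost the robots.txt change was meant to avoid.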
Comment 5 Mike.lifeguard 2008-08-14 02:31:09 UTC
Fixed by r37973 - __NOINDEX__ is applied to the pages in question through a template (& new ones use the magic word directly).
