Last modified: 2008-08-04 00:53:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You can still run searches in Bugzilla or access your list of votes, but bug reports in Bugzilla will obviously not be up to date.
Bug 8068 - Magic word to add noindex to a page's header
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Platform: All
OS: All
Importance: Normal enhancement with 4 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 14209 (view as bug list)
Depends on:
Blocks: 14899 14900
Reported: 2006-11-28 17:55 UTC by Mårten Berglund
Modified: 2008-08-04 00:53 UTC (History)
CC: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Mårten Berglund 2006-11-28 17:55:23 UTC
On user pages (and maybe some other namespaces as well) it should be possible to
use a magic word, something like __NOGOOGLE__, to keep the Google robot from
indexing that page. For instance, I have on my user page a set of subpages,
sandboxes where I play and test or make drafts of what could later become real
Wikipedia articles. So I don't want Google to index these pages. They now appear
early in Google's search results.

On an HTML page, the solution to this is to add the line
<pre>
    <meta name="robots" content="noindex,nofollow">
</pre>

Could someone implement something like __NOGOOGLE__ to be used by users who
don't want their user pages indexed?
Comment 1 Rob Church 2006-11-28 18:54:23 UTC
No. Namespaces which robots are asked not to index can be configured; however, in
this case, if it's public, then it's indexable. A __NOINDEX__ type magic word
has been discussed before and rejected simply because it's subject to abuse and
misunderstanding.

Google are quite quick at re-crawling bits of Wikipedia content, so if a draft
page has moved to the article space, they'll reflect it within a few days, usually.
Comment 2 Mårten Berglund 2006-11-28 23:03:17 UTC
But let's say that the magic word __NOINDEX__ has effect only on subpages
belonging to the User namespace, and nowhere else. For instance, only on pages
like: http://xx.wikipedia.org/wiki/User:N_N/a_subpage.

Is that a possible compromise?
Comment 3 Rob Church 2006-11-29 15:22:43 UTC
No, it's up to the people who manage the web site to determine what is and is
not indexed by search engines, and Wikimedia wikis generally have everything
indexed bar pages such as VfD/AfD/whatever the trendy TLA for deletion debates
is, which external viewers don't typically understand.

There is _no reason_ to disable indexing of your user page or any other page in
that namespace. What you are posting to a public web site is public. If you
don't want anyone else to be able to read it or edit it or whatever, _don't post
it_.
Comment 4 Brion Vibber 2008-03-07 20:11:04 UTC
Reopening this, as we're considering it (or something similar) as an improvement over lots of manual editing of the global robots.txt.
Comment 5 Judson (enwiki:cohesion) 2008-03-16 00:39:09 UTC
We frequently get complaints via OTRS from people wanting various logs removed that malign their companies, etc. The logs usually serve a purpose, but it's not like they're content. Just an example: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spam/LinkReports
Comment 6 Robert Rohde 2008-04-28 02:35:11 UTC
Having a __NOINDEX__ magic word is probably the best strategy if we want to differentiate what content ought to appear in search engines in more than a very crude way.  Routinely editing robots.txt is no solution, and I consider it undesirable to simply block out very broad categories of material (such as everything that is not an article).
Comment 7 Bryan Tong Minh 2008-04-29 20:40:59 UTC
I looked into the code, but it appears that $wgOut->setRobotPolicy is called at the very beginning of Article::view. That is many lines before the page content is parsed and magic words are evaluated. Does anybody have an idea how to do this?
Comment 8 Brion Vibber 2008-04-29 21:18:50 UTC
It should be possible to call it again to override it with specific data. You'd have to do this when pulling wiki output out of the ParserOutput object (otherwise the parser cache will always eat everything).
Comment 9 Brion Vibber 2008-05-22 18:18:00 UTC
*** Bug 14209 has been marked as a duplicate of this bug. ***
Comment 10 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-07-23 19:50:18 UTC
Fixed in r37973.  I patterned the code after __NEWSECTIONLINK__, and it seems to work fine.
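The fix follows the route Brion outlined in comment 8: record the magic word on the ParserOutput object (so the policy survives the parser cache) and re-apply the robot policy when that output is used. A rough sketch of that wiring in MediaWiki-style PHP follows; the method names (setIndexPolicy, getIndexPolicy) and exact call sites are assumptions for illustration, not a copy of the r37973 diff:

```php
// 1. Register the double-underscore magic word, alongside the
//    existing ones such as __NEWSECTIONLINK__:
//      'noindex' => array( 0, '__NOINDEX__' ),

// 2. In the parser, when the magic word is seen, record the policy
//    on the ParserOutput rather than calling $wgOut directly —
//    otherwise the cached output would lose the information:
if ( isset( $this->mDoubleUnderscores['noindex'] ) ) {
    $this->mOutput->setIndexPolicy( 'noindex' );
}

// 3. In Article::view(), after the (possibly cached) ParserOutput
//    has been fetched, call setRobotPolicy() a second time to
//    override the default set at the top of the method:
if ( $parserOutput->getIndexPolicy() === 'noindex' ) {
    $wgOut->setRobotPolicy( 'noindex,nofollow' );
}
```

With this in place, a user can put __NOINDEX__ on a sandbox subpage and the rendered page gains the <meta name="robots" content="noindex,nofollow"> header shown in the original report.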


