Last modified: 2008-08-04 00:53:52 UTC
On user pages (and maybe some other namespaces as well) it should be possible to use a magic word, something like __NOGOOGLE__, to stop the Google robot from indexing the page. For instance, my user page has a set of subpages, sandboxes where I play, test, or draft what could later become real Wikipedia articles. I don't want Google to index these pages, yet they currently appear early in Google's search results. On an HTML page, the solution is to add the line
<pre>
<meta name="robots" content="noindex,nofollow">
</pre>
Could someone implement something like __NOGOOGLE__ for users who don't want their user pages indexed?
No. Namespaces which robots are asked not to index can be configured; however, in this case, if it's public, then it's indexable. A __NOINDEX__-type magic word has been discussed before and rejected simply because it's subject to abuse and misunderstanding. Google are quite quick at re-crawling bits of Wikipedia content, so if a draft page has moved to the article space, they'll usually reflect it within a few days.
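For reference, the per-namespace configuration mentioned above can be expressed with the $wgNamespaceRobotPolicies setting in recent MediaWiki versions; a minimal LocalSettings.php sketch (the namespaces listed are purely illustrative, and whether to exclude them is a site policy decision) would look something like this:
<pre>
// Illustrative only: per-namespace robot policies in LocalSettings.php.
// Which namespaces, if any, a site excludes is up to its operators.
$wgNamespaceRobotPolicies = array(
    NS_USER      => 'noindex,nofollow',
    NS_USER_TALK => 'noindex,nofollow',
);
</pre>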
But suppose the magic word __NOINDEX__ had an effect only on subpages of the User namespace and nowhere else, i.e. only on pages like http://xx.wikipedia.org/wiki/User:N_N/a_subpage. Would that be a possible compromise?
No, it's up to the people who manage the web site to determine what is and is not indexed by search engines, and Wikimedia wikis generally have everything indexed bar pages such as VfD/AfD/whatever the trendy TLA for deletion debates is, which external viewers don't typically understand. There is _no reason_ to disable indexing of your user page or any other page in that namespace. What you are posting to a public web site is public. If you don't want anyone else to be able to read it or edit it or whatever, _don't post it_.
Reopening this, as we're considering this or something similar as an improvement over lots of manual editing of the global robots.txt.
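For context, the kind of rules that currently have to be maintained by hand in the site-wide robots.txt look roughly like the following (the paths are illustrative, matching the deletion-debate pages mentioned earlier; Disallow lines are prefix matches):
<pre>
# Illustrative excerpt of manually maintained robots.txt rules.
User-agent: *
Disallow: /wiki/Wikipedia:Articles_for_deletion/
Disallow: /wiki/Wikipedia:Votes_for_deletion/
Disallow: /wiki/Wikipedia:Miscellany_for_deletion/
</pre>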
We frequently get complaints via OTRS from people who want various logs that malign their companies, etc., removed. Those pages usually serve a purpose, but it's not as though they're content. Just one example: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spam/LinkReports
Having a __NOINDEX__ magic word is probably the best strategy if we want to differentiate what content ought to appear in search engines in more than a very crude way. Routinely editing robots.txt is no solution, and I consider it undesirable to simply block out very broad categories of material (such as everything that is not an article).
I looked into the code, but it appears that $wgOut->setRobotPolicy is called at the very beginning of Article::view. That is a lot of lines before the page content is parsed and magic words are evaluated. Does anybody have an idea how to do this?
It should be possible to call it again to override it with specific data. You'd have to do this when pulling wiki output out of the ParserOutput object (otherwise the parser cache will always eat everything).
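To make that suggestion concrete, here is a rough sketch of the approach (not the eventual committed code); the flag name and accessor are hypothetical, standing in for whatever the parser records when it strips __NOINDEX__ from the wikitext:
<pre>
// Sketch only. Assumes a hypothetical ParserOutput flag
// (setNoIndex()/getNoIndex()) set during parsing. Because the flag
// lives on the ParserOutput object, it survives the parser cache,
// as suggested above.

// In Article::view(), after the ParserOutput has been obtained
// (freshly parsed or fetched from the parser cache):
if ( $parserOutput->getNoIndex() ) {
    global $wgOut;
    // Override the default policy set at the top of Article::view().
    $wgOut->setRobotPolicy( 'noindex,nofollow' );
}
</pre>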
*** Bug 14209 has been marked as a duplicate of this bug. ***
Fixed in r37973. I patterned the code after __NEWSECTIONLINK__, and it seems to work fine.
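For anyone following along, the __NEWSECTIONLINK__ pattern referred to here works roughly like this (a simplified sketch, not the actual r37973 diff; the __NOINDEX__ setter name is hypothetical):
<pre>
// Simplified sketch of the double-underscore handling in
// Parser::doDoubleUnderscore(); the real r37973 change may differ.
$mwa = MagicWord::getDoubleUnderscoreArray();
$this->mDoubleUnderscores = $mwa->matchAndRemoveAll( $text );

// Existing handling for __NEWSECTIONLINK__:
if ( isset( $this->mDoubleUnderscores['newsectionlink'] ) ) {
    $this->mOutput->setNewSection( true );
}

// A __NOINDEX__ handler does the analogous thing: record a flag on
// the ParserOutput that is later turned into a "noindex,nofollow"
// robot policy when the output is rendered.
if ( isset( $this->mDoubleUnderscores['noindex'] ) ) {
    $this->mOutput->setNoIndex( true ); // hypothetical setter
}
</pre>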