Last modified: 2008-08-22 20:45:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T11415, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 9415 - option to protect pages from being indexed by search engines


Summary:	option to protect pages from being indexed by search engines

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Normal enhancement with 4 votes (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	patch, need-review
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-03-25 21:35 UTC by Ruud Koot
Modified:	2008-08-22 20:45 UTC (History)
CC List:	6 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
Enable editing of robot metadata per page, via Special:Protect (6.86 KB, patch) 2007-06-01 07:45 UTC, Daniel Cannon (AmiDaniel)	Details

Description Ruud Koot 2007-03-25 21:35:06 UTC

Add an option to the protect tab that allows administrators to "protect" a page
from being indexed by search engines (by adding a <meta name="robots"
content="noindex, nofollow"> tag). This would be useful on pages that contain
sensitive information but are not in a namespace that is excluded from indexing by default.
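
For illustration only, here is a minimal sketch of how a per-page policy could turn into the robots meta tag quoted above. It is written in Python rather than MediaWiki's PHP. The function name robots_meta_tag and the page_policies mapping are hypothetical and are not part of MediaWiki, and the default policy value is an assumption.

```python
from html import escape

# Assumed default: pages without an explicit policy are indexable.
DEFAULT_POLICY = "index, follow"


def robots_meta_tag(title, page_policies):
    """Return a robots <meta> tag for `title`, or '' when the default applies.

    `page_policies` is a hypothetical mapping of page title -> policy string.
    """
    policy = page_policies.get(title, DEFAULT_POLICY)
    if policy == DEFAULT_POLICY:
        # No tag needed: crawlers treat a missing tag as "index, follow".
        return ""
    return '<meta name="robots" content="{}">'.format(escape(policy, quote=True))


page_policies = {"Sensitive page": "noindex, nofollow"}
print(robots_meta_tag("Sensitive page", page_policies))
print(repr(robots_meta_tag("Main Page", page_policies)))
```

As the later comments note, a tag like this is only a request: a crawler has to revisit the page and choose to honour it.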

Comment 1 Rob Church 2007-03-25 21:37:36 UTC

Too prone to abuse.

Comment 2 Ruud Koot 2007-03-25 21:44:53 UTC

Do you mean too prone to abuse as in administrators who will abuse this feature
(how?) or search engines etc. not respecting the <meta name="robots"
content="noindex, nofollow"> tag and thereby creating a false sense of security?

Comment 3 Rob Church 2007-03-25 21:45:25 UTC

The former.

Comment 4 Aaron Schulz 2007-03-25 21:48:06 UTC

Admins who don't "like" revisions can easily just remove them from major search
engine results.

Why would we need this? Either you delete the page or you leave it. As for
discussion pages/AfD, if you don't want outsiders, perhaps an extension to
remove whole namespaces from indexes might be an idea.

Comment 5 Ruud Koot 2007-03-25 21:56:00 UTC

I cannot readily imagine a way to abuse this feature (removing an article from
search engine results being not as effective as, say, completely deleting the
article and there usually being enough peer-review among administrators to
successfully counter any hypothetical abuse) but given the terseness of your answer I
suspect this has been discussed before. Could you perhaps give me a pointer to a
relevant feature request/mailing list discussion/...?

Comment 6 Aaron Schulz 2007-03-25 22:00:30 UTC

Removal from search engines is a good way to suppress pages you don't like, POV
or whatever, from much of the public. I suppose if it were logged, there could
be some oversight. But why? What is the use of making it harder to get to, but
still accessible? That's a broken, inconsistent CMS.

If you want to stop outsiders from flooding a project discussion, why do it
selectively, why not take out the whole project/whatever namespace from the index?

I just can't see a use for this.

Comment 7 Ruud Koot 2007-03-25 22:16:43 UTC

The concrete example leading to this feature request can be found here
<http://lists.wikimedia.org/pipermail/wikien-l/2007-March/066466.html>. Assume
the action is properly logged etc. I still do not see how this is more prone to
abuse than giving administrators the ability to protect or delete pages.

Comment 8 Rob Church 2007-03-25 23:00:58 UTC

Individual communities abuse features all the time. Decisions leading to hiding
content from the general public need to be made with Board or equivalent level
approval, not a five second straw poll on the English Wikipedia.

Such a feature would not have an immediate effect owing to the nature of search
engine spidering schedules, and the fact that not all spiders will bother to
honour the tag.

Comment 9 Daniel Cannon (AmiDaniel) 2007-06-01 07:45:07 UTC

Created attachment 3702 [details]
Enable editing of robot metadata per page, via Special:Protect

Unfortunately, I did not notice this bug until after I finished writing this, when Simetrical pointed it out. I talked to Tim Starling earlier today, who just finished adding in $wgArticleRobotPolicies, and he stated that adding a user interface component to allow setting the robot policies per article would be a fine idea, so long as the implementation was "relatively elegant". This is about as elegant as I could get it, though it may need some cleaning up around the edges.

The primary incentive was that requests for modifying robots.txt (or now, thanks to Tim, modifying $wgArticleRobotPolicies) have been steadily on the rise recently, and there are very, very few people on Wikimedia capable of fulfilling these requests (and who certainly have better things to do with their time). Unfortunately, Google's cache has recently become of increasing use (or misuse) to individuals attempting to dig out private information that momentarily appears on pages before it is oversighted or deleted. As such, requests to hide pages prone to being oversighted or pages that should not generally be entirely public and cached for other reasons have become a necessity to fulfill.

The concerns about abuse I view as valid, but they are of far lesser concern than the potential for abuse offered by allowing sysops to delete pages, let alone for oversights to remove revisions with no public record whatsoever of this removal. In an attempt to curb this abuse, my implementation will allow the modification of robots policies on pages only to users with the "editrobots" permission, allotted by default to sysops, which can be reallotted to bureaucrats or, if need be, to oversights. As this is an operation that will need to be performed very rarely, there should be no problem allowing it only to a much smaller user group, such as oversights; it is, however, important that it be allowed to a larger group than it currently is.

Anyway, I hope that you will reconsider or at least review the patch; if it's not accepted, I won't be heartbroken, but I do think it has the potential to be quite useful. I also have it running on a live install at https://amidaniel.com/testwiki if you want to try it out.
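
The permission gating described in this comment, combined with the logging suggested in comments 6 and 7, can be sketched roughly as follows. This is not the attached patch and not MediaWiki code. It is an illustrative Python model in which only the "editrobots" right and the group names come from the discussion; GROUP_RIGHTS, can_edit_robots and set_robot_policy are hypothetical names.

```python
# Hypothetical rights table: "editrobots" is granted to sysops by default,
# and could be moved to a smaller group such as oversight instead.
GROUP_RIGHTS = {
    "sysop": {"editrobots"},
    "bureaucrat": set(),
    "oversight": set(),
}


def can_edit_robots(user_groups, group_rights=GROUP_RIGHTS):
    """True if any of the user's groups carries the 'editrobots' right."""
    return any("editrobots" in group_rights.get(g, set()) for g in user_groups)


def set_robot_policy(title, policy, user, user_groups, policies, log):
    """Change a page's robot policy, refusing unprivileged users and logging the change."""
    if not can_edit_robots(user_groups):
        raise PermissionError("{} lacks the 'editrobots' right".format(user))
    policies[title] = policy
    log.append((user, title, policy))  # public record of who changed what


policies, log = {}, []
set_robot_policy("Some page", "noindex,nofollow", "ExampleAdmin", ["sysop"], policies, log)
print(policies)
print(log)
```

Keeping a log entry for every change is what would give other administrators a way to review the use of the feature.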

Comment 10 Rob Church 2007-06-06 03:32:36 UTC

For the page to disappear from a search engine, the engine's crawler will have to revisit the page, which might not happen for some time. This reduces the utility of such an option, because a short-term removal might not have an effect at all.

Comment 11 Daniel Cannon (AmiDaniel) 2007-06-21 16:28:29 UTC

Committed by Raymond as r23166.

Comment 12 Raimond Spekking 2007-06-26 08:49:44 UTC

(In reply to comment #11)
> Committed by Raymond as r23166.
> 

Reverted by Brion with r23226:

There are issues with putting robots stuff into the current protection system, so we're backing this out to prevent another backwards-compatibility disaster when it's done in a more reliable way.  :)

Comment 13 Dan Jacobson 2007-12-26 01:44:01 UTC

See also Bug 8473 about $wgArticleRobotPolicies being too weak to act on Special:*.

Comment 14 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-07-23 20:12:14 UTC

If the patch for bug 8068 remains checked in, this is probably no longer necessary.

Comment 15 Raimond Spekking 2008-08-22 20:45:06 UTC

(In reply to comment #14)
> If the patch for bug 8068 remains checked in, this is probably no longer
> necessary.
> 

The patch for the above bug has been live for 4 weeks --> closing this bug as FIXED
