Last modified: 2005-05-08 20:29:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T4112, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 2112 - Proposal to pinpoint almost all vandalism


Summary:	Proposal to pinpoint almost all vandalism

Status:	RESOLVED DUPLICATE of bug 958

Product:	MediaWiki
Classification:	Unclassified
Component:	History/Diffs
Version:	unspecified
Hardware:	All All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2005-05-08 16:48 UTC by Kai
Modified:	2005-05-08 20:29 UTC (History)
CC List:	0 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Kai 2005-05-08 16:48:56 UTC

This proposal has already been introduced to the Village Pump where it has been 
unanimously supported: 
http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)#Edit_Alerts_Based_on_Content

I ask if the recent changes system could be overhauled to incorporate pattern 
recognition based on the very predictable attributes of vandalism.  A pattern match 
would result in the recent change appearing highlighted in red under the recent changes 
section.

Some patterns the software should watch for:

*Edits within content namespace containing high frequency vandal words: Gay, fuck, shit, 
penis, cock, fag, SUCKS, etc. The entire corpus of the vandal word bank will be short and 
enumerated without much research...
*Exclamation marks are repeated in sequence more than three times
*Blanking
*Entire sub-headed area blanked
*Large article loses very substantial percentage of content
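
The checks listed above can be sketched roughly as follows. This is only an 
illustration in Python, not MediaWiki code: the function names, the thresholds, the 
one-entry word bank, and the use of namespace number 0 for content pages are all 
assumptions made for the example.

```python
import re

# Placeholder word bank (assumption); the proposal expects a short, hand-enumerated list.
VANDAL_WORDS = {"sucks"}
CONTENT_NAMESPACE = 0        # assumption: 0 denotes the content (article) namespace
MAX_EXCLAMATIONS = 3         # runs longer than this are flagged
LARGE_ARTICLE_CHARS = 5000   # assumed cutoff for a "large" article
MAX_LOSS_FRACTION = 0.5      # assumed cutoff for a "very substantial" loss

# Wikitext heading lines such as "== History ==".
HEADING = re.compile(r"^==+[^=\n]+==+[ \t]*$", re.MULTILINE)


def section_bodies(text):
    """Map each heading line to the text between it and the next heading."""
    matches = list(HEADING.finditer(text))
    bodies = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        bodies[m.group().strip()] = text[m.end():end].strip()
    return bodies


def flag_edit(old_text, new_text, namespace=CONTENT_NAMESPACE):
    """Return the names of the vandalism patterns that an edit matches."""
    flags = []

    # Vandal words that the edit introduces (content namespace only).
    old_words = set(re.findall(r"[a-z]+", old_text.lower()))
    new_words = set(re.findall(r"[a-z]+", new_text.lower()))
    if namespace == CONTENT_NAMESPACE and (new_words - old_words) & VANDAL_WORDS:
        flags.append("vandal-word")

    # More than three exclamation marks in a row, newly introduced.
    run = re.compile(r"!{%d,}" % (MAX_EXCLAMATIONS + 1))
    if run.search(new_text) and not run.search(old_text):
        flags.append("exclamation-run")

    # Whole page blanked; the remaining checks add nothing in that case.
    if old_text.strip() and not new_text.strip():
        flags.append("blanking")
        return flags

    # A section that keeps its heading but loses its entire body.
    new_sections = section_bodies(new_text)
    for heading, body in section_bodies(old_text).items():
        if body and new_sections.get(heading) == "":
            flags.append("section-blanked")
            break

    # A large article losing a substantial share of its content.
    if len(old_text) >= LARGE_ARTICLE_CHARS:
        lost = 1 - len(new_text) / len(old_text)
        if lost >= MAX_LOSS_FRACTION:
            flags.append("large-loss")

    return flags
```

A non-empty result from the hypothetical flag_edit() would be the signal to highlight 
the entry in recent changes; the flag names could be shown alongside the entry so a 
patroller sees why it was marked.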

Anyone familiar with controlling vandals will know this short list can be reduced and 
still strike at almost all malicious vandalism.  

I know you programmers are overtasked and flooded with requests, but I would like to 
impress on you that this is a top priority update.  Regular contributors are plagued by 
this puerile trash, and it's very demoralizing to find ourselves volunteering to clean up 
after foul kids.  This is a problem that debilitates Wikipedia's editor corps.  The 
pressure and sometimes hopeless feeling of being overwhelmed is made real by the current 
lack of almost any vandal countermeasure.  The current system requires omniscience to be 
successful, and given the number of edits, our eyeballs are simply outnumbered.  
Software-aided cleanup will lift this pressure.  

Worst of all, I don't think an honest assessment of how successful vandalism is gets 
discussed within our community.  The generally positive media reports about us sometimes 
cite the ancient IBM study or state themselves that vandalism is often rolled back within 
minutes.  This is still occasionally true, but anyone familiar with reverting vandalism 
has observed that response times of 90 minutes, 3 hours, 6 hours, and beyond are more 
common.  Our efficiency in squashing adolescent behavior is not as potent as we are billed 
to be or perhaps believe about ourselves.  The large gaps in response time to vandalism 
are a serious problem for information and brand integrity.  

Please consider the massive benefit in performance and uplifting effect of implementing 
an idea like this.

Regards,
Lotsofissues

Comment 1 T. Gries 2005-05-08 17:07:46 UTC

(In reply to comment #0)
> .. system could be overhauled to incorporate pattern recognition based on
> attributes of vandalism. 

... e.g. Bayes-filter rules [5]

> Some patterns the software should watch for:
> *Edits within content namespace containing high frequency vandal words: xxx.
> The entire corpus of the vandal word bank will be short and 
> enumerated without much research...
> *Exclamation marks are repeated in sequence more than three times
> *Blanking
> *Entire sub-headed area blanked
> *Large article loses very substantial percentage of content
> 

Your proposal can be _combined_ with [1]:

"E-mail notification for page changes or new pages, 
 where title or body or category matches a regular expression"

on which I am working. 
The current Enotif [2, 3] already allows for notifications on all new pages (for
Sysops etc.) and can be extended with your "suspicious page vandal action watch"
list. I'll put this onto the to-do list [4]
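
As a rough illustration of how regular-expression watch rules over title, body and 
category might be evaluated for a page change, here is a small Python sketch. It is 
not taken from Enotif; PageChange, WatchRule, rules_to_notify and the sample rule 
names are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PageChange:
    """A changed or newly created page (hypothetical structure)."""
    title: str
    body: str
    categories: list = field(default_factory=list)


@dataclass
class WatchRule:
    """A named regular expression and the parts of the page it applies to."""
    name: str
    pattern: str
    targets: tuple = ("title", "body", "category")

    def matches(self, change):
        regex = re.compile(self.pattern, re.IGNORECASE)
        candidates = []
        if "title" in self.targets:
            candidates.append(change.title)
        if "body" in self.targets:
            candidates.append(change.body)
        if "category" in self.targets:
            candidates.extend(change.categories)
        # The rule fires if any selected part of the page matches.
        return any(regex.search(text) for text in candidates)


def rules_to_notify(change, rules):
    """Return the names of all rules that match the given page change."""
    return [rule.name for rule in rules if rule.matches(change)]
```

For instance, a rule such as WatchRule("exclamation-run", r"!{4,}", targets=("body",)) 
would cover one of the patterns from comment #0, and each matching rule name could 
then trigger one notification to the users watching that rule.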

[1] http://bugzilla.wikipedia.org/show_bug.cgi?id=1116
[2] http://meta.wikipedia.org/wiki/Enotif 
[3] http://bugzilla.wikipedia.org/show_bug.cgi?id=454
[4] http://meta.wikimedia.org/wiki/Email_notification_to-do_list
[5] http://en.wikipedia.org/wiki/Bayesian_filtering

Comment 2 Catherine Munro 2005-05-08 18:32:47 UTC

Is this a duplicate of bug 958?

Comment 3 Ævar Arnfjörð Bjarmason 2005-05-08 18:35:27 UTC

It is indeed, marking it as a duplicate.

*** This bug has been marked as a duplicate of 958 ***
