Last modified: 2011-03-30 11:51:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T15811, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13811 - Install SpamRegex for en.wiki
Status: RESOLVED INVALID
Product: Wikimedia
Classification: Unclassified
Component: Extension setup
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 9 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2008-04-21 18:00 UTC by Happy-melon
Modified: 2011-03-30 11:51 UTC (History)
CC List: 9 users


Description Happy-melon 2008-04-21 18:00:44 UTC
There has been some interest in installing the SpamRegex extension (http://www.mediawiki.org/wiki/SpamRegex) on en.wiki, where we have been recently hit by spammers and vandals using edit summaries and log entries to post links and irritating content.  This extension, assuming it is usable, would aid us tremendously in combatting this (sometimes torrential) downpour of junk edits. Relevant discussion is at http://en.wikipedia.org/wiki/Wikipedia:An#Two_ways_to_help_prevent_Grawp-related_vandalism and http://en.wikipedia.org/wiki/Wikipedia:VPT#Extension:SpamRegex.

I expect that the 'spamregex' permission should be assigned to 'sysop'.
Comment 1 Andrew Garrett 2008-04-22 03:31:53 UTC
I would like to suggest that we throw together a big "RegexBlock" extension, which uses a database table (rather than an interface message), and has a bunch of checkboxes for matching URLs, edit summaries, page titles, and so on. This might be better than the endless *Regex extensions that we're installing across the projects (much as we installed a zillion Make* extensions before somebody got up and wrote a general solution).
Comment 2 Happy-melon 2008-04-22 09:33:16 UTC
I would certainly love to see such a unified solution, although I think there is a clear need for local users to be able to edit the regexes used (which is an advantage of the fairly transparent interface message method).  However, I think SpamRegex would still be an invaluable stopgap for us until such an extension is written and tested.
Comment 3 slakr 2008-04-23 19:18:55 UTC
I definitely agree.  Moreover, this extension would likely bridge the remaining "regex" gap (i.e., edit summaries and page text in cases where spam isn't hyperlinked), whereas the Make* extensions didn't have a theoretical limit.  If someone does want to come along later and make an all-encompassing extension, it'd be nice; but, since there's really not much more to worry about once this particular extension is installed, making the all-encompassing extension doesn't have to be uber-high priority.

Cheers =)
-slakr
Comment 4 Happy-melon 2008-04-28 21:28:22 UTC
After another (three) streaks of unbelievably disruptive move-vandalism (see http://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard&oldid=208846386#Grawp and other "Grawp" threads on that page), I've increased the priority of this request. "Highest" is clearly overkill, but we NEED this extension.
Comment 5 Stifle 2008-05-17 13:48:35 UTC
Would very much hope that this is enacted with all due speed.
Comment 6 FT2 2008-05-17 16:52:47 UTC
Confirming the need, the currency of the issues, and that this would actively benefit those involved in anti-vandalism work for the community.
Comment 7 Brion Vibber 2008-05-22 23:46:35 UTC
Use of $wgSharedDB needs to be cleared up...

SQL in lookup is kind of ugly in construction.

Regex doesn't use Unicode support?

Looks like it's fragile to bad entries, which will break all checks.

Using inefficient old-style initialization; will force loading of localizations and SpecialPage class due to wfSpamRegexSetup() running on every request. This needs to be pulled up to date.

No token checks on deletion from regex list; an attacker could get an admin to visit URLs which would clear the regexes.

Special page title stuff should be done w/ updated functions to allow for name localization.

More ugly queries in spamRegexList::showList()

Bad HTML output if the list is empty (<ul></ul> with no contents is not legal)

The message construction looks a bit ugly...

More ugly SQL construction in deleteFromList()...

There's some extra URL-decoding going on, why?

Remove the PHP 4-style =& references.

More manually constructed SQL in fetchNumResults... none of the memcache keys are encapsulated, making it a pain to change anything...

makeOption() should be replaced with modern functions...

There's some funky JavaScript in the form which isn't really clear in purpose. It looks like it's trying to ensure that at least one box is checked, but it's duplicative and a bit ugly. :) Wouldn't hurt to clean it up a bit and comment it.

Bad HTML in form -- improper action URL generation and escaping.

Form is constructed with a lot of manual direction layout; should be redone using the modern classes to ensure correct layout in RTL and LTR alike.

A number of addWikiText(wfMsg())s should be replaced with addWikiMsg() to ensure custom messages are properly supported.

More realllly ugly scary SQL construction on save.
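
The repeated complaints above about manually constructed SQL can be illustrated with a minimal sketch. It is written in Python with the standard-library sqlite3 module purely for illustration; it is not the extension's PHP code and not MediaWiki's database API. The table name spam_regex comes from this thread, while the column names and the helper functions save_pattern() and delete_pattern() are hypothetical. The point is only that user-supplied patterns are passed as bound parameters rather than concatenated into the query string.

```python
import sqlite3


def save_pattern(conn, pattern, match_text, match_summary):
    """Insert a pattern using bound parameters, never string concatenation."""
    conn.execute(
        "INSERT INTO spam_regex (pattern, match_text, match_summary) "
        "VALUES (?, ?, ?)",
        (pattern, int(match_text), int(match_summary)),
    )


def delete_pattern(conn, pattern):
    """Delete one pattern; returns the number of rows removed."""
    cur = conn.execute("DELETE FROM spam_regex WHERE pattern = ?", (pattern,))
    return cur.rowcount


# Hypothetical schema, assumed for this sketch only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spam_regex ("
    "pattern TEXT PRIMARY KEY, match_text INTEGER, match_summary INTEGER)"
)

# A hostile-looking entry is stored as plain data because it is bound,
# not spliced into the SQL text.
hostile = "x'); DROP TABLE spam_regex; --"
save_pattern(conn, hostile, True, False)
save_pattern(conn, r"\bspam\.org\b", True, True)
```

With bound parameters the quoting is handled by the database driver, so a pattern containing quotes or SQL keywords cannot change the shape of the query.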
Comment 8 Happy-melon 2008-05-23 22:58:57 UTC
So it's nearly ready then? :D

That's a real blow, especially since our most irritating vandal has evolved: http://en.wikipedia.org/w/index.php?title=Special:Log&limit=1&type=move&page=Emma+Watson He's no longer moving pages to TITLES with characteristic names (we've locked the en.wiki TitleBlacklist down like Fort Knox so it must be getting pretty difficult to get round it), but instead using the trademark "Grawp" title in the edit SUMMARY - currently entirely freely.  We were kind of counting on this to deal with him once and for all - I'd be working on that list myself if it were in Python or C++ :D I'm loath to ask for priority for anything from the devs, given how badly it must be snowing work up there... but I'll give a cookie to anyone who's prepared to fix it :D
Comment 9 Brion Vibber 2008-05-23 23:23:45 UTC
As written, this extension wouldn't affect page moves anyway.

(And another quick note on code layout -- currently all the code bits seem to be loaded at once, which isn't a good practice for performance.)
Comment 10 Jack Phoenix 2008-07-26 21:51:21 UTC
I've addressed some of the concerns pointed out by Brion in r38065. By no means is it a complete fix. For example, one bad spamRegex entry still breaks the entire system (and that's rather annoying).
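
One way to avoid a single bad entry breaking every check is to compile each entry on its own and set aside the ones that fail. The following is a minimal sketch of that idea in Python, not the extension's PHP code; the function names compile_entries() and is_blocked() and the sample entries are assumptions made for illustration.

```python
import re


def compile_entries(entries):
    """Compile each entry separately so one invalid regex cannot break the rest."""
    compiled, rejected = [], []
    for entry in entries:
        try:
            compiled.append(re.compile(entry))
        except re.error:
            # Keep the bad entry aside for an admin to fix instead of failing.
            rejected.append(entry)
    return compiled, rejected


def is_blocked(text, compiled):
    """Return True if any valid pattern matches the text."""
    return any(pattern.search(text) for pattern in compiled)


# The second entry has an unclosed group and is therefore invalid.
entries = [r"HAGGER\?+", r"(unclosed", r"spam\.org"]
compiled, rejected = compile_entries(entries)
```

Here the invalid entry ends up in rejected while the two valid patterns keep working, which is the behaviour a combined "one big regex" approach cannot offer.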

It should be noted that since r38069, spamRegex is able to block page moves if the summary field contains a spamregexed entry. This relies on the new SpecialMovepageBeforeMove hook, which I added into r38068 to core.
Comment 11 Mike.lifeguard 2008-09-12 21:08:26 UTC
Allowing changes to existing regexes without deleting them would be nice.

Current entries should be on a different page (just as [[Special:IPBlockList]] is separate from [[Special:BlockIP]]).

Should also include a summary/reason field for additions or removals and should keep a log of changes at Special:Log/spamregex, probably matching the protection log format:

 13:30, August 14, 2008 Mike.lifeguard (Talk | contribs | block) added "spam\.org\b" ‎(spammer on [[Talk:Main page]] [text=yes, summary=yes, pagemove=no])
 13:30, August 14, 2008 Mike.lifeguard (Talk | contribs | block) changed "spam\.org\b" ‎to "\bspam\.org\b" (accidentally blocked nospam.org [text=yes, summary=yes, pagemove=no])
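
The difference between the two patterns in the sample log entries can be checked with a short sketch. It uses Python's re module for illustration (the extension itself is PHP, though \b means a word boundary in both); the example URLs and the helper matches() are assumptions.

```python
import re

# The pattern as first added, and the corrected one with a leading \b.
loose = re.compile(r"spam\.org\b")
strict = re.compile(r"\bspam\.org\b")


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


spam_url = "http://spam.org/page"
innocent_url = "http://nospam.org/page"

for url in (spam_url, innocent_url):
    print(url, "loose:", matches(loose, url), "strict:", matches(strict, url))
```

Without the leading \b the pattern also matches inside "nospam.org", because nothing requires "spam" to start at a word boundary; the corrected pattern matches only the first URL.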
Comment 12 Mike.lifeguard 2008-09-29 18:08:13 UTC
Removing 'shell' keyword, as the code isn't ready.
Comment 13 Mike.lifeguard 2008-10-11 03:07:09 UTC
(In reply to comment #12)
> Removing 'shell' keyword, as the code isn't ready.
> 

Once this is ready, would this be rolled out on all wikis? (preferably with entries from the Spam Blacklist extension being added in automatically) - I imagine that'd make more sense than making wikis request it individually.

And this is currently designed for local blacklisting - will the global blacklist have to continue to use the spam blacklist extension, or can that be integrated here as well so we can use one extension for all spam stuff across WMF, including the meta blacklist?
Comment 14 Jack Phoenix 2008-10-11 12:11:19 UTC
FWIW, I don't see the point in adding entries from the SpamBlacklist extension. SpamRegex is meant to block harmful expressions - things that can be/will be/were used for vandalism, spamming or general disruption, while SpamBlacklist is meant for plain URLs. It's different if SpamRegex is supposed to make SpamBlacklist obsolete, but I don't see that happening in the near future. :)

And SpamRegex was never meant for local blacklisting. The author of SpamRegex is a Wikia tech, and as you may or may not know, Wikia uses a shared user database. Along with the user table, SpamRegex's spam_regex table is shared, too. So blocking an expression (say, "HAGGER???" for example) in one Wikia wiki blocks it in all of Wikia's wikis. I don't know how well $wgSharedTables = array( 'spam_regex' ); would work for Wikimedia, though.

We'd probably need Brion to look over the code again since iAlex has fixed a fair number of the issues mentioned in comment 7. Nevertheless, SpamRegex still is (IIRC) fragile to bad entries, and if some odd entry causes breakage, SpamRegex blocks the word "array", which is not nice. :) This has been experienced a couple of times at Wikia, too, but I don't think the main cause of that array bug was ever tracked down.
Comment 15 Andrew Garrett 2008-10-11 12:21:02 UTC
There would be no benefit to this extension over the Abuse Filter.
Comment 16 Brett Hillebrand 2008-12-13 04:59:10 UTC
I was thinking exactly what Werdna said. I wish people realised that Extension:AbuseFilter is *THE ALL IN ONE SOLUTION* you have been looking for. I think we can close this because SpamRegex will only duplicate ways of doing things once AbuseFilter is installed.
Comment 17 Mike.lifeguard 2009-03-23 02:16:37 UTC
(In reply to comment #15)
> There would be no benefit to this extension over the Abuse Filter.
> 

No, there definitely is a benefit. Simplicity of use, and the ability to stop spam globally (for Wikimedia) are concerns. AbuseFilter was never (AFAICT) for spam in particular - but rather pattern vandalism (which you have said yourself). I'd love to make it better for use against spam though - if that's done then maybe this could be considered INVALID/WFM/whatever.

I think this should be duped to bug 4459, and that can maybe be considered resolved by AbuseFilter at some point in the future.
Comment 18 Happy-melon 2011-03-30 11:51:41 UTC
AbuseFilter does everything this extension did and much, much more.  This extension might still be useful to wikis for the reasons given in comment #17; but it's no longer needed on enwiki.
