Last modified: 2011-03-30 11:51:41 UTC
There has been some interest in installing the SpamRegex extension (http://www.mediawiki.org/wiki/SpamRegex) on en.wiki, where we have recently been hit by spammers and vandals using edit summaries and log entries to post links and irritating content. This extension, assuming it is usable, would aid us tremendously in combating this (sometimes torrential) downpour of junk edits. Relevant discussion is at http://en.wikipedia.org/wiki/Wikipedia:An#Two_ways_to_help_prevent_Grawp-related_vandalism and http://en.wikipedia.org/wiki/Wikipedia:VPT#Extension:SpamRegex.
I expect that the 'spamregex' permission should be assigned to 'sysop'.
I would like to suggest that we throw together a big "RegexBlock" extension, which uses a database table (rather than an interface message), and has a bunch of checkboxes for matching URLs, edit summaries, page titles, and so on. This might be better than the endless *Regex extensions that we're installing across the projects (much as we installed a zillion Make* extensions before somebody got up and wrote a general solution).
I would certainly love to see such a unified solution, although I think there is a clear need for local users to be able to edit the regexes used (which is an advantage of the fairly transparent interface message method). However, I think SpamRegex would still be an invaluable stopgap for us until such an extension is written and tested.
I definitely agree. Moreover, this extension would likely bridge the remaining "regex" gap (i.e., edit summaries and page text in cases where spam isn't hyperlinked), whereas there was no theoretical limit to the number of Make* extensions. If someone does want to come along later and make an all-encompassing extension, that would be nice, but since there's really not much more to worry about once this particular extension is installed, the all-encompassing extension doesn't have to be uber-high priority.
After three more streaks of unbelievably disruptive move vandalism (see http://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard&oldid=208846386#Grawp and other "Grawp" threads on that page), I've increased the priority of this request. "Highest" is clearly overkill, but we NEED this extension.
Would very much hope that this is enacted with all due speed.
Confirming the need, the currency of the issues, and that this would actively benefit those involved in anti-vandalism work for the community.
Use of $wgSharedDB needs to be cleared up...
SQL in lookup is kind of ugly in construction.
Regex doesn't use Unicode support?
Looks like it's fragile to bad entries, which will break all checks.
Using inefficient old-style initialization; will force loading of localizations and SpecialPage class due to wfSpamRegexSetup() running on every request. This needs to be pulled up to date.
No token checks on deletion from regex list; an attacker could get an admin to visit URLs which would clear the regexes.
Special page title stuff should be done with updated functions to allow for name localization.
More ugly queries in spamRegexList::showList()
Bad HTML output if the list is empty (<ul></ul> with no contents is not legal)
The message construction looks a bit ugly...
More ugly SQL construction in deleteFromList()...
There's some extra URL-decoding going on, why?
Remove the PHP 4-style =& references.
More manually constructed SQL in fetchNumResults... none of the memcache keys are encapsulated, making it a pain to change anything...
makeOption() should be replaced with modern functions...
Bad HTML in form -- improper action URL generation and escaping.
Form is constructed with a lot of manual direction layout; should be redone using the modern classes to ensure correct layout in RTL and LTR alike.
A number of addWikiText(wfMsg())s should be replaced with addWikiMsg() to ensure custom messages are properly supported.
More realllly ugly scary SQL construction on save.
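The missing token check on deletion is a classic cross-site request forgery hole: any state-changing request should carry a per-session secret token that a third-party page cannot guess, so a forged link visited by an admin does nothing. A minimal sketch in Python, assuming a hypothetical per-session secret and hypothetical `make_token`/`delete_regex` helpers; MediaWiki has its own edit-token mechanism, which this does not reproduce.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def make_token(session_secret: bytes, action: str) -> str:
    """Derive a per-session, per-action token (hypothetical scheme)."""
    return hmac.new(session_secret, action.encode("utf-8"), hashlib.sha256).hexdigest()


def delete_regex(regexes: list[str], target: str, token: str, session_secret: bytes) -> bool:
    """Remove `target` only if the submitted token is valid; return True on deletion."""
    expected = make_token(session_secret, "spamregex-delete")
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(expected, token):
        return False
    if target in regexes:
        regexes.remove(target)
        return True
    return False


if __name__ == "__main__":
    secret = secrets.token_bytes(32)
    entries = [r"spam\.org\b", r"HAGGER\?\?\?"]
    # A forged link without the right token is rejected
    print(delete_regex(entries, r"spam\.org\b", "forged", secret))  # False
    good = make_token(secret, "spamregex-delete")
    print(delete_regex(entries, r"spam\.org\b", good, secret))  # True
```

Binding the token to the action name as well as the session means a token leaked from one form cannot be replayed against a different operation.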
So it's nearly ready then? :D
That's a real blow, especially since our most irritating vandal has evolved: http://en.wikipedia.org/w/index.php?title=Special:Log&limit=1&type=move&page=Emma+Watson He's no longer moving pages to TITLES with characteristic names (we've locked the en.wiki TitleBlacklist down like Fort Knox, so it must be getting pretty difficult to get round it), but is instead using the trademark "Grawp" title in the edit SUMMARY, currently entirely freely. We were kind of counting on this to deal with him once and for all; I'd be working on that list myself if it were in Python or C++ :D I'm loath to ask the devs to prioritize anything, given how snowed under with work they must be... but I'll give a cookie to anyone who's prepared to fix it :D
As written, this extension wouldn't affect page moves anyway.
(And another quick note on code layout -- currently all the code bits seem to be loaded at once, which isn't a good practice for performance.)
I've addressed some of the concerns pointed out by Brion in r38065. It is by no means a complete fix. For example, one bad spamRegex entry still breaks the entire system (and that's rather annoying).
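One way to keep a single bad entry from breaking every check is to compile each entry on its own and set aside (and report) the ones that fail, so a typo costs only that entry. A minimal Python sketch of that approach, with hypothetical `compile_entries`/`first_match` helpers; it is not how SpamRegex itself is written. Python's `re` treats str patterns as Unicode by default, which loosely corresponds to the `/u` modifier asked about in the review.

```python
from __future__ import annotations

import re


def compile_entries(entries: list[str]) -> tuple[list[re.Pattern[str]], list[str]]:
    """Compile each entry separately; collect invalid ones instead of failing."""
    compiled: list[re.Pattern[str]] = []
    rejected: list[str] = []
    for entry in entries:
        try:
            # IGNORECASE is an illustrative choice, not SpamRegex's documented behaviour
            compiled.append(re.compile(entry, re.IGNORECASE))
        except re.error:
            rejected.append(entry)
    return compiled, rejected


def first_match(text: str, patterns: list[re.Pattern[str]]) -> str | None:
    """Return the source of the first pattern that matches `text`, or None."""
    for pattern in patterns:
        if pattern.search(text):
            return pattern.pattern
    return None


if __name__ == "__main__":
    entries = [r"\bspam\.org\b", r"(unclosed", r"grawp"]
    patterns, rejected = compile_entries(entries)
    print(rejected)  # ['(unclosed']
    # The bad entry is skipped; the others still work, e.g. on a move summary
    print(first_match("Moved page: GRAWP was here", patterns))  # grawp
    print(first_match("Routine copyedit", patterns))  # None
```

The same check can be run against any text field, such as the move summary passed to a move hook, and the `rejected` list can be shown to admins when they save the list.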
It should be noted that since r38069, spamRegex is able to block page moves if the summary field contains a spamregexed entry. This relies on the new SpecialMovepageBeforeMove hook, which I added to core in r38068.
Allowing changes to existing regexes without deleting them would be nice.
Current entries should be on a different page (just as [[Special:IPBlockList]] is separate from [[Special:BlockIP]]).
Should also include a summary/reason field for additions or removals and should keep a log of changes at Special:Log/spamregex, probably matching the protection log format:
13:30, August 14, 2008 Mike.lifeguard (Talk | contribs | block) added "spam\.org\b" (spammer on [[Talk:Main page]] [text=yes, summary=yes, pagemove=no])
13:30, August 14, 2008 Mike.lifeguard (Talk | contribs | block) changed "spam\.org\b" to "\bspam\.org\b" (accidentally blocked nospam.org [text=yes, summary=yes, pagemove=no])
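The proposed log format packs the time, the acting user, the action, the reason, and which checks apply into one line. A minimal Python sketch of a formatter producing lines in that shape, assuming hypothetical flag names (`text`, `summary`, `pagemove`) taken from the examples; real Special:Log output would be rendered by MediaWiki's own log classes, not by code like this.

```python
from __future__ import annotations

from datetime import datetime

FLAG_ORDER = ("text", "summary", "pagemove")  # hypothetical flag names from the examples


def format_flags(flags: dict[str, bool]) -> str:
    """Render flags as 'text=yes, summary=yes, pagemove=no' in a fixed order."""
    return ", ".join(f"{k}={'yes' if flags.get(k) else 'no'}" for k in FLAG_ORDER)


def format_entry(ts: datetime, user: str, action: str, reason: str,
                 flags: dict[str, bool]) -> str:
    """Build one log line in the shape proposed above (day is not zero-padded)."""
    when = f"{ts:%H:%M}, {ts:%B} {ts.day}, {ts.year}"
    return f"{when} {user} (Talk | contribs | block) {action} ({reason} [{format_flags(flags)}])"


if __name__ == "__main__":
    flags = {"text": True, "summary": True, "pagemove": False}
    ts = datetime(2008, 8, 14, 13, 30)
    # prints the first example line from the proposal
    print(format_entry(ts, "Mike.lifeguard", r'added "spam\.org\b"',
                       "spammer on [[Talk:Main page]]", flags))
```

Keeping the flags in a fixed order makes log lines easy to compare when an entry is changed rather than removed.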
Removing 'shell' keyword, as the code isn't ready.
(In reply to comment #12)
> Removing 'shell' keyword, as the code isn't ready.
Once this is ready, would this be rolled out on all wikis? (preferably with entries from the Spam Blacklist extension being added in automatically) - I imagine that'd make more sense than making wikis request it individually.
And this is currently designed for local blacklisting - will the global blacklist have to continue to use the spam blacklist extension, or can that be integrated here as well so we can use one extension for all spam stuff across WMF, including the meta blacklist?
FWIW, I don't see the point in adding entries from the SpamBlacklist extension. SpamRegex is meant to block harmful expressions, things that have been, are, or will be used for vandalism, spamming, or general disruption, while SpamBlacklist is meant for plain URLs. It would be different if SpamRegex were supposed to make SpamBlacklist obsolete, but I don't see that happening in the near future. :)
And SpamRegex was never meant for local blacklisting. The author of SpamRegex is a Wikia tech, and as you may or may not know, Wikia uses a shared user database. Along with the user table, SpamRegex's spam_regex table is shared, too. So blocking an expression (say, "HAGGER???") on one Wikia wiki blocks it on all of Wikia's wikis. I don't know how well $wgSharedTables = array( 'spam_regex' ); would work for Wikimedia, though.
We'd probably need Brion to look over the code again, since iAlex has fixed a fair number of the issues mentioned in comment 7. Nevertheless, SpamRegex is still (IIRC) fragile to bad entries, and when some odd entry causes breakage, SpamRegex blocks the word "array", which is not nice. :) This has happened a couple of times at Wikia, too, but I don't think the root cause of that array bug was ever tracked down.
There would be no benefit to this extension over the Abuse Filter.
I was thinking exactly what Werdna said. I wish people realised that Extension:AbuseFilter is *THE ALL IN ONE SOLUTION* you have been looking for. I think we can close this, because SpamRegex will only duplicate ways of doing things once AbuseFilter is installed.
(In reply to comment #15)
> There would be no benefit to this extension over the Abuse Filter.
No, there definitely is a benefit: simplicity of use, and the ability to stop spam globally (for Wikimedia). AbuseFilter was never (AFAICT) meant for spam in particular, but rather for pattern vandalism (as you have said yourself). I'd love to make it better for use against spam, though; if that's done, then maybe this could be considered INVALID/WFM/whatever.
I think this should be duped to bug 4459, and that can maybe be considered resolved by AbuseFilter at some point in the future.
AbuseFilter does everything this extension did and much, much more. This extension might still be useful to wikis for the reasons given in comment #17; but it's no longer needed on enwiki.