Last modified: 2014-11-17 10:34:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T6459, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 4459 - Create a special page to handle additions, removals, changes and logging of spam blacklist entries


Summary:	Create a special page to handle additions, removals, changes and logging of s...

Status:	NEW

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	Spam Blacklist (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Low enhancement with 10 votes (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Duplicates:	4698 13805 14090 (view as bug list)
Depends on:
Blocks:	SWMT

Reported:	2006-01-03 04:06 UTC by Jakob Voss
Modified:	2014-11-17 10:34 UTC (History)
CC List:	14 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Jakob Voss 2006-01-03 04:06:55 UTC

There should be a special page to manage the spam blacklist. Admins should be
able to check URLs against the blacklist. I had a URL that was blacklisted and I
could not find out which regular expression matched it (ok, I could write a
simple perl script to do this on my computer). There are some other related
suggestions about spam, see

http://bugzilla.wikimedia.org/show_bug.cgi?id=1505
http://bugzilla.wikimedia.org/show_bug.cgi?id=1733
http://bugzilla.wikimedia.org/show_bug.cgi?id=2598
http://bugzilla.wikimedia.org/show_bug.cgi?id=584

Maybe you should rewrite the entire spam protection mechanism.

Comment 1 Jakob Voss 2006-01-03 04:13:46 UTC

I finally found that www.2100books.com matches against [0-9]+books\.com. So how
do I know which [0-9]+books\.com pages are good and which are evil? Who entered
the regexp, and because of which pages? Maybe you can find out in the version
history, but managing the spam blacklist should be as easy as blocking users and
pages.
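
The kind of check described above (finding out which blacklist entry matched a given URL) can be sketched in a few lines. This is a minimal illustration, not the extension's actual matching code: the function name `find_matching_entries`, the one-regex-per-line list format with `#` comments, and the plain case-insensitive search are all assumptions made for the example.

```python
import re


def find_matching_entries(url, blacklist_lines):
    """Return (line_number, pattern) pairs for every entry that matches url.

    Assumption: one regular expression per line, with '#' starting a comment.
    """
    matches = []
    for number, line in enumerate(blacklist_lines, start=1):
        # Drop a trailing comment and surrounding whitespace; skip empty lines.
        pattern = line.split("#", 1)[0].strip()
        if not pattern:
            continue
        try:
            regex = re.compile(pattern, re.IGNORECASE)
        except re.error:
            # Invalid entries are simply ignored in this sketch.
            continue
        if regex.search(url):
            matches.append((number, pattern))
    return matches


# Hypothetical blacklist content, using the pattern mentioned in this comment.
blacklist = [
    "# example blacklist",
    r"[0-9]+books\.com  # reason unknown",
    r"example\.org",
]

for number, pattern in find_matching_entries("http://www.2100books.com/", blacklist):
    print("line %d: %s" % (number, pattern))
```

Reporting the line number along with the pattern is what makes it possible to look up, in the page history, who added the entry and why.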

Comment 2 Owltom 2006-01-05 23:43:28 UTC

A rewritten "Spam protection mechanism" should definitely be part 
of each MediaWiki installation. Many sites use this software, but 
they don't install optional extensions. Sysops have to fight 
wikispam without proper "weapons", as they usually have no access 
to the servers.

This feature should be enabled by default (with an empty spam 
blacklist, which is editable by sysops).

Comment 3 Rob Church 2006-02-23 14:41:17 UTC

Setting product correctly. I have seen a demand for a slightly easier-to-use
version of Spam Blacklist, so it might not be a bad idea to consider this a
separate request. Leaving ambiguous for now.

Comment 4 Brian Jason Drake 2006-04-02 03:19:53 UTC

(In reply to comment #0)
> http://bugzilla.wikimedia.org/show_bug.cgi?id=1505

Or just bug 1505 - it automatically creates a link.

Comment 5 Rob Church 2007-08-19 21:30:08 UTC

*** Bug 4698 has been marked as a duplicate of this bug. ***

Comment 6 Mike.lifeguard 2008-05-14 03:56:37 UTC

*** Bug 13805 has been marked as a duplicate of this bug. ***

Comment 7 Mike.lifeguard 2008-05-21 00:43:36 UTC

*** Bug 14090 has been marked as a duplicate of this bug. ***

Comment 8 Mike.lifeguard 2008-09-12 21:17:04 UTC

If SpamRegex is fixed up, it might fulfil this need; see bug 13811.

Comment 9 Mike.lifeguard 2008-10-24 23:23:29 UTC

(In reply to comment #8)
> If SpamRegex is fixed up, it might fulfil this need; see bug 13811.
> 

Per bug 13811 comment 14, that's apparently not true. This will probably be fulfilled by AbuseFilter, which Werdna is working on, so I've CCed him.

Comment 10 seth 2009-02-19 19:12:02 UTC

We don't have a special page yet, but there are tools like http://toolserver.org/~seth/grep_regexp_from_url.cgi which make it possible to search for an entry and the reason for it. This tool can be used in MediaWiki:Spamprotectionmatch, e.g., http://de.wikipedia.org/wiki/MediaWiki:Spamprotectionmatch/en.

So afaics the main thing - which was the difficulty in finding already blacklisted links - is solved.

Comment 11 Mike.lifeguard 2009-02-19 19:13:34 UTC

(In reply to comment #10)
> We don't have a special page yet, but there are tools like
> http://toolserver.org/~seth/grep_regexp_from_url.cgi which give the possibility
> to search for an entry and for its reason. This tool can be used in
> MediaWiki:Spamprotectionmatch, e.g.,
> http://de.wikipedia.org/wiki/MediaWiki:Spamprotectionmatch/en.
> 
> So afaics the main thing - which was the difficulty in finding already
> blacklisted links - is solved.
> 

External tools are *not* sufficient.

Comment 12 Mike.lifeguard 2009-03-23 02:53:28 UTC

There are probably-useful notes on http://www.mediawiki.org/wiki/Extension_talk:SpamBlacklist#more_detailed_manual_and_suggestions and certainly on http://www.mediawiki.org/wiki/Regex-based_blacklist

Both AbuseFilter and SpamRegex would need lots of work to be a viable alternative to SpamBlacklist at present. Some of the major concerns with replacing SpamBlacklist with AbuseFilter follow (concerns regarding replacing SpamBlacklist with SpamRegex are discussed on bug 13811):

*Global filters (bug 17811) are really required since probably 1/3 of our spam blocking as a Wikimedia community happens globally.
**Relatedly, local wikis would need some way to opt-out of blocking individual domains (or individual filters - and you might block multiple domains with a single filter - we do use regex after all :D)

*Also relatedly, we need to provide output for non-WMF wikis - but only the spam-related filters! So, probably some method of categorizing them will be necessary. That'd also be useful since if you have several thousand filters, it will quickly become *very* difficult to search through them all for a particular one - tagging/categorizing of filters and searching within the notes will be needed.
**As well, this assumes that all third parties will install AbuseFilter - which will not happen. So, ideally there would be a compatibility function to provide output at least somewhat equivalent to the output of SpamBlacklist which could be used as input for third party installations.

*Regarding workflow: AbuseFilter is not designed for blocking spam (it is meant to target pattern vandalism), and the workflow reflects that. We need to be able to quickly and painlessly add formulaic filters which do a very small subset of what AbuseFilter is capable of. I had suggested in the past that there could be filter templates for common purposes (such as blocking spam) - users would just fill in the blank and apply the filter.

*Performance: Someone should compare the performance effects of blocking all the domains we're currently blocking with SpamBlacklist using AbuseFilter instead (using one filter for each line of regex vs one filter for the whole thing would also be a useful comparison - is there an impact there? That could affect workflow significantly depending on the answer.)

*AbuseFilter can resolve bug 16325 in a user-friendly way: If all_links has whatever.com then present a particular message asking them to remove it (but potentially let them still save the edit or not, depending)

*For authors, showing the edit form after a hit (bug 16757) is important & AbuseFilter would resolve that.

*The AbuseFilter log would resolve bug 1542 nicely (& we are even replicating that to the toolserver).

*Rollback can be exempted easily, which would resolve bug 15450 perfectly.

*AbuseFilter can use new_html to resolve bug 15582 somewhat at least -- someone should figure out how true that statement is, since I'm no expert there. Potentially bug 16610 too?

*If AbuseFilter were modified, it could potentially resolve bug 16466 in an acceptable manner. Bug 14114 too?

*AbuseFilter could potentially resolve bug 16338 and bug 13599, depending on how one sets up the filters.

*AbuseFilter could maybe be modified to allow per-page exceptions (bug 12963)... something like a whitelist filter? Or you could mash that into the original filter, which goes back to the workflow problem.

*AbuseFilter's ccnorm() and/or rmspecials() would resolve the unicode problem (bug 12896) AFAICT -- though that should certainly be tested & verified.

*AbuseFilter's warn function would resolve bug 9416 in a very user-friendly manner.

----

In summation: AbuseFilter needs to implement global filters, local exemption, backward compatibility with SpamBlacklist on third-party installs, better filter tagging/searching and other workflow improvements before it can be considered a viable alternative to SpamBlacklist.
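
The performance question raised above (one filter per line of regex vs one filter for the whole list) can be explored with a rough timing sketch like the following. It uses plain Python regular expressions, not AbuseFilter or SpamBlacklist, so it only illustrates the shape of such a comparison; the patterns, the sample text and the function names are made up for the example, and the timings will differ by machine and regex engine.

```python
import re
import timeit

# Hypothetical blacklist entries, one regex per line.
patterns = [
    r"[0-9]+books\.com",
    r"cheap-pills\.example",
    r"casino[0-9]*\.example",
]

# Variant 1: one compiled regex per entry, checked one after another.
per_line = [re.compile(p, re.IGNORECASE) for p in patterns]

# Variant 2: all entries joined into a single alternation.
combined = re.compile("|".join("(?:%s)" % p for p in patterns), re.IGNORECASE)


def blocked_per_line(text):
    """True if any individual entry matches somewhere in text."""
    return any(regex.search(text) for regex in per_line)


def blocked_combined(text):
    """True if the single combined regex matches somewhere in text."""
    return combined.search(text) is not None


sample = "See http://www.2100books.com/ and http://example.org/ for details"

if __name__ == "__main__":
    for name, check in (("per-line", blocked_per_line), ("combined", blocked_combined)):
        seconds = timeit.timeit(lambda: check(sample), number=10000)
        print("%s: %.4f s for 10000 checks" % (name, seconds))
```

Both variants give the same yes/no answer; the per-line variant can also report which entry matched, which matters for the workflow concerns listed above.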
