Last modified: 2014-11-17 10:34:51 UTC
There should be a special page to manage the spam blacklist. Admins should be
able to check URLs against the blacklist. I had a URL that was blacklisted and I
could not find out which regular expression matched it (OK, I could write a
simple Perl script to do this on my computer). There are some other related
suggestions about spam, see
Maybe you should rewrite the entire spam protection mechanism.
I finally found that www.2100books.com matches against [0-9]+books\.com. So how
do I know which [0-9]+books\.com pages are good and which are evil? Who entered
the regexp because of which pages? Maybe you can find out in the version history,
but managing the spam blacklist should be as easy as blocking users and
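For illustration, the kind of lookup described above (finding which blacklist entry matches a given URL) could be sketched roughly as follows. This is only a sketch under assumptions: the function name `find_matching_entries` and the sample entries are hypothetical, the blacklist is assumed to hold one regular expression per line with `#` starting a comment, and Python's `re` module is not identical to the regex engine MediaWiki uses, so results may differ for unusual patterns.

```python
import re


def find_matching_entries(url, blacklist_text):
    """Return (line_number, pattern) for every blacklist line that matches url.

    Assumption: one regex per line, '#' starts a comment, matching is
    case-insensitive and unanchored.
    """
    matches = []
    for line_number, raw_line in enumerate(blacklist_text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        pattern = raw_line.split("#", 1)[0].strip()
        if not pattern:
            continue
        try:
            compiled = re.compile(pattern, re.IGNORECASE)
        except re.error:
            # Skip entries that are not valid in Python's regex dialect.
            continue
        if compiled.search(url):
            matches.append((line_number, pattern))
    return matches


# Hypothetical sample list; only the first pattern comes from this report.
SAMPLE_BLACKLIST = "\n".join([
    "# hypothetical sample entries",
    r"[0-9]+books\.com",
    r"example-spam\.invalid",
])

if __name__ == "__main__":
    for line_number, pattern in find_matching_entries(
        "http://www.2100books.com/", SAMPLE_BLACKLIST
    ):
        print(f"line {line_number}: {pattern}")
```

Reporting the line number along with the pattern makes it easier to look up in the page history who added the entry and why.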
A rewritten "Spam protection mechanism" should definitely be part
of each MediaWiki installation. Many sites use this software, but
they don't install optional extensions. Sysops have to fight
wikispam without proper "weapons", as they usually have no access
to the servers.
This feature should be enabled by default (with an empty spam
blacklist, which is editable by sysops).
Setting product correctly. I have seen a demand for a slightly easier-to-use
version of Spam Blacklist, so it might not be a bad idea to consider this a
separate request. Leaving ambiguous for now.
(In reply to comment #0)
Or just bug 1505 - it automatically creates a link.
*** Bug 4698 has been marked as a duplicate of this bug. ***
*** Bug 13805 has been marked as a duplicate of this bug. ***
*** Bug 14090 has been marked as a duplicate of this bug. ***
If SpamRegex is fixed up, it might fulfil this need; see bug 13811.
(In reply to comment #8)
> If SpamRegex is fixed up, it might fulfil this need; see bug 13811.
Per bug 13811 comment 14, that's apparently not true. This will probably be fulfilled by AbuseFilter, which Werdna is working on, so I've CCed him.
We don't have a special page yet, but there are tools like http://toolserver.org/~seth/grep_regexp_from_url.cgi which give the possibility to search for an entry and for its reason. This tool can be used in MediaWiki:Spamprotectionmatch, e.g., http://de.wikipedia.org/wiki/MediaWiki:Spamprotectionmatch/en.
So afaics the main thing - which was the difficulty in finding already blacklisted links - is solved.
(In reply to comment #10)
> We don't have a special page yet, but there are tools like
> http://toolserver.org/~seth/grep_regexp_from_url.cgi which give the possibility
> to search for an entry and for its reason. This tool can be used in
> MediaWiki:Spamprotectionmatch, e.g.,
> So afaics the main thing - which was the difficulty in finding already
> blacklisted links - is solved.
External tools are *not* sufficient.
There are probably-useful notes on http://www.mediawiki.org/wiki/Extension_talk:SpamBlacklist#more_detailed_manual_and_suggestions and certainly on http://www.mediawiki.org/wiki/Regex-based_blacklist
Both AbuseFilter and SpamRegex would need lots of work to be a viable alternative to SpamBlacklist at present. Some of the major concerns with replacing SpamBlacklist with AbuseFilter follow (concerns regarding replacing SpamBlacklist with SpamRegex are discussed on bug 13811):
*Global filters (bug 17811) are really required since probably 1/3 of our spam blocking as a Wikimedia community happens globally.
**Relatedly, local wikis would need some way to opt-out of blocking individual domains (or individual filters - and you might block multiple domains with a single filter - we do use regex after all :D)
*Also relatedly, we need to provide output for non-WMF wikis - but only the spam-related filters! So, probably some method of categorizing them will be necessary. That'd also be useful since if you have several thousand filters, it will quickly become *very* difficult to search through them all for a particular one - tagging/categorizing of filters and searching within the notes will be needed.
**As well, this assumes that all third parties will install AbuseFilter - which will not happen. So, ideally there would be a compatibility function to provide output at least somewhat equivalent to the output of SpamBlacklist which could be used as input for third party installations.
*Regarding workflow: AbuseFilter is not designed for blocking spam (it is meant to target pattern vandalism), and the workflow reflects that. We need to be able to quickly and painlessly add formulaic filters which do a very small subset of what AbuseFilter is capable of. I had suggested in the past that there could be filter templates for common purposes (such as blocking spam) - users would just fill in the blank and apply the filter.
*Performance: Someone should compare the performance effects of blocking all the domains we're currently blocking with SpamBlacklist using AbuseFilter instead (using one filter for each line of regex vs one filter for the whole thing would also be a useful comparison - is there an impact there? That could affect workflow significantly depending on the answer.)
*AbuseFilter can resolve bug 16325 in a user-friendly way: If all_links has whatever.com then present a particular message asking them to remove it (but potentially let them still save the edit or not, depending)
*For authors, showing the edit form after a hit (bug 16757) is important & AbuseFilter would resolve that.
*The AbuseFilter log would resolve bug 1542 nicely (& we are even replicating that to the toolserver).
*Rollback can be exempted easily, which would resolve bug 15450 perfectly.
*AbuseFilter can use new_html to resolve bug 15582 somewhat at least -- someone should figure out how true that statement is, since I'm no expert there. Potentially bug 16610 too?
*If AbuseFilter were modified, it could potentially resolve bug 16466 in an acceptable manner. Bug 14114 too?
*AbuseFilter could potentially resolve bug 16338 and bug 13599, depending on how one sets up the filters.
*AbuseFilter could maybe be modified to allow per-page exceptions (bug 12963)... something like a whitelist filter? Or you could mash that into the original filter, which goes back to the workflow problem.
*AbuseFilter's ccnorm() and/or rmspecials() would resolve the Unicode problem (bug 12896) AFAICT -- though that should certainly be tested & verified.
*AbuseFilter's warn function would resolve bug 9416 in a very user-friendly manner.
In summation: AbuseFilter needs to implement global filters, local exemption, backward compatibility with SpamBlacklist on third-party installs, better filter tagging/searching and other workflow improvements before it can be considered a viable alternative to SpamBlacklist.
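To make the performance question above a bit more concrete, here is a rough sketch of how "one check per regex line" could be compared with "one combined regex". It says nothing about AbuseFilter or SpamBlacklist themselves: it uses Python's `re` module, the patterns other than the first are hypothetical, and all function names are made up for illustration. A real comparison would have to be run inside MediaWiki with the actual blacklist.

```python
import re
import timeit

# Hypothetical patterns; a real blacklist has thousands of lines.
PATTERNS = [
    r"[0-9]+books\.com",
    r"example-spam\.invalid",
    r"cheap-pills\.invalid",
]


def matches_per_line(url, patterns):
    """Check each pattern separately, like one filter per blacklist line."""
    return any(re.search(p, url, re.IGNORECASE) for p in patterns)


def build_combined(patterns):
    """Join all patterns into a single alternation, like one big filter."""
    return re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)


def matches_combined(url, combined):
    """Check the URL against the single combined regex."""
    return combined.search(url) is not None


COMBINED = build_combined(PATTERNS)

if __name__ == "__main__":
    url = "http://www.mediawiki.org/wiki/Regex-based_blacklist"
    per_line = timeit.timeit(lambda: matches_per_line(url, PATTERNS), number=10000)
    combined = timeit.timeit(lambda: matches_combined(url, COMBINED), number=10000)
    print(f"per line: {per_line:.4f}s  combined: {combined:.4f}s")
```

Both approaches should give the same yes/no answer. The combined form cannot say which entry matched without extra work, which is part of the workflow trade-off mentioned above.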