Last modified: 2013-04-24 09:01:02 UTC

Bug 18247 - Allowing "per page" filters
Product: MediaWiki extensions
Classification: Unclassified
Component: AbuseFilter
Importance: Lowest enhancement
Assigned To: Nobody
Duplicates: 20147
Depends on: 18246
Reported: 2009-03-30 02:52 UTC by FT2
Modified: 2013-04-24 09:01 UTC


Description FT2 2009-03-30 02:52:55 UTC
This request is to allow "per page" filters, similar to "edit notices", as opposed to filters with a /condition/ "PAGE == [[...]]".


I am considering the uses of AbuseFilter, once we get to grips with it and it's been in place a while. The filter extension has immense potential for "per article" filtering, where a filter could act to inhibit very specific kinds of vandalism and warring entries on a specific article, or warn about re-addition of specific problem content (e.g. to a BLP), and the like. Many, many articles might benefit from this. There may be "project-wide" filters and, alongside them, many articles with their own custom filter. There are surely well-known vandalisms or problem edits (e.g. on evolution, BLPs, nationalism, "stuff in the news") where AbuseFilter could actively be used to detect attempts to reinstate the problem text, and be a very major tool against vandalism.

Per-article filters could therefore replace protection or blocking on many topics over time, by inhibiting attempts to post the known problem edits. 

For example if a newspaper wrote that "X may be gay" or "Y may have lied" about some individual, a filter specific to that biography or article could be created to warn about this issue and direct users to discussion on the talk page, or even block its addition, if and only if a user tried to add that specific matter to the article. If a bad source is being repeatedly added against consensus using socks or dynamic IPs, a filter could be created to prevent the addition, explain why, link to the discussion, and warn the user not to re-add it without consensus. If it was one (banned) edit warrior, the filter could even be configured (given consensus) to automatically block the account as a ban evasion account, on attempting to add that source.

Many articles could benefit from something of this capability. They might be designated by means of a "Requests for Filter" process, in which a filter to warn or prevent a specific kind of abuse on that article is designed and active for a period of time, following evidence of a type of persistent abuse (see also: bug 18246 "Expiry time option on filters"). There are many kinds of abuse that would benefit from this.

At present, all filters affect an entire project. There is an "if page=X" option but for various reasons it's less than ideal. Each filter must be checked, for each edit, leading to an upper limit on the number of checks (and hence filters) in total. "Per article" filters get round that because AbuseFilter would only check the general project filters and the specific filters saved against that article, completely ignoring any filter settings that might exist against other articles.

Not running these filters on every edit (where they aren't relevant to that article), by having some filters indexed with the article itself, would 1/ be much smarter, and 2/ therefore allow much wider use of filters for abuse on many individual articles and topics without server lag.

In essence, I'm envisaging the potential to use abuse filter widely. We might end up with a few tens or even hundreds of thousands of articles with custom filters on them, much like we have many articles with protection or the like. An article becomes subject to a given type of abuse? Post a [[WP:Request for Filter]] and if agreed and feasible, a filter will be set up to prevent it for a limited period. Obviously good (bug-free) filter design would be essential, but we'll gain experience at filter design, and this would be an immensely capable anti-vandalism tool. I think it's got enough potential that once we get the hang of managing wiki-wide filters, article-specific filters will be immensely useful :)
Comment 1 FT2 2009-04-01 17:48:41 UTC
Examples where this is already being used, and could be extended and optimized by "per page" filters: 

eg, 36, 100, 101

Rather than have many filters which apply to one article only (but are included in the general filters), it makes sense to have a table of those filters that are page-specific and only trigger on one (or a few) specific article title matches. Any number of these can then be created and applied without significant server lag, since almost all of them will be completely skipped if not on an applicable page.
Comment 2 Mike.lifeguard 2009-04-01 19:00:42 UTC
Shall we also have per-namespace filters? Per-action? I imagine any performance benefit there wouldn't outweigh the added complexity. Indeed, I'm not sure the added complexity is even worth it for per-page filters... if the page requirement is the first check in the filter, then it should short-circuit - that first check would probably cost the same regardless of whether it's done in the filter as now or done in some other way.
Comment 3 FT2 2009-04-01 22:40:14 UTC
Disagree, though I take the point you're making. The difference is this:

A bare handful of days into AbuseFilter, we have filters for specific vandals and/or specific articles or "modus". 

Imagine a scenario where we have not 3 or 4 of these, but many thousands of them. Suppose there are 200 global filters, and 10,000 article-based filters of which 3 apply to this article being edited.

(This many filters need not be a "control problem" provided that all filters are set up following on-wiki review as with other issues. It could well be the future of first-line "repeat abuse detection and handling" once we get the hang of using filters effectively.)

Under the present system, the edit would get matched to all 10,200 filters, HOWEVER OPTIMIZED. Even if the "article match" were checked very first thing, the edit is still tested against each filter, however briefly.

The proposal here would skip that. The edit is tested not against 10,200 filters, but only 203 (200 global filters and the 3 article-specific filters that are indexed/applicable to that article). The other 9,997 filters are simply not seen, because they are indexed/attached to other articles.
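The arithmetic in this comment can be illustrated with a minimal Python sketch. It is not AbuseFilter code: the `FilterIndex` class, its method names, and the page title "Evolution" are all hypothetical, chosen only to show how attaching filters to pages reduces the candidate set from 10,200 to 203.

```python
from collections import defaultdict


class FilterIndex:
    """Hypothetical in-memory index; not AbuseFilter's real data structure."""

    def __init__(self):
        self.global_filters = []          # filters that run on every edit
        self.by_page = defaultdict(list)  # page title -> attached filter ids

    def add(self, filter_id, pages=None):
        # A filter with no page list is treated as project-wide.
        if not pages:
            self.global_filters.append(filter_id)
        else:
            for page in pages:
                self.by_page[page].append(filter_id)

    def candidates(self, page):
        # Only global filters plus filters attached to this page are returned;
        # filters attached to other pages are never looked at.
        return self.global_filters + self.by_page.get(page, [])


index = FilterIndex()
for i in range(200):
    index.add(f"global-{i}")
for i in range(9_997):
    index.add(f"other-{i}", pages=[f"Other article {i}"])
for i in range(3):
    index.add(f"evolution-{i}", pages=["Evolution"])

print(len(index.candidates("Evolution")))      # 200 global + 3 page-specific
print(len(index.candidates("Unlisted page")))  # global filters only
```

The point of the sketch is only that the lookup cost depends on the number of filters attached to the edited page, not on the total number of page-specific filters in existence.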
Comment 4 FT2 2009-04-01 22:51:27 UTC
In brief, every filter would have an extra field of the form "article_name must match..." <parameter_list>

where <parameter_list> is one or more page titles with an appropriate separator such as the pipe symbol.

Then edits to any article would be tested only against those filters selected via pseudocode:

SELECT Filters.* FROM Filters WHERE (Filters.parameter_list IS NULL OR <article_name> IN Filters.parameter_list) AND Filters.enabled = TRUE

Computationally that's trivial, and also an immense time-saver. 

Hope that makes sense :)
Comment 5 Andrew Garrett 2009-06-03 15:16:57 UTC
Minor technical quibble: It isn't computationally trivial to search for a substring in thousands of rows. It requires an expensive row scan. If implemented, it would be done differently.
Comment 6 FT2 2009-06-05 16:06:46 UTC
A substring search should not be necessary. The table of "per article filters" would be indexed by pageID, allowing a direct lookup of all filters applicable to a given page (ie those with the specified PageID, or those applicable to all pages).
Comment 7 Andrew Garrett 2009-06-05 16:11:27 UTC
(In reply to comment #6)
> A substring search should not be necessary. The table of "per article filters"
> would be indexed by pageID, allowing a direct lookup of all filters applicable
> to a given page (ie those with the specified PageID, or those applicable to all
> pages).

That isn't how you suggested storing it; you suggested putting it in a pipe-separated list.

If implemented, there would be a 'global' flag on the filters themselves, and a separate table for associating filters with pages.
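A rough sketch of that layout, using Python's built-in `sqlite3` module, might look like the following. The table and column names (`filter`, `filter_page`, `is_global`, `page_id`) and the sample rows are assumptions made for illustration; they are not the AbuseFilter schema.

```python
import sqlite3

# Hypothetical schema: a 'global' flag on each filter, plus a separate
# association table keyed by page id. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE filter (
    filter_id INTEGER PRIMARY KEY,
    is_global INTEGER NOT NULL,
    enabled   INTEGER NOT NULL
);
CREATE TABLE filter_page (
    page_id   INTEGER NOT NULL,
    filter_id INTEGER NOT NULL REFERENCES filter(filter_id),
    PRIMARY KEY (page_id, filter_id)  -- doubles as the lookup index on page_id
);
""")

conn.executemany(
    "INSERT INTO filter VALUES (?, ?, ?)",
    [(1, 1, 1),   # global, enabled
     (2, 0, 1),   # page-specific, enabled
     (3, 0, 1),   # page-specific, enabled
     (4, 0, 0)],  # page-specific, disabled
)
conn.executemany(
    "INSERT INTO filter_page VALUES (?, ?)",
    [(42, 2), (99, 3), (42, 4)],
)


def filters_for_page(conn, page_id):
    """Return ids of enabled filters that are global or attached to page_id."""
    rows = conn.execute(
        """
        SELECT filter_id FROM filter
        WHERE is_global = 1 AND enabled = 1
        UNION
        SELECT f.filter_id
        FROM filter_page AS fp
        JOIN filter AS f ON f.filter_id = fp.filter_id
        WHERE fp.page_id = ? AND f.enabled = 1
        """,
        (page_id,),
    ).fetchall()
    return sorted(row[0] for row in rows)


print(filters_for_page(conn, 42))  # [1, 2]
print(filters_for_page(conn, 99))  # [1, 3]
print(filters_for_page(conn, 7))   # [1]
```

Because the association table is keyed on `page_id`, finding the filters for a page is an index lookup rather than a substring search over a pipe-separated column.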
Comment 8 Andrew Garrett 2009-07-16 16:47:50 UTC
Marking this bug as Lowest priority.

I've done this in a batch to (usually enhancement request) bugs where:
* It is not clear that this bug should be fixed.
* It is not clear how to fix this bug.
* There are difficulties or complications in fixing this bug, which are not justified by the importance of the bug.
* This is an extremely minor bug that could not be fixed in a few lines of code.

If you're interested in having one of these bugs fixed, your best bet is to write the patch yourself.
Comment 9 Andrew Garrett 2009-08-09 20:25:42 UTC
*** Bug 20147 has been marked as a duplicate of this bug. ***
Comment 10 John Mark Vandenberg 2011-04-28 16:16:55 UTC
We already have article_articleid & article_text.  This bug is requesting optimisation of those variables specifically, without giving evidence that it is needed or cost-efficient (another table lookup vs putting the article_articleid condition first and/or improving the execution plan).
