Last modified: 2012-10-29 16:39:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T22186, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 20186 - Allow filtering history and contributions lists for revision deletion
Allow filtering history and contributions lists for revision deletion
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
History/Diffs (Other open bugs)
unspecified
All All
: Normal major with 3 votes (vote)
: ---
Assigned To: Church of emacs
:
Depends on:
Blocks: SWMT 18780 19697 19819
  Show dependency treegraph
 
Reported: 2009-08-11 22:58 UTC by DerHexer
Modified: 2012-10-29 16:39 UTC (History)
11 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description DerHexer 2009-08-11 22:58:19 UTC
Could you please make the articles' history sortable for these revisions which were deleted by "Revision delete"? With the old deletion system it is possible to see those revisions alltogether which were deleted (e. g. because of copyvios); and these alltogether on action=history which weren't deleted. When a revision is deleted with "Revision delete" it cannot be distinguished from those which are still visible because they are both mentioned in action=history. Using revision deletion on pages with some thousand revisions (e. g. [[:w:WP:AIV]], [[:de:WP:VM]], copyvios on huge articles) makes it impossible to see/check all deleted revisions together. If a sortable history (deleted revision vs. visible revisions) is not possible, another possibility to differ should be provided.

Kind regards
DerHexer
Comment 1 DerHexer 2009-08-12 00:09:39 UTC
The same thing happens with Special:Contributions and Special:DeletedContributions. With "Revision delete" there's no way to filter one's user's (with "Revision delete") deleted revisions and visible revisions; they will all be accessible on Special:Contributions. Users can have millions of revisions; if there's no way to filter "Revision deleted" revisions they will be lost in the contributions' list.

Kind regards
DerHexer
Comment 2 Brion Vibber 2009-08-13 00:46:06 UTC
Clarifying summary.
Comment 3 DerHexer 2009-08-13 09:43:29 UTC
I could also imagine separate logs for those deleted and suppressed revisions, or accessibility on Special:Undelete resp. Special:DeletedContributions (beside being available on the contributions/history lists with all non-deleted/non-suppressed revisions).
Comment 4 Church of emacs 2009-10-25 23:44:30 UTC
I've written a piece of software that allows filtering in page history. I'm not so happy about it, though. I added an URL parameter deleted=1 and a checkbox in the history form. The checkbox is _always_ displayed, even though (a) RevisionDelete is currently still disabled by default, (b) it is very likely that the page you are visiting has no deleted revisions, as RevisionDelete is used rarely. I was thinking about a permission check (check if the current user is allowed to delete revisions) for showing the checkbox, but that isn't really a good solution (deleted revisions might be interesting to other users as well). It'd be elegant to only show the link if there actually are deleted revisions, but for that all revisions of a page have to be checked, which is a no-go.
If you have any suggestions or solutions, please share them :)
Comment 5 Church of emacs 2009-10-26 18:52:21 UTC
Done for page history in r58153. Note that there are two restrictions:
* No big pages. We check for isBigDeletion, which checks for $wgDeleteRevisionsLimit (5000 on WMF wikis). On larger histories, it takes a lot of time to examine all revisions for rev_deleted.
* Only to users with 'deleterevision' right. We are paranoid (requesting this operation on a page with 4999 revisions might be a starting point for a DDoS), plus only users with 'deleterevision' right can actually check the content of the deleted revisions.
Leaving as non-resolved, because contribution filtering still has to be implemented.
Comment 6 FT2 2009-12-01 08:08:18 UTC
What about a paged interface (like ordinary history or contribs) for pages with many revisions or much RevDelete activity?

Is "WHERE (pageid=X AND bitflags > 0)" really that much slower or more demanding on the servers, than "WHERE (pageid=X)"?
Comment 7 Roan Kattouw 2009-12-01 13:09:05 UTC
(In reply to comment #6)
> What about a paged interface (like ordinary history or contribs) for pages with
> many revisions or much RevDelete activity?
> 
> Is "WHERE (pageid=X AND bitflags > 0)" really that much slower or more
> demanding on the servers, than "WHERE (pageid=X)"?
> 

You'd be doing WHERE rev_page=X AND rev_deleted != 0, and yes, that's slower, because there's no index on rev_deleted and rows with rev_deleted != 0 are relatively rare. What the DBMS will end up doing is:

* retrieve rows with rev_page=X using an index (fast)
* iterate over these rows
** if the row has rev_deleted !=0 (rare), add it to the result set. If the result set is full (50 or 500 results), stop
** if the row has rev_deleted = 0 (common), ignore it

This'll need to examine a lot of rows because deleted revisions are rare. On the other hand, queries with WHERE rev_deleted = 0 , while not indexed, are relatively fast because rev_deleted=0 matches most rows.
Comment 8 FT2 2009-12-01 13:28:59 UTC
Not a coder but... add an index for rev_deleted?
Comment 9 Church of emacs 2009-12-01 16:41:18 UTC
Platonides proposed to add an index for (rev_page,rev_deleted) which should enable us to filter faster: http://lists.wikimedia.org/pipermail/wikitech-l/2009-November/045976.html
Comment 10 Roan Kattouw 2009-12-01 16:46:04 UTC
(In reply to comment #8)
> Not a coder but... add an index for rev_deleted?
> 

An index on rev_deleted by itself wouldn't help much, because there are few possible values for it. Adding indexes on (rev_page, rev_deleted, rev_timestamp) and (rev_user_tetx, rev_deleted, rev_timestamp) or different permutations thereof could work, but it's a lot of accommodation for such a minor feature IMO.
Comment 11 Church of emacs 2009-12-01 16:54:08 UTC
It might seem like a minor feature, but it is currently one of few bugs blocking RevisionDelete for sysops on WMF wikis (bug 18780). RevisionDelete is a highly desired feature among admins, and they've been waiting for years now. Being able to filter for deleted revisions is needed and I don't see any other way than adding an index to the database.
Comment 12 FT2 2009-12-01 16:57:17 UTC
The value of a feature as "major" or "minor" needs judging by usage. 

This one's likely to have good (and quite possibly fairly constant or heavy)
usage. It's something that admins may well use a lot of the time or on an
ongoing basis. It'll pay off.

Looking up redactions/deletions is a usual part of many admin matters, and its
doubtful there'd be much change under RevDelete either.
Comment 13 FT2 2009-12-01 16:59:06 UTC
(Or 2nd option, if indexing doesn't help, can it be handled somehow via the queue, like other DB-intense and lengthy tasks, to prevent DDOS?)
Comment 14 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-12-09 22:01:12 UTC
See my comments at <http://www.mediawiki.org/wiki/Special:Code/MediaWiki/58153#c4735>.  I don't think this is necessarily too expensive to do on large pages if it's for sysops only.
Comment 15 Tim Starling 2009-12-23 08:45:11 UTC
The alternative to adding an index is to split the rev_deleted field out to a separate "revision_props" table. This table would presumably be far smaller than the revision table, since most revisions would not have any special properties. Then you could put indexes on the new table without adding many gigabytes to the core DB size. 

The table could even be added as a cache, similar to the links tables. Then the existing read code would continue to work, you'd just have to fix the write code.
Comment 16 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-12-23 21:14:32 UTC
Then you'd have to filesort, though, right?  Unless we cheat and order by the rev_id; that would work reliably enough, I'd think.
Comment 17 Tim Starling 2010-02-05 04:51:58 UTC
I don't think it would take MySQL very long to sort ~100 deleted revisions from a given page. Maybe if there were 100,000 deleted revisions per page, it would be a problem. But you could optimise that case by denormalising further: you could copy rev_timestamp into the revision_props table.
Comment 18 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-02-05 19:13:06 UTC
There will probably end up being pages with tens of thousands of deleted revisions somewhere.  How many times have peopled tried to delete [[Wikipedia:Sandbox]]?  But ordering by rev_id is probably okay -- we do it for next/previous revision links when viewing old revisions, don't we?
Comment 19 FT2 2010-02-06 00:41:56 UTC
See bug 18104 comment 7:  best not to assume that sorting by rev_id will be the same as sorting by time/date. In principle it should, and from July 2005 it apparently will, but revisions deleted prior to June 2005 that were undeleted at some point will have out-of-sequence rev_id's.
Comment 20 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-02-07 00:02:11 UTC
So will cases where two rows are inserted at about the same time, and the rev_timestamp computed for the first happens to be slightly later than the rev_timestamp computed for the other -- e.g., Apaches' clocks being different.  But the question is whether this assumption is violated often enough to merit an extra indexed column on a new table.  I doubt it will come up often enough to be worth the trouble, in this case.
Comment 21 Tim Starling 2010-02-07 00:35:30 UTC
Sorted lists of revisions, such as history and contributions, are always ordered by timestamp. This avoids misordering in cases such as undeletion and Special:Import. Yes, there may be misordering of rev_timestamp due to clock skew, but it will only be seconds, not years, so people will usually not notice it. And edits to the same paragraph are always correctly ordered using rev_timestamp since the edit conflict mechanism implicitly protects it.

Yes, Title::getNextRevisionID() uses rev_id, this is an unfortunate performance hack and will lead to inaccurate results due to import and undelete.
Comment 22 Church of emacs 2010-04-26 16:36:47 UTC
Okay, I did the same thing as in HistoryPage.php for SpecialContributions.php (r65546).
It seems to me that Tim's comment in r58153#c4843 is right: Filtering for RevisionDeleted edits is only as bad as filtering for namespaces. On a test wiki with 10k revisions, the query was surprisingly fast.

Please test this for performance before merging it in the WMF branch (see bug 23329 for a request on that)

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links