Last modified: 2012-10-29 16:39:58 UTC
Could you please make the articles' history sortable for these revisions which were deleted by "Revision delete"? With the old deletion system it is possible to see those revisions alltogether which were deleted (e. g. because of copyvios); and these alltogether on action=history which weren't deleted. When a revision is deleted with "Revision delete" it cannot be distinguished from those which are still visible because they are both mentioned in action=history. Using revision deletion on pages with some thousand revisions (e. g. [[:w:WP:AIV]], [[:de:WP:VM]], copyvios on huge articles) makes it impossible to see/check all deleted revisions together. If a sortable history (deleted revision vs. visible revisions) is not possible, another possibility to differ should be provided. Kind regards DerHexer
The same thing happens with Special:Contributions and Special:DeletedContributions. With "Revision delete" there's no way to filter one's user's (with "Revision delete") deleted revisions and visible revisions; they will all be accessible on Special:Contributions. Users can have millions of revisions; if there's no way to filter "Revision deleted" revisions they will be lost in the contributions' list. Kind regards DerHexer
Clarifying summary.
I could also imagine separate logs for those deleted and suppressed revisions, or accessibility on Special:Undelete resp. Special:DeletedContributions (beside being available on the contributions/history lists with all non-deleted/non-suppressed revisions).
I've written a piece of software that allows filtering in page history. I'm not so happy about it, though. I added an URL parameter deleted=1 and a checkbox in the history form. The checkbox is _always_ displayed, even though (a) RevisionDelete is currently still disabled by default, (b) it is very likely that the page you are visiting has no deleted revisions, as RevisionDelete is used rarely. I was thinking about a permission check (check if the current user is allowed to delete revisions) for showing the checkbox, but that isn't really a good solution (deleted revisions might be interesting to other users as well). It'd be elegant to only show the link if there actually are deleted revisions, but for that all revisions of a page have to be checked, which is a no-go. If you have any suggestions or solutions, please share them :)
Done for page history in r58153. Note that there are two restrictions: * No big pages. We check for isBigDeletion, which checks for $wgDeleteRevisionsLimit (5000 on WMF wikis). On larger histories, it takes a lot of time to examine all revisions for rev_deleted. * Only to users with 'deleterevision' right. We are paranoid (requesting this operation on a page with 4999 revisions might be a starting point for a DDoS), plus only users with 'deleterevision' right can actually check the content of the deleted revisions. Leaving as non-resolved, because contribution filtering still has to be implemented.
What about a paged interface (like ordinary history or contribs) for pages with many revisions or much RevDelete activity? Is "WHERE (pageid=X AND bitflags > 0)" really that much slower or more demanding on the servers, than "WHERE (pageid=X)"?
(In reply to comment #6) > What about a paged interface (like ordinary history or contribs) for pages with > many revisions or much RevDelete activity? > > Is "WHERE (pageid=X AND bitflags > 0)" really that much slower or more > demanding on the servers, than "WHERE (pageid=X)"? > You'd be doing WHERE rev_page=X AND rev_deleted != 0, and yes, that's slower, because there's no index on rev_deleted and rows with rev_deleted != 0 are relatively rare. What the DBMS will end up doing is: * retrieve rows with rev_page=X using an index (fast) * iterate over these rows ** if the row has rev_deleted !=0 (rare), add it to the result set. If the result set is full (50 or 500 results), stop ** if the row has rev_deleted = 0 (common), ignore it This'll need to examine a lot of rows because deleted revisions are rare. On the other hand, queries with WHERE rev_deleted = 0 , while not indexed, are relatively fast because rev_deleted=0 matches most rows.
Not a coder but... add an index for rev_deleted?
Platonides proposed to add an index for (rev_page,rev_deleted) which should enable us to filter faster: http://lists.wikimedia.org/pipermail/wikitech-l/2009-November/045976.html
(In reply to comment #8) > Not a coder but... add an index for rev_deleted? > An index on rev_deleted by itself wouldn't help much, because there are few possible values for it. Adding indexes on (rev_page, rev_deleted, rev_timestamp) and (rev_user_tetx, rev_deleted, rev_timestamp) or different permutations thereof could work, but it's a lot of accommodation for such a minor feature IMO.
It might seem like a minor feature, but it is currently one of few bugs blocking RevisionDelete for sysops on WMF wikis (bug 18780). RevisionDelete is a highly desired feature among admins, and they've been waiting for years now. Being able to filter for deleted revisions is needed and I don't see any other way than adding an index to the database.
The value of a feature as "major" or "minor" needs judging by usage. This one's likely to have good (and quite possibly fairly constant or heavy) usage. It's something that admins may well use a lot of the time or on an ongoing basis. It'll pay off. Looking up redactions/deletions is a usual part of many admin matters, and its doubtful there'd be much change under RevDelete either.
(Or 2nd option, if indexing doesn't help, can it be handled somehow via the queue, like other DB-intense and lengthy tasks, to prevent DDOS?)
See my comments at <http://www.mediawiki.org/wiki/Special:Code/MediaWiki/58153#c4735>. I don't think this is necessarily too expensive to do on large pages if it's for sysops only.
The alternative to adding an index is to split the rev_deleted field out to a separate "revision_props" table. This table would presumably be far smaller than the revision table, since most revisions would not have any special properties. Then you could put indexes on the new table without adding many gigabytes to the core DB size. The table could even be added as a cache, similar to the links tables. Then the existing read code would continue to work, you'd just have to fix the write code.
Then you'd have to filesort, though, right? Unless we cheat and order by the rev_id; that would work reliably enough, I'd think.
I don't think it would take MySQL very long to sort ~100 deleted revisions from a given page. Maybe if there were 100,000 deleted revisions per page, it would be a problem. But you could optimise that case by denormalising further: you could copy rev_timestamp into the revision_props table.
There will probably end up being pages with tens of thousands of deleted revisions somewhere. How many times have peopled tried to delete [[Wikipedia:Sandbox]]? But ordering by rev_id is probably okay -- we do it for next/previous revision links when viewing old revisions, don't we?
See bug 18104 comment 7: best not to assume that sorting by rev_id will be the same as sorting by time/date. In principle it should, and from July 2005 it apparently will, but revisions deleted prior to June 2005 that were undeleted at some point will have out-of-sequence rev_id's.
So will cases where two rows are inserted at about the same time, and the rev_timestamp computed for the first happens to be slightly later than the rev_timestamp computed for the other -- e.g., Apaches' clocks being different. But the question is whether this assumption is violated often enough to merit an extra indexed column on a new table. I doubt it will come up often enough to be worth the trouble, in this case.
Sorted lists of revisions, such as history and contributions, are always ordered by timestamp. This avoids misordering in cases such as undeletion and Special:Import. Yes, there may be misordering of rev_timestamp due to clock skew, but it will only be seconds, not years, so people will usually not notice it. And edits to the same paragraph are always correctly ordered using rev_timestamp since the edit conflict mechanism implicitly protects it. Yes, Title::getNextRevisionID() uses rev_id, this is an unfortunate performance hack and will lead to inaccurate results due to import and undelete.
Okay, I did the same thing as in HistoryPage.php for SpecialContributions.php (r65546). It seems to me that Tim's comment in r58153#c4843 is right: Filtering for RevisionDeleted edits is only as bad as filtering for namespaces. On a test wiki with 10k revisions, the query was surprisingly fast. Please test this for performance before merging it in the WMF branch (see bug 23329 for a request on that)