Last modified: 2014-09-23 23:32:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T13402, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 11402 - Deleting of pages with high number of revisions makes server cry
Deleting of pages with high number of revisions makes server cry
Status: NEW
Product: MediaWiki
Classification: Unclassified
Database (Other open bugs)
All All
: Low normal with 2 votes (vote)
: ---
Assigned To: Nobody - You can work on this!
: patch, patch-need-review, performance
Depends on: 18493
Blocks: 16012
  Show dependency treegraph
Reported: 2007-09-19 20:35 UTC by Carl Fürstenberg
Modified: 2014-09-23 23:32 UTC (History)
10 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Deletion overhaul to reduce overhead (54.58 KB, patch)
2008-10-30 21:55 UTC, Aaron Schulz
RevisionDelete fix (56.55 KB, patch)
2008-10-31 10:46 UTC, Aaron Schulz
More minor tweaks (58.21 KB, patch)
2008-10-31 11:08 UTC, Aaron Schulz
Patch 2.0 (59.12 KB, patch)
2008-11-06 03:26 UTC, Aaron Schulz
Cleanup ref contraints on PG (60.11 KB, patch)
2008-11-06 17:05 UTC, Aaron Schulz
Schema cleanup/undelete paging (61.82 KB, patch)
2008-11-08 17:53 UTC, Aaron Schulz

Description Carl Fürstenberg 2007-09-19 20:35:19 UTC
If a deletion is made on a page with a really high number of revisions (like [[w:en:Wikipedia:Sandbox]] for example) the server might stop responding for an indefinite (at lest a long) time, as the process deleting is filling up all resources.
Comment 1 Brion Vibber 2007-09-19 20:44:42 UTC
The 'quick fix' here would be a count-and-cutoff warning to discourage people from deleting the sandbox if it's not really needed.

A more serious fix would be to actually make such deletions faster...

Deleting a whole page with a lot of revisions is expensive because the revision records need to be moved around. Currently this means copying data from 'revision' to 'archive' table, then deleting the 'revision' rows. This second part can cause contention as other updates to the revision table get delayed by (?) index locks.

Using revision-flagged deletion might or might not help here -- you'd still have to update all those rows.

Some sort of queue where the page is locked at a higher level and the database is dealt with in pieces might make sense... or that might be awful... dunno yet. :)
Comment 2 Roan Kattouw 2008-05-18 19:59:14 UTC
(In reply to comment #1)
> Using revision-flagged deletion might or might not help here -- you'd still
> have to update all those rows.
Well I assume updating hundreds to thousands of rows is still a lot faster than inserting, then deleting the same number of rows: you're touching only half as many rows, you're not changing tables' sizes and the server doesn't have as much data to process (only one bitfield per row vs. the entire row)

> Some sort of queue where the page is locked at a higher level and the database
> is dealt with in pieces might make sense... or that might be awful... dunno
> yet. :)
I recall having suggested the job queue for this somewhere. My idea was that you first delete the page from the page table, thus making it appear to be deleted, and putting the rev_deleted UPDATE queries (the same query a lot of times, with a sensible LIMIT) in the job queue. The only downside is that oldid and diff links will be accessible for a while after the page's deletion.
Comment 3 Aaron Schulz 2008-10-30 21:55:45 UTC
Created attachment 5493 [details]
Deletion overhaul to reduce overhead
Comment 4 Aaron Schulz 2008-10-31 10:46:20 UTC
Created attachment 5494 [details]
RevisionDelete fix
Comment 5 Aaron Schulz 2008-10-31 11:08:17 UTC
Created attachment 5495 [details]
More minor tweaks
Comment 6 Brion Vibber 2008-11-06 02:45:31 UTC
This seems a bit overengineered. :)

The bad:

1) It leaves us with three separate ways in which something might be "deleted":

* archive/text
* page/revision/text with rev_deleted 
* deleted_page/revision/text

plus deleted revisions for files.

None of them seem to integrate very cleanly with one another. Leaving things in the 'revision' table seems like it will look pretty weird, and there seem to be weird hackarounds for the referential integrity checks in the postgres mode... it just feels a bit icky. :)

2) Special:Restore looks like yet another interface that.... mostly duplicates Special:Undelete? Definitely should integrate these more...

The thing I do kind of like is the idea that distinct "deletion events" could be stored, and then looked at, separately -- each "incarnation" of a page which is created, then deleted can be stored in the table with its separate page ID and revision history. In theory it'd be nice to be _able_ to merge them back together, of course...

Also we have some funky hacks to prevent restoration of pages with restricted view as the top revision. Honestly we shouldn't have to special case things like that... we should make sure that's cleaned up, if there's still a reason for it.

There also seems to be duplication between this Title::isValidRestoreTarget() and existing things like isValidMoveTarget()... they look pretty much the same and should probably be sharing code.

So some general thoughts...

* Tone down the UI code sprawl -- we should have a single clear interface for finding and managing deleted items.

* Ensure there's clean separation between UI and backend code. I see a lot of stuff mixed together in Special:Restore and it's hard to tell what's going on.

* Try to clean up the referential integrity issues -- either some nice clean way that doesn't make things get deleted, or just remove the foreign keys if they're not valid... but it'd be nice to have validity checks if they can apply!
Comment 7 Aaron Schulz 2008-11-06 02:54:01 UTC
'archive/text' stuff is there for B/C, not for new deletions. I don't mind leaving stuff in 'revision', plus, it is needed to kill the o(N) overhead.

There is no 'Special:Restore' page on the wiki. Only 'Special:Undelete'. The old class is kept for B/C and the fact that all of the queries there are different as well as a good deal of formatting (merging would be uglier than leaving separate).

Merging of different incarnations (when rarely needed) can be done done via MergeHistory.

Not sure what you mean about cleaning up top revs ;)

Some common function for isValidMoveTarget() and stuff can prolly eliminate some duplication.
Comment 8 Aaron Schulz 2008-11-06 03:26:58 UTC
Created attachment 5504 [details]
Patch 2.0

Removed some title.php duplication. Separated out some db logic from UI.
Comment 9 Aaron Schulz 2008-11-06 16:19:06 UTC
I'll ask greg about removing those ref SQL clauses
Comment 10 Aaron Schulz 2008-11-06 17:05:56 UTC
Created attachment 5505 [details]
Cleanup ref contraints on PG
Comment 11 Aaron Schulz 2008-11-08 17:53:41 UTC
Created attachment 5510 [details]
Schema cleanup/undelete paging
Comment 12 p858snake 2011-04-30 00:09:42 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 13 Sumana Harihareswara 2011-09-30 15:28:25 UTC
Aaron, you could now update your patch & commit it yourself, if it's still good & needed. :-)
Comment 14 Rob Lanphier 2012-01-06 19:27:34 UTC
This is a low priority item, we think.  I'm unassigning from Aaron as he hasn't worked on this in a while, and won't have time to update the patch in the near future.
Comment 15 Andre Klapper 2013-01-04 15:56:16 UTC
*** Bug 40174 has been marked as a duplicate of this bug. ***
Comment 16 Nathan Larson 2013-12-26 20:42:46 UTC
I'm working on a prototype that implements the "new table" option. I got away from the "archive" terminology altogether and named the new table "deleted_page".

If we keep revision.rev_page the same when the page is deleted and then undeleted, that works fine, as long as *all* the revisions are undeleted. However, suppose we only want to undelete one of the revisions for page 1. That won't work because as soon as we delete the deleted_page row for page 1, and add the page row for page 1, then all the revisions with rev_page 1 will be restored. MediaWiki won't be able to tell them apart (although it would work if page 1 were deleted, then recreated as page 2, and then some of the revisions from the still-deleted page 1 were restored into page 2, because then rev_page on the still-deleted revisions would still be 1, and there would still be a page_deleted row for page 1.)

There needs to be some way of differentiating deleted revisions from undeleted revisions with the same rev_page. Should revision.rev_deleted be set to 7 for those, or should we have a new constant? E.g., should we set const DELETED_PAGE = 16; in Revision.php and use that to indicate that the whole page was deleted and only some deletions were restored?

Setting rev_deleted to 7 seems more aligned with the goals of backward compatibility and merging the deletion and revisiondeletion systems.
Comment 17 Nathan Larson 2013-12-27 02:16:27 UTC
See cases #3, #4, and #5 at bug 58986 for examples of why revision.rev_page may sometimes need to be changed after this system is implemented.

Note You need to log in before you can comment on or make changes to this bug.