Last modified: 2010-05-15 15:33:08 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T3324, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 1324 - page2xml trouble
page2xml trouble
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Special pages (Other open bugs)
1.4.x
All All
: Normal normal with 1 vote (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2005-01-13 17:19 UTC by Jamesday
Modified: 2010-05-15 15:33 UTC (History)
0 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Jamesday 2005-01-13 17:19:57 UTC
At Wikimedia sites the following page2xml query is now being automatically
killed on sight, after use of it caused a problem on one of the database servers
handling de:

/* page2xml */  SELECT old_id as id,old_timestamp as timestamp,old_user as
user,old_user_text as user_text,old_comment as comment,old_text as
text,old_flags as flags  FROM `old`  WHERE old_namespace='4' AND
old_title='whatever'  ORDER BY old_timestamp 

This query asks for all versions of the article, at about one disk seek per
revision for the current schema, unless the revision is in cache. Worse, it ends
up using the old_title index, so it actually retrieves from disk all revisions
in all namespaces. At the time the server overload caused me to take a look
there were three of these running with run times of 891, 583 and 339 seconds.
The last was killed by the changed querybane rules after 805 seconds.

Adding use_index(name_title_timestamp) improves it significantly and should be
done but it still needs a limit added to it.

The planned schema change, if it stores old articles with article ID as the
first part of the primary key, would make this far more efficient because
adjsacent revisions would be in the same database pages. It would still
potentially retrieve hundreds of thousands of revisions for a popular article so
some limit is still going to be needed even with the new schama. If page2xml is
intended  to be a general retrieve an article call, that limit needs to be very
low - tens not hundreds of revisions.

For Wikimedia sites, limit 10 is a good choice at present, assuming the primary
purpose isn't retrieving the article history.
Comment 1 JeLuF 2005-01-15 08:00:09 UTC
enwiki doesn't have the old_title index, nor has a fresh REL1_4 install.
dewiki has this index.

Should old_title index be dropped? I'm not aware of any queries that use
old_title but not old_namespace. 
Comment 2 Jamesday 2005-01-15 08:22:30 UTC
The old_title index was in some of the wikis - de, fr, pl, probably others, but
not in the new ones I looked at. I'll remove it when practical, since it's not
present in the newer wikis. The query does select the correct index when that
one is removed. Not going to be enough to make it practical to return all old
metadata and text, though.
Comment 3 Jamesday 2005-11-12 23:22:45 UTC
Obsolete - old is gone now.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links