Last modified: 2010-05-15 15:33:08 UTC
At Wikimedia sites the following page2xml query is now being automatically killed on sight, after use of it caused a problem on one of the database servers handling de: /* page2xml */ SELECT old_id as id,old_timestamp as timestamp,old_user as user,old_user_text as user_text,old_comment as comment,old_text as text,old_flags as flags FROM `old` WHERE old_namespace='4' AND old_title='whatever' ORDER BY old_timestamp This query asks for all versions of the article, at about one disk seek per revision for the current schema, unless the revision is in cache. Worse, it ends up using the old_title index, so it actually retrieves from disk all revisions in all namespaces. At the time the server overload caused me to take a look there were three of these running with run times of 891, 583 and 339 seconds. The last was killed by the changed querybane rules after 805 seconds. Adding use_index(name_title_timestamp) improves it significantly and should be done but it still needs a limit added to it. The planned schema change, if it stores old articles with article ID as the first part of the primary key, would make this far more efficient because adjsacent revisions would be in the same database pages. It would still potentially retrieve hundreds of thousands of revisions for a popular article so some limit is still going to be needed even with the new schama. If page2xml is intended to be a general retrieve an article call, that limit needs to be very low - tens not hundreds of revisions. For Wikimedia sites, limit 10 is a good choice at present, assuming the primary purpose isn't retrieving the article history.
enwiki doesn't have the old_title index, nor has a fresh REL1_4 install. dewiki has this index. Should old_title index be dropped? I'm not aware of any queries that use old_title but not old_namespace.
The old_title index was in some of the wikis - de, fr, pl, probably others, but not in the new ones I looked at. I'll remove it when practical, since it's not present in the newer wikis. The query does select the correct index when that one is removed. Not going to be enough to make it practical to return all old metadata and text, though.
Obsolete - old is gone now.