Last modified: 2011-03-13 18:06:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T19033, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 17033 - API: Multiple page histories
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: API
Version: unspecified
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Roan Kattouw
Reported: 2009-01-15 11:50 UTC by Gurch
Modified: 2011-03-13 18:06 UTC (History)


Description Gurch 2009-01-15 11:50:32 UTC
I have a particular use case for the API that could be streamlined a little.

It would be useful to have basic info for the last few revisions to a page (say, 10 or so), for RC patrolling purposes, to help avoid the problem of people reverting to the wrong revision when trying to deal with vandalism.

This is easy to do with prop=revisions. At the moment, though, it can only be done one page at a time, unless I know the revision IDs beforehand, which I don't, and a separate page history request on top of the diff request for every change being reviewed is a little excessive, so I don't do it. (API diffs wouldn't help here as the diff query would work in such a way as to make combining it with a page history query impossible).

What I would like to be able to do is use something like the following query string:

action=query&titles=Foo|Bar|Baz&prop=revisions&rvlimit=10

and have it return the last 10 revisions for each of the pages Foo, Bar and Baz, i.e. 30 revisions in total. At the moment, of course, this doesn't work because rvlimit can only be used with a single page. Obviously the limit-checking code would have to be rewritten to check (rvlimit * title count) <= 500 rather than rvlimit <= 500. As far as I can tell, asking for X revisions this way should be no more of a performance hit than asking for X arbitrary revisions, which is already possible. I'd probably give it about five titles per query, so this would cut the number of queries needed to get the data by four-fifths.
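
To make the request above concrete, here is a minimal sketch of the one-request-per-title workaround, plus the proposed limit check. It only builds query strings and sends nothing. The helper names `build_history_queries` and `within_proposed_limit` are hypothetical and not part of MediaWiki; the 500 cap is the figure quoted in this report.

```python
from urllib.parse import urlencode

MAX_REVISIONS = 500  # cap quoted in the report; an assumption for other setups


def build_history_queries(titles, rvlimit=10):
    """Return one query string per title (the one-page-at-a-time approach)."""
    return [
        urlencode({
            "action": "query",
            "titles": title,
            "prop": "revisions",
            "rvlimit": rvlimit,
        })
        for title in titles
    ]


def within_proposed_limit(titles, rvlimit, max_total=MAX_REVISIONS):
    """The check proposed in the report: (rvlimit * title count) <= 500."""
    return rvlimit * len(titles) <= max_total


queries = build_history_queries(["Foo", "Bar", "Baz"])
for query in queries:
    print(query)
```

Three titles produce three separate requests, which is the overhead the proposed multi-title form is meant to remove.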
Comment 1 Roan Kattouw 2009-01-15 12:03:26 UTC
(In reply to comment #0)
> I have a particular use case for the API that could be streamlined a little.
> 
> It would be useful to have basic info for the last few revisions to a page
> (say, 10 or so), for RC patrolling purposes, to help avoid the problem of
> people reverting to the wrong revision when trying to deal with vandalism.
> 
> This is easy to do with prop=revisions. At the moment, though, it can only be
> done one page at a time, unless I know the revision IDs beforehand, which I
> don't, and a separate page history request on top of the diff request for every
> change being reviewed is a little excessive, so I don't do it. (API diffs
> wouldn't help here as the diff query would work in such a way as to make
> combining it with a page history query impossible).
> 
> What I would like to be able to do is use something like the following query
> string:
> 
> action=query&titles=Foo|Bar|Baz&prop=revisions&rvlimit=10
> 
> and have it return the last 10 revisions for each of the pages Foo, Bar and
> Baz, i.e. 30 revisions in total. At the moment, of course, this doesn't work
> because rvlimit can only be used with a single page.
There's a good reason for that: there's no efficient way to get the first 10 revisions of Foo and the first 10 revisions of Bar in one database query. The closest approximation would be a query asking for the last 20 revisions of either Foo or Bar, but that could very well give you a 15/5 or 0/20 split, depending on the circumstances. The only way to ensure a 10/10 split is to run a separate query for each title, which is exactly what I don't want to do for performance reasons (as a general rule, database queries in loops are evil, especially if the number of iterations is controlled by the user).
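
The uneven split can be reproduced with a small sketch against an in-memory SQLite table. The table `rev` and its columns are a simplified stand-in invented for this illustration, not MediaWiki's actual schema, and the data is made up so that every Bar revision is newer than every Foo revision.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rev (title TEXT, ts INTEGER)")
# Hypothetical data: Foo has 30 old revisions, Bar has 30 recent ones.
rows = [("Foo", t) for t in range(30)] + [("Bar", 100 + t) for t in range(30)]
conn.executemany("INSERT INTO rev VALUES (?, ?)", rows)


def combined_split(titles, limit):
    """One query for all titles: the newest `limit` revisions overall."""
    placeholders = ",".join("?" for _ in titles)
    sql = (
        f"SELECT title FROM rev WHERE title IN ({placeholders}) "
        "ORDER BY ts DESC LIMIT ?"
    )
    cur = conn.execute(sql, (*titles, limit))
    return Counter(title for (title,) in cur)


def per_title_split(titles, limit):
    """One query per title: guarantees an even split, but loops over titles."""
    counts = Counter()
    for title in titles:
        cur = conn.execute(
            "SELECT title FROM rev WHERE title = ? ORDER BY ts DESC LIMIT ?",
            (title, limit),
        )
        counts.update(t for (t,) in cur)
    return counts


print(combined_split(["Foo", "Bar"], 20))
print(per_title_split(["Foo", "Bar"], 10))
```

With this data the combined query returns only Bar revisions (the 0/20 case), while the per-title loop returns 10 for each page at the cost of one database query per title.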

> Obviously the
> limit-checking code would have to be rewritten to check (rvlimit * title count)
> <= 500 rather than rvlimit <= 500. As far as I can tell, asking for X revisions
> this way should be no more of a performance hit than asking for X arbitrary
> revisions, which is already possible.
Actually, it is, because asking for a number of arbitrary revisions based on revids can be done in one query.
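
For comparison, a request for arbitrary revisions by ID might be built as in the sketch below. The revision IDs are invented, and `build_revids_query` is a hypothetical helper name; only the query string is constructed.

```python
from urllib.parse import urlencode


def build_revids_query(revids):
    """Build a single query string asking for several revisions by ID."""
    return urlencode({
        "action": "query",
        "revids": "|".join(str(r) for r in revids),
        "prop": "revisions",
    })


revids_query = build_revids_query([101, 102, 103])
print(revids_query)
```

Because the IDs are known up front, the lookup is a plain match on a set of IDs rather than a per-page "newest N" selection.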

> I'd probably give it about five titles
> per query, so this would cut the number of queries needed to get the data by
> four-fifths.
> 
Like I said above, it wouldn't.
Comment 2 Gurch 2009-01-15 12:59:48 UTC
(In reply to comment #1)
> There's a good reason for that: there's no efficient way to get the first 10
> revisions of Foo and the first 10 revisions of Bar in one database query.

Yeah, I see what you're getting at; never mind. (Perhaps I should stop trying to rewrite MediaWiki here... :/)
