Last modified: 2014-09-22 23:23:07 UTC
See also: * bug 12394 (make it easier to retrieve rcid of a newpage creation) * bug 15552 (things like Special:RecentChangesLinked and in order to use a generator module with recentchanges) This would enable: * Getting rcid of an unpatrolled page creation from the API. * Use recentchanges with a generator module. * Use recentchanges with watch list (maybe?). * Allow tools to mass-patrol all unpatrolled changes to a single page. (bug 12394 comment #7) > (bug 12394 comment #6) > > (bug 12394 comment #5) > > > However, I see in the current database structure there's a seperate column for > > > rc_title. > > > I'm not sure since when this exists, and/or if it was originally utilized, but > > > when using that in the query (AND WHERE rc_title='Foobar') it'd be like any > > > other condition currently in the recentchanges API right (same thing for > > > rc_user, with $this->addWhereFld(); ) > > > > > The original implementation did do a WHERE on rc_namespace and rc_title, yes, > > but that's not the same as doing a WHERE on rc_user because the latter is > > indexed. Implementing this feature would require adding an index for it > > (kinda > > leery of that) and even then it'd have to sort by namespace, then title, then > > timestamp in order to work efficiently. > > There is a rc_namespace_title index though. But unlike the one for rc_user, > it > doesn't have rc_timestamp. > > mediawiki-core@master:/maintenance/tables.sql: > INDEX rc_timestamp ON recentchanges (rc_timestamp); > INDEX rc_namespace_title ON recentchanges (rc_namespace, rc_title); > INDEX rc_cur_id ON recentchanges (rc_cur_id); > INDEX new_name_timestamp ON recentchanges > (rc_new,rc_namespace,rc_timestamp); > INDEX rc_ip ON recentchanges (rc_ip); > INDEX rc_ns_usertext ON recentchanges (rc_namespace, rc_user_text); > INDEX rc_user_text ON recentchanges (rc_user_text, rc_timestamp); > > > Would it make sense for rc_namespace_title to have it? I wonder what it is > used > for and if those uses would have a problem with the extra rc_timestamp sort. > The default sort for rc_namespace_title is presumably rc_id which should have > be very close to the sort order of rc_timestamp. > > The main reason it needs rc_timestamp is not for the sort order, but to be > able > to do rcstart and rcend.
@Roan: Are you really sure we need more indices for this? What about watchlist. Watchlist uses the same recentchanges table with a major INNER JOIN on watchlist (as opposed to just a few "WHERE rc_title IN ( ..)"). Watchlist also does the rc_timestamp sort and range. Of course, it is possible that this just proves the opposite (watchlist is also slow, but we can't remove it, and we can remove rctitles from the API to avoid it from getting worse). Is that the case?
(In reply to comment #0) > * Use recentchanges with a generator module. To use it with a generator you'd need to use "titles" rather than "rctitles", and you'd probably also want to move the module to "prop=recentchanges" instead of "list=recentchanges" (breaking BC). At that point, adding a "prop=recentchanges" (which might need a different name) might make more sense. (In reply to comment #1) > Of course, it is possible that this just proves the opposite (watchlist is > also slow, but we can't remove it, and we can remove rctitles from the API > to avoid it from getting worse). I think that's a good possibility. The query used for the watchlist filesorts, which is something we try to avoid in the API for performance reasons. Is this why we have a special database query group just for the watchlist?
Also, in addition to Watchlist, there is also RecentchangesLinked which also essentially does this already.
(In reply to comment #2) > (In reply to comment #0) > > * Use recentchanges with a generator module. > > To use it with a generator you'd need to use "titles" rather than "rctitles", > and you'd probably also want to move the module to "prop=recentchanges" > instead of "list=recentchanges".. Hm.. right, our generator model requires a separate module if the titles filter is optional (e.g. prop=categories requires titles=, so you can use a generator module as input, but prop=categories itself can also be a generator to another module). Recentchanges is already a generator module (it can be a provider of titles to another query module), but can't yet take titles= or a generator as input. And since we must keep titles= as an optional param, the filtered version of it has to be a (separate) prop module.
Change 92654 had a related patch set uploaded by Krinkle: [WIP] recentchanges: Implement support for pagesets https://gerrit.wikimedia.org/r/92654
@Brad: Using a prop module might make sense, but the problem there is usability. From the application's perspective, title is an arbitrary filter, just like 'user', 'namespace' or 'tag'. It would be a pain if one would have to radically change the query just to use the title filter. However if the indexes support it, we might as well do both. I don't see why we wouldn't implement a 'rctitles'. I'm rephrasing the bug back to how it was. In this case the main purpose (and why I filed the bug) is to be able to filter by title on the recentchanges query. Another problem is that (afaik) prop wouldn't work on pages that have been renamed or wear a different title now. e.g. it wouldn't be querying rc_title/rc_namespace but page_namespace/page_title/page_id, and rc_namespace/rc_title, which is not going to work as it will miss edits made before and between renames of a page. We'd have to map it to rc_cur_id inside the prop query. Anyway, builing a prop module for getting recent changes made to one or more pages is nice, but not what this bug is asking for (plus, it'd need an extra index).
Change 92654 abandoned by Krinkle: [WIP] recentchanges: Implement support for pagesets Reason: Meh, no longer cool. Going to use rcstream instead with a live page filter on the client-side.. Can't wait for the new API to happen with the app I'm working on. https://gerrit.wikimedia.org/r/92654