Last modified: 2014-06-21 19:58:13 UTC
Special:Contributions should allow selecting of concrete page so one can see all of some user's edits on such page. Good for pages with non-trivial history.
I'm committing a schema change for this now. It will require a new index on the revision table, so it probably won't be live for quite some time even once the code to do it is written.
*** Bug 16745 has been marked as a duplicate of this bug. ***
(In reply to comment #1) > I'm committing a schema change for this now. It will require a new index on > the revision table, so it probably won't be live for quite some time even once > the code to do it is written. > Removing schema-change keyword on the assumption that it's already happened: the comment is from 2007 and the API manages to do this just fine (prop=revisions&rvuser=Catrope&titles=Main_Page).
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main_Page&rvlimit=500&rvuser=MZMcBride You are looking at the HTML representation of the XML format. HTML is good for debugging, but probably is not suitable for your application. See complete documentation, or API help for more information. <?xml version="1.0"?> <api> <query> <normalized> <n from="Main_Page" to="Main Page" /> </normalized> <pages> <page pageid="15580374" ns="0" title="Main Page"> <revisions> <rev revid="250138233" user="MZMcBride" timestamp="2008-11-07T00:25:19Z" comment="cleaned up code, per talk" /> <rev revid="210742813" user="MZMcBride" timestamp="2008-05-07T05:22:48Z" comment="rv" /> <rev revid="210742552" user="MZMcBride" timestamp="2008-05-07T05:20:55Z" comment="restored margin-top with lower value" /> <rev revid="210742376" user="MZMcBride" timestamp="2008-05-07T05:19:46Z" comment="rm margin-top, see talk" /> <rev revid="199549748" user="MZMcBride" timestamp="2008-03-20T08:11:25Z" comment="rv" /> <rev revid="197643346" user="MZMcBride" timestamp="2008-03-12T03:52:20Z" comment="rm unneeded div" /> <rev revid="191953558" user="MZMcBride" timestamp="2008-02-16T23:45:51Z" comment="bypassed redirect" /> </revisions> </page> </pages> </query> </api> -------------------------------------------------------------------------------- Above is an example use. I'd like a different page that lets me display user contribution (individual revisions) like page history. Basically I want *the diff link to be clickable *timestamp displayed in standard dating format (depending on users pref like how history date & time is displayed) *possibly adding a text box to history pages so that I can "search" a users contribution. **For example if I went to page history of "Main Page" and search for "MZMcBride" I'd get the above feed
*** Bug 24345 has been marked as a duplicate of this bug. ***
So all that's missing for this now is hacking up SpecialContributions.php? Is that right? If so, this can probably be marked "easy", right? Doing so.
(In reply to comment #1) > I'm committing a schema change for this now. It will require a new index on > the revision table, so it probably won't be live for quite some time even once > the code to do it is written. This was committed in r25267.
(In reply to comment #3) > (In reply to comment #1) >> I'm committing a schema change for this now. It will require a new index on >> the revision table, so it probably won't be live for quite some time even once >> the code to do it is written. >> > > Removing schema-change keyword on the assumption that it's already happened: > the comment is from 2007 and the API manages to do this just fine > (prop=revisions&rvuser=Catrope&titles=Main_Page). Re-adding "schema-change" keyword and adding "shell" keyword (someone has to apply the change, after all). Also targeting this for 1.20 and bumping importance a bit, per my reasoning at bug 2667 comment 4. The assumption here is faulty, I think. Or at least this is easily answerable. I asked in #wikimedia-tech if the index "INDEX page_user_timestamp (rev_page,rev_user,rev_timestamp)" was on the revision table. DaB. said that the Toolserver didn't have this index, so it was likely that the masters don't either. Based on this, I'd like hard evidence to be able to say that a schema change isn't necessary here still.
(In reply to comment #7) > (In reply to comment #1) >> I'm committing a schema change for this now. It will require a new index on >> the revision table, so it probably won't be live for quite some time even once >> the code to do it is written. > > This was committed in r25267. And reverted in r25290, apparently. I guess I'll remove a few keywords now. From a coding point-of-view, this is (still) a pretty easy bug to resolve, it appears. So we'll keep the "easy" keyword.
r113109, r113110, r113111 brings this back into life It'll get added to WMF later in the 1.20 cycle, maybe before deployment Are there other
(In reply to comment #10) > r113109, r113110, r113111 brings this back into life > > It'll get added to WMF later in the 1.20 cycle, maybe before deployment > > Are there other Comment 0 was about getting Special:Contributions to do this. I don't see any commits to that part of the code. (You just need a page title input on Special:Contributions/user. It should be trivial.) Should this bug be re-opened or should a separate ticket be filed?
Reopening per comment 11. Database index supporting this filter has been added. But interface (HistoryAction, SpecialContributions and possibly some API modules as well) still needs to be done.
Indeed, per my IRC comment straight afterwards wondering why I actually closed it, and my cut off question asking what else needs doing
TODO: (In reply to comment #12) > Reopening per comment 11. > > Database index supporting this filter has been added. But interface > (HistoryAction, SpecialContributions and possibly some API modules as well) > still needs to be done.
I don't think we currently need the new revision index in order to support user edits per page queries. From testing queries like: select * from revision where rev_page = 1952670 and rev_user_text = "HBC AIV helperbot7" order by rev_timestamp desc limit 50; where that page id = http://en.wikipedia.org/wiki/Wikipedia:Administrator_intervention_against_vandalism with > 800k edits and that user has > 122k edits and it ran in ~30ms on a prod enwiki db. I got similar times against frequent editors of the India article, and when querying both with non-existent rev_user_text values.
*** Bug 42217 has been marked as a duplicate of this bug. ***
What's going on with this bug?
Okay, so it looks like tables.sql in master has the relevant index: CREATE INDEX /*i*/page_user_timestamp ON /*_*/revision (rev_page,rev_user,rev_timestamp); The question becomes whether this index has been applied to Wikimedia wikis or whether it needs to be (cf. comment 15), then?
> The question becomes whether this index has been applied to Wikimedia wikis or > whether it needs to be (cf. comment 15), then? mysql:wikiadmin@db63 [enwiki]> show indexes from revision\G *************************** 1. row *************************** Table: revision Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: rev_page Collation: A Cardinality: 100127178 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 2. row *************************** Table: revision Non_unique: 0 Key_name: PRIMARY Seq_in_index: 2 Column_name: rev_id Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 3. row *************************** Table: revision Non_unique: 0 Key_name: rev_id Seq_in_index: 1 Column_name: rev_id Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 4. row *************************** Table: revision Non_unique: 1 Key_name: rev_timestamp Seq_in_index: 1 Column_name: rev_timestamp Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 5. row *************************** Table: revision Non_unique: 1 Key_name: page_timestamp Seq_in_index: 1 Column_name: rev_page Collation: A Cardinality: 62579486 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 6. row *************************** Table: revision Non_unique: 1 Key_name: page_timestamp Seq_in_index: 2 Column_name: rev_timestamp Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 7. row *************************** Table: revision Non_unique: 1 Key_name: user_timestamp Seq_in_index: 1 Column_name: rev_user Collation: A Cardinality: 27813105 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 8. row *************************** Table: revision Non_unique: 1 Key_name: user_timestamp Seq_in_index: 2 Column_name: rev_timestamp Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 9. row *************************** Table: revision Non_unique: 1 Key_name: usertext_timestamp Seq_in_index: 1 Column_name: rev_user_text Collation: A Cardinality: 1053970 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 10. row *************************** Table: revision Non_unique: 1 Key_name: usertext_timestamp Seq_in_index: 2 Column_name: rev_timestamp Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 11. row *************************** Table: revision Non_unique: 1 Key_name: usertext_timestamp Seq_in_index: 3 Column_name: rev_user Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 12. row *************************** Table: revision Non_unique: 1 Key_name: usertext_timestamp Seq_in_index: 4 Column_name: rev_deleted Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 13. row *************************** Table: revision Non_unique: 1 Key_name: usertext_timestamp Seq_in_index: 5 Column_name: rev_minor_edit Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 14. row *************************** Table: revision Non_unique: 1 Key_name: usertext_timestamp Seq_in_index: 6 Column_name: rev_text_id Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: *************************** 15. row *************************** Table: revision Non_unique: 1 Key_name: usertext_timestamp Seq_in_index: 7 Column_name: rev_comment Collation: A Cardinality: 500635893 Sub_part: NULL Packed: NULL Null: YES Index_type: BTREE Comment: 15 rows in set (0.27 sec)
tl;dr: the page_user_timestamp index is _not_ on the Wikimedia databases. Asher: are you comfortable with user interface exposure of this functionality without this index?
(In reply to comment #20) > tl;dr: the page_user_timestamp index is _not_ on the Wikimedia databases. > > Asher: are you comfortable with user interface exposure of this functionality > without this index? Asher: ping?
Yes, I am so long as the interface doesn't provide pagination based on limit+offset queries. mysql:root@db1051 [enwiki]> select count(*) from revision where rev_page = 1952670; +----------+ | count(*) | +----------+ | 941113 | +----------+ 1 row in set (0.68 sec) mysql:root@db1051 [enwiki]> select count(1) as count, rev_user from revision where rev_page = 1952670 group by rev_user order by count desc limit 1; +--------+----------+ | count | rev_user | +--------+----------+ | 139166 | 6327251 | +--------+----------+ mysql:root@db1051 [enwiki]> explain select * from revision where rev_page = 1952670 and rev_user = 6327251 order by rev_timestamp desc limit 25; +------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+ | 1 | SIMPLE | revision | ref | PRIMARY,page_timestamp,user_timestamp | page_timestamp | 4 | const | 2707632 | Using where | +------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+ 1 row in set (0.03 sec) 2707632 looks bad, but: mysql:root@db1051 [enwiki]> flush status; mysql:root@db1051 [enwiki]> select * from revision where rev_page = 1952670 and rev_user = 6327251 order by rev_timestamp desc limit 25; .... mysql:root@db1051 [enwiki]> show status like 'Handler_read%'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Handler_read_first | 0 | | Handler_read_key | 1 | | Handler_read_last | 0 | | Handler_read_next | 0 | | Handler_read_prev | 151 | | Handler_read_rnd | 0 | | Handler_read_rnd_deleted | 0 | | Handler_read_rnd_next | 0 | +--------------------------+-------+ 8 rows in set (0.02 sec) That example is for the most edited page and the user with the most edits. The worst case for the current schema would be a user with only one edit of the most edited page. mysql:root@db1051 [enwiki]> explain select * from revision where rev_page = 1952670 and rev_user = 4305640 order by rev_timestamp desc limit 25; +------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+ | 1 | SIMPLE | revision | ref | PRIMARY,page_timestamp,user_timestamp | user_timestamp | 4 | const | 479 | Using where | +------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+ 1 row in set (0.03 sec) mysql:root@db1051 [enwiki]> flush status; Query OK, 0 rows affected (0.03 sec) mysql:root@db1051 [enwiki]> select * from revision where rev_page = 1952670 and rev_user = 4305640 order by rev_timestamp desc limit 25; +-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+ | rev_id | rev_page | rev_text_id | rev_comment | rev_user | rev_user_text | rev_timestamp | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1 | +-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+ | 134864881 | 1952670 | 134137447 | /* User-reported */ | 4305640 | AM01NU06 | 20070531180220 | 0 | 0 | 1905 | 134864494 | l2ac4niowr2u88gl84ii3vz9cbrkc6t | +-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+ 1 row in set (0.13 sec) mysql:root@db1051 [enwiki]> show status like 'Handler_read%'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Handler_read_first | 0 | | Handler_read_key | 1 | | Handler_read_last | 0 | | Handler_read_next | 0 | | Handler_read_prev | 480 | | Handler_read_rnd | 0 | | Handler_read_rnd_deleted | 0 | | Handler_read_rnd_next | 0 | +--------------------------+-------+ 8 rows in set (0.03 sec) 130ms is a little sucky but for a worst case, this should still be ok. The nice thing is that hopefully we'll be able to use extended_keys in MariaDB 5.5.31. Everything is currently migrating to 5.5.30 which has an optimizer bug that I reported regarding extended_keys so I have it disabled in production for now. It allows the primary key to be fully utilized as the right side member of every secondary key. The revision primary key is: PRIMARY KEY (`rev_page`,`rev_id`), and we have in production: KEY `user_timestamp` (`rev_user`,`rev_timestamp`), So with extended_keys, rev_page will be useable without having to increase the index side by duplicating as with the proposed page_user_timestamp.
Note that the revision PRIMARY KEY is just rev_id for new tables for MediaWiki. No one ever got around to changing existing ones already.
Unassigning from Reedy since he's not working on it.
Removing target milestone that was in the past. If you want this in a specific release, have a good reason AND you are willing to find resources to fix this bug, feel free to change it to something appropriate.