Last modified: 2014-06-21 19:58:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 10788 - Filter page histories by user, or contributions by title
Filter page histories by user, or contributions by title
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
General/Unknown (Other open bugs)
1.21.x
All All
: Normal enhancement with 2 votes (vote)
: ---
Assigned To: Nobody - You can work on this!
:
: 16745 24345 (view as bug list)
Depends on:
Blocks: 42217 49188
  Show dependency treegraph
 
Reported: 2007-08-03 15:43 UTC by Danny B.
Modified: 2014-06-21 19:58 UTC (History)
15 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Danny B. 2007-08-03 15:43:45 UTC
Special:Contributions should allow selecting of concrete page so one can see all of some user's edits on such page. Good for pages with non-trivial history.
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-08-29 02:24:31 UTC
I'm committing a schema change for this now.  It will require a new index on the revision table, so it probably won't be live for quite some time even once the code to do it is written.
Comment 2 P.Copp 2008-12-21 19:03:32 UTC
*** Bug 16745 has been marked as a duplicate of this bug. ***
Comment 3 Roan Kattouw 2008-12-21 19:37:19 UTC
(In reply to comment #1)
> I'm committing a schema change for this now.  It will require a new index on
> the revision table, so it probably won't be live for quite some time even once
> the code to do it is written.
> 

Removing schema-change keyword on the assumption that it's already happened: the comment is from 2007 and the API manages to do this just fine (prop=revisions&rvuser=Catrope&titles=Main_Page).
Comment 4 とある白い猫 2008-12-21 20:53:45 UTC
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main_Page&rvlimit=500&rvuser=MZMcBride

You are looking at the HTML representation of the XML format.
HTML is good for debugging, but probably is not suitable for your application.
See complete documentation, or API help for more information.
<?xml version="1.0"?>
<api>
  <query>
    <normalized>
      <n from="Main_Page" to="Main Page" />
    </normalized>
    <pages>
      <page pageid="15580374" ns="0" title="Main Page">
        <revisions>
          <rev revid="250138233" user="MZMcBride"
timestamp="2008-11-07T00:25:19Z" comment="cleaned up code, per talk" />
          <rev revid="210742813" user="MZMcBride"
timestamp="2008-05-07T05:22:48Z" comment="rv" />
          <rev revid="210742552" user="MZMcBride"
timestamp="2008-05-07T05:20:55Z" comment="restored margin-top with lower value"
/>
          <rev revid="210742376" user="MZMcBride"
timestamp="2008-05-07T05:19:46Z" comment="rm margin-top, see talk" />
          <rev revid="199549748" user="MZMcBride"
timestamp="2008-03-20T08:11:25Z" comment="rv" />
          <rev revid="197643346" user="MZMcBride"
timestamp="2008-03-12T03:52:20Z" comment="rm unneeded div" />
          <rev revid="191953558" user="MZMcBride"
timestamp="2008-02-16T23:45:51Z" comment="bypassed redirect" />
        </revisions>
      </page>
    </pages>
  </query>
</api>

--------------------------------------------------------------------------------

Above is an example use.

I'd like a different page that lets me display user contribution (individual
revisions) like page history. Basically I want
*the diff link to be clickable
*timestamp displayed in standard dating format (depending on users pref like
how history date & time is displayed)
*possibly adding a text box to history pages so that I can "search" a users
contribution.
**For example if I went to page history of "Main Page" and search for
"MZMcBride" I'd get the above feed
Comment 5 Chad H. 2010-07-12 09:48:28 UTC
*** Bug 24345 has been marked as a duplicate of this bug. ***
Comment 6 MZMcBride 2012-03-03 23:16:49 UTC
So all that's missing for this now is hacking up SpecialContributions.php? Is that right? If so, this can probably be marked "easy", right? Doing so.
Comment 7 MZMcBride 2012-03-03 23:32:17 UTC
(In reply to comment #1)
> I'm committing a schema change for this now.  It will require a new index on
> the revision table, so it probably won't be live for quite some time even once
> the code to do it is written.

This was committed in r25267.
Comment 8 MZMcBride 2012-03-03 23:44:39 UTC
(In reply to comment #3)
> (In reply to comment #1)
>> I'm committing a schema change for this now.  It will require a new index on
>> the revision table, so it probably won't be live for quite some time even once
>> the code to do it is written.
>> 
> 
> Removing schema-change keyword on the assumption that it's already happened:
> the comment is from 2007 and the API manages to do this just fine
> (prop=revisions&rvuser=Catrope&titles=Main_Page).

Re-adding "schema-change" keyword and adding "shell" keyword (someone has to apply the change, after all). Also targeting this for 1.20 and bumping importance a bit, per my reasoning at bug 2667 comment 4.

The assumption here is faulty, I think. Or at least this is easily answerable. I asked in #wikimedia-tech if the index "INDEX page_user_timestamp (rev_page,rev_user,rev_timestamp)" was on the revision table. DaB. said that the Toolserver didn't have this index, so it was likely that the masters don't either. Based on this, I'd like hard evidence to be able to say that a schema change isn't necessary here still.
Comment 9 MZMcBride 2012-03-05 23:53:49 UTC
(In reply to comment #7)
> (In reply to comment #1)
>> I'm committing a schema change for this now.  It will require a new index on
>> the revision table, so it probably won't be live for quite some time even once
>> the code to do it is written.
> 
> This was committed in r25267.

And reverted in r25290, apparently.

I guess I'll remove a few keywords now. From a coding point-of-view, this is (still) a pretty easy bug to resolve, it appears. So we'll keep the "easy" keyword.
Comment 10 Sam Reed (reedy) 2012-03-06 00:12:34 UTC
r113109, r113110, r113111 brings this back into life

It'll get added to WMF later in the 1.20 cycle, maybe before deployment

Are there other
Comment 11 MZMcBride 2012-03-06 02:15:48 UTC
(In reply to comment #10)
> r113109, r113110, r113111 brings this back into life
> 
> It'll get added to WMF later in the 1.20 cycle, maybe before deployment
> 
> Are there other

Comment 0 was about getting Special:Contributions to do this. I don't see any commits to that part of the code. (You just need a page title input on Special:Contributions/user. It should be trivial.) Should this bug be re-opened or should a separate ticket be filed?
Comment 12 Krinkle 2012-03-06 02:27:55 UTC
Reopening per comment 11.

Database index supporting this filter has been added. But interface (HistoryAction, SpecialContributions and possibly some API modules as well) still needs to be done.
Comment 13 Sam Reed (reedy) 2012-03-06 16:23:30 UTC
Indeed, per my IRC comment straight afterwards wondering why I actually closed it, and my cut off question asking what else needs doing
Comment 14 Sam Reed (reedy) 2012-05-08 21:05:31 UTC
TODO:
(In reply to comment #12)
> Reopening per comment 11.
> 
> Database index supporting this filter has been added. But interface
> (HistoryAction, SpecialContributions and possibly some API modules as well)
> still needs to be done.
Comment 15 Asher Feldman 2012-06-20 00:05:41 UTC
I don't think we currently need the new revision index in order to support user edits per page queries.

From testing queries like: 

select * from revision where rev_page = 1952670 and rev_user_text = "HBC AIV helperbot7" order by rev_timestamp desc limit 50;

where that page id = http://en.wikipedia.org/wiki/Wikipedia:Administrator_intervention_against_vandalism with > 800k edits and that user has > 122k edits and it ran in ~30ms on a prod enwiki db.  I got similar times against frequent editors of the India article, and when querying both with non-existent rev_user_text values.
Comment 16 Liangent 2012-11-17 09:56:16 UTC
*** Bug 42217 has been marked as a duplicate of this bug. ***
Comment 17 MZMcBride 2012-11-17 17:19:42 UTC
What's going on with this bug?
Comment 18 MZMcBride 2012-11-17 17:31:05 UTC
Okay, so it looks like tables.sql in master has the relevant index:

CREATE INDEX /*i*/page_user_timestamp ON /*_*/revision (rev_page,rev_user,rev_timestamp);

The question becomes whether this index has been applied to Wikimedia wikis or whether it needs to be (cf. comment 15), then?
Comment 19 Sam Reed (reedy) 2012-11-17 17:39:01 UTC
> The question becomes whether this index has been applied to Wikimedia wikis or
> whether it needs to be (cf. comment 15), then?

mysql:wikiadmin@db63 [enwiki]> show indexes from revision\G
*************************** 1. row ***************************
       Table: revision
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 1
 Column_name: rev_page
   Collation: A
 Cardinality: 100127178
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 2. row ***************************
       Table: revision
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 2
 Column_name: rev_id
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 3. row ***************************
       Table: revision
  Non_unique: 0
    Key_name: rev_id
Seq_in_index: 1
 Column_name: rev_id
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 4. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: rev_timestamp
Seq_in_index: 1
 Column_name: rev_timestamp
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 5. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: page_timestamp
Seq_in_index: 1
 Column_name: rev_page
   Collation: A
 Cardinality: 62579486
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 6. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: page_timestamp
Seq_in_index: 2
 Column_name: rev_timestamp
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 7. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: user_timestamp
Seq_in_index: 1
 Column_name: rev_user
   Collation: A
 Cardinality: 27813105
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 8. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: user_timestamp
Seq_in_index: 2
 Column_name: rev_timestamp
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 9. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: usertext_timestamp
Seq_in_index: 1
 Column_name: rev_user_text
   Collation: A
 Cardinality: 1053970
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 10. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: usertext_timestamp
Seq_in_index: 2
 Column_name: rev_timestamp
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 11. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: usertext_timestamp
Seq_in_index: 3
 Column_name: rev_user
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 12. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: usertext_timestamp
Seq_in_index: 4
 Column_name: rev_deleted
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 13. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: usertext_timestamp
Seq_in_index: 5
 Column_name: rev_minor_edit
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 14. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: usertext_timestamp
Seq_in_index: 6
 Column_name: rev_text_id
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 15. row ***************************
       Table: revision
  Non_unique: 1
    Key_name: usertext_timestamp
Seq_in_index: 7
 Column_name: rev_comment
   Collation: A
 Cardinality: 500635893
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment:
15 rows in set (0.27 sec)
Comment 20 MZMcBride 2012-11-17 17:50:56 UTC
tl;dr: the page_user_timestamp index is _not_ on the Wikimedia databases.

Asher: are you comfortable with user interface exposure of this functionality without this index?
Comment 21 Andre Klapper 2013-03-20 15:35:13 UTC
(In reply to comment #20)
> tl;dr: the page_user_timestamp index is _not_ on the Wikimedia databases.
> 
> Asher: are you comfortable with user interface exposure of this functionality
> without this index?

Asher: ping?
Comment 22 Asher Feldman 2013-04-18 22:12:53 UTC
Yes, I am so long as the interface doesn't provide pagination based on limit+offset queries. 

mysql:root@db1051 [enwiki]> select count(*) from revision where rev_page = 1952670;
+----------+
| count(*) |
+----------+
|   941113 |
+----------+
1 row in set (0.68 sec)

mysql:root@db1051 [enwiki]> select count(1) as count, rev_user from revision where rev_page = 1952670 group by rev_user order by count desc limit 1;
+--------+----------+
| count  | rev_user |
+--------+----------+
| 139166 |  6327251 |
+--------+----------+

mysql:root@db1051 [enwiki]> explain select * from revision where rev_page = 1952670 and rev_user = 6327251 order by rev_timestamp desc limit 25;
+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+
| id   | select_type | table    | type | possible_keys                         | key            | key_len | ref   | rows    | Extra       |
+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+
|    1 | SIMPLE      | revision | ref  | PRIMARY,page_timestamp,user_timestamp | page_timestamp | 4       | const | 2707632 | Using where |
+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+
1 row in set (0.03 sec)

2707632 looks bad, but:

mysql:root@db1051 [enwiki]> flush status;
mysql:root@db1051 [enwiki]> select * from revision where rev_page = 1952670 and rev_user = 6327251 order by rev_timestamp desc limit 25;
....

mysql:root@db1051 [enwiki]> show status like 'Handler_read%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Handler_read_first       | 0     |
| Handler_read_key         | 1     |
| Handler_read_last        | 0     |
| Handler_read_next        | 0     |
| Handler_read_prev        | 151   |
| Handler_read_rnd         | 0     |
| Handler_read_rnd_deleted | 0     |
| Handler_read_rnd_next    | 0     |
+--------------------------+-------+
8 rows in set (0.02 sec)

That example is for the most edited page and the user with the most edits.  The worst case for the current schema would be a user with only one edit of the most edited page. 

mysql:root@db1051 [enwiki]> explain select * from revision where rev_page = 1952670 and rev_user = 4305640 order by rev_timestamp desc limit 25;
+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+
| id   | select_type | table    | type | possible_keys                         | key            | key_len | ref   | rows | Extra       |
+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+
|    1 | SIMPLE      | revision | ref  | PRIMARY,page_timestamp,user_timestamp | user_timestamp | 4       | const |  479 | Using where |
+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+
1 row in set (0.03 sec)

mysql:root@db1051 [enwiki]> flush status;
Query OK, 0 rows affected (0.03 sec)

mysql:root@db1051 [enwiki]> select * from revision where rev_page = 1952670 and rev_user = 4305640 order by rev_timestamp desc limit 25;
+-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| rev_id    | rev_page | rev_text_id | rev_comment         | rev_user | rev_user_text | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        |
+-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
| 134864881 |  1952670 |   134137447 | /* User-reported */ |  4305640 | AM01NU06      | 20070531180220 |              0 |           0 |    1905 |     134864494 | l2ac4niowr2u88gl84ii3vz9cbrkc6t |
+-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
1 row in set (0.13 sec)

mysql:root@db1051 [enwiki]> show status like 'Handler_read%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Handler_read_first       | 0     |
| Handler_read_key         | 1     |
| Handler_read_last        | 0     |
| Handler_read_next        | 0     |
| Handler_read_prev        | 480   |
| Handler_read_rnd         | 0     |
| Handler_read_rnd_deleted | 0     |
| Handler_read_rnd_next    | 0     |
+--------------------------+-------+
8 rows in set (0.03 sec)
 
130ms is a little sucky but for a worst case, this should still be ok.

The nice thing is that hopefully we'll be able to use extended_keys in MariaDB 5.5.31.  Everything is currently migrating to 5.5.30 which has an optimizer bug that I reported regarding extended_keys so I have it disabled in production for now.  It allows the primary key to be fully utilized as the right side member of every secondary key.  The revision primary key is:

  PRIMARY KEY (`rev_page`,`rev_id`),

and we have in production:

  KEY `user_timestamp` (`rev_user`,`rev_timestamp`),

So with extended_keys, rev_page will be useable without having to increase the index side by duplicating as with the proposed page_user_timestamp.
Comment 23 Aaron Schulz 2013-04-18 22:16:23 UTC
Note that the revision PRIMARY KEY is just rev_id for new tables for MediaWiki. No one ever got around to changing existing ones already.
Comment 24 Rob Lanphier 2013-04-23 20:54:58 UTC
Unassigning from Reedy since he's not working on it.
Comment 25 Mark A. Hershberger 2014-06-21 19:58:13 UTC
Removing target milestone that was in the past.

If you want this in a specific release, have a good reason AND you are willing to find resources to fix this bug, feel free to change it to something appropriate.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links