Last modified: 2011-03-05 15:33:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 23720 - CodeReview doesn't show ancient revisions for a path
CodeReview doesn't show ancient revisions for a path
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
CodeReview (Other open bugs)
unspecified
All All
: Normal major (vote)
: ---
Assigned To: Nobody - You can work on this!
http://www.mediawiki.org/w/index.php?...
:
Depends on:
Blocks: 19007 24479 25437 26529
  Show dependency treegraph
 
Reported: 2010-05-30 20:33 UTC by Max Semenik
Modified: 2011-03-05 15:33 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Max Semenik 2010-05-30 20:33:14 UTC
See example link, there were more commits under that path. Reproduceable on other paths, too.
Comment 1 Max Semenik 2010-07-30 21:11:57 UTC
As it turned out, it's on purpose, see CodeRevision::getPathConds():

			// performance
			'cp_rev_id > ' . ( $this->mRepo->getLastStoredRev() - 20000 ),

Marking this as dependent on bug 24479 which should address DB performance.
Comment 2 Sam Reed (reedy) 2010-12-31 20:33:36 UTC
mysql> describe SELECT cp_rev_id,cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp  FROM `mw_code_paths` INNER JOIN `mw_code_rev` ON ((cr_repo_id = cp_repo_id AND cr_id = cp_rev_id)) LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cp_repo_id AND cc_rev_id = cp_rev_id))  WHERE cp_repo_id = '3' AND (cp_path  LIKE '/trunk/extensions/Cite/cite\_text%' ) AND (cp_rev_id > 58352)  GROUP BY cp_rev_id ORDER BY cp_rev_id;
+----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+
| id | select_type | table           | type  | possible_keys                           | key        | key_len | ref                                                      | rows  | Extra                                        |
+----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | mw_code_rev     | range | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY    | 8       | NULL                                                     | 21804 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | mw_code_paths   | ref   | PRIMARY                                 | PRIMARY    | 8       | const,wikidb.mw_code_rev.cr_id                           |     2 | Using where; Using index                     |
|  1 | SIMPLE      | mw_code_comment | ref   | cc_repo_id,cc_repo_time                 | cc_repo_id | 8       | wikidb.mw_code_paths.cp_repo_id,wikidb.mw_code_rev.cr_id |     1 | Using index                                  |
+----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+
3 rows in set (0.00 sec)


mysql> describe SELECT cp_rev_id,cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp  FROM `mw_code_paths` INNER JOIN `mw_code_rev` ON ((cr_repo_id = cp_repo_id AND cr_id = cp_rev_id)) LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cp_repo_id AND cc_rev_id = cp_rev_id))  WHERE cp_repo_id = '3' AND (cp_path  LIKE '/trunk/extensions/Cite/cite\_text%' ) GROUP BY cp_rev_id ORDER BY cp_rev_id;
+----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+
| id | select_type | table           | type | possible_keys                           | key        | key_len | ref                                                      | rows  | Extra                           |
+----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+
|  1 | SIMPLE      | mw_code_rev     | ref  | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY    | 4       | const                                                    | 49838 | Using temporary; Using filesort |
|  1 | SIMPLE      | mw_code_paths   | ref  | PRIMARY                                 | PRIMARY    | 8       | const,wikidb.mw_code_rev.cr_id                           |     2 | Using where; Using index        |
|  1 | SIMPLE      | mw_code_comment | ref  | cc_repo_id,cc_repo_time                 | cc_repo_id | 8       | wikidb.mw_code_paths.cp_repo_id,wikidb.mw_code_rev.cr_id |     1 | Using index                     |
+----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+
3 rows in set (0.00 sec)
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-12-31 21:03:07 UTC
As Roan suggests, this should be fixed by storing paths more redundantly, so the LIKE can be made into an equality check.  Otherwise there's no way to get it to play nice with the grouping/ordering.
Comment 4 Roan Kattouw 2010-12-31 21:07:31 UTC
(In reply to comment #3)
> As Roan suggests, this should be fixed by storing paths more redundantly, so
> the LIKE can be made into an equality check.  Otherwise there's no way to get
> it to play nice with the grouping/ordering.

Elaborating on my suggestion on IRC:

Currently we only store the path of each changed file in the paths table. If instead we also stored the paths of all of their ancestors, we could simply do WHERE cp_path='/trunk/phase3' instead of WHERE cp_path LIKE '/trunk/phase3%' , which will search directories recursively.

For example, for a commit that touches ApiQueryBase.php and MessagesEn.php, we'd store the following paths:

/trunk/phase3/includes/api/ApiQueryBase.php
/trunk/phase3/includes/api
/trunk/phase3/includes
/trunk/phase3/languages/messages/MessagesEn.php
/trunk/phase3/languages/messages
/trunk/phase3/languages
/trunk/phase3
/trunk

This would improve query performance at the expense of table size.
Comment 5 Sam Reed (reedy) 2010-12-31 21:55:03 UTC
Interesting. We'd also need to presumably back populate this for the old revs...

No database changes, just some more coding changes to split stuff down...
Comment 6 Roan Kattouw 2010-12-31 22:19:51 UTC
(In reply to comment #5)
> Interesting. We'd also need to presumably back populate this for the old
> revs...
> 
Yes. I could do that locally and measure the size increase that way.

> No database changes, just some more coding changes to split stuff down...
Exactly.
Comment 7 Sam Reed (reedy) 2011-01-01 00:49:41 UTC
r79379, r79381, r79382, r79384...

Before:
mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3 AND cp_rev_id > 1 AND cp_rev_id < 1000;
+----------+
| count(*) |
+----------+
|     2580 |
+----------+
1 row in set (0.02 sec)


After:

mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3 AND cp_rev_id > 1 AND cp_rev_id < 1000;
+----------+
| count(*) |
+----------+
|     6600 |
+----------+
1 row in set (0.00 sec)

After the first 1000 done

mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3;
+----------+
| count(*) |
+----------+
|   516313 |
+----------+
1 row in set (0.26 sec)


After the whole repo (78352 locally)

mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3;
+----------+
| count(*) |
+----------+
|   988825 |
+----------+
1 row in set (1.93 sec)


Not really fast, but hardly slow either. Took a few minutes to execute

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links