Last modified: 2011-05-22 20:24:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T26479, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 24479 - CodeReview has many inefficient queries
CodeReview has many inefficient queries
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
CodeReview (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Roan Kattouw
:
Depends on: 23720 25437
Blocks: code_quality 26394
  Show dependency treegraph
 
Reported: 2010-07-21 20:00 UTC by Sam Reed (reedy)
Modified: 2011-05-22 20:24 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Sam Reed (reedy) 2010-07-21 20:00:27 UTC
Unindexed queries

Other stuff such as

// This way of getting GET parameters is horrible, but effective.
$fields = array_merge( $_GET, $_POST );
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-07-21 20:02:59 UTC
We have a component for CodeReview already, so we don't need a tracker bug for everything wrong with it . . .
Comment 2 Sam Reed (reedy) 2010-07-21 20:08:39 UTC
Retitling bug.

Roan addressed second issue in CR r47395#c7849
Comment 3 Sam Reed (reedy) 2010-10-23 13:47:33 UTC
I've got the mysql high performance book if anyone wants it for that bug
Comment 4 Roan Kattouw 2010-11-29 22:05:56 UTC
The query for listing revisions by path does something like WHERE cp_repo_id=1 AND cp_path LIKE '/trunk%' ORDER BY cp_rev_id DESC . If my understanding about how the code_paths table works is correct (will confirm in the morning), we should be able to convert that to AND cp_path='/trunk' , introduce an index on (cp_repo_id, cp_path, cp_rev_id) and that should make this query happy.
Comment 5 Sam Reed (reedy) 2010-12-04 19:27:03 UTC
Special:Code/MediaWiki/tag/api

ie

SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `code_tags`,`code_rev` LEFT JOIN `code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '3' AND (cr_repo_id=ct_repo_id) AND (cr_id=ct_rev_id) AND ct_tag = 'api' GROUP BY cr_id ORDER BY cr_id DESC LIMIT 51;

Filesorts according to VVV
Comment 6 Sam Reed (reedy) 2010-12-04 22:05:00 UTC
<vvv> Reedy: the generic problem is: there is code_something with cs_repo_id, cs_rev_id and cs_value; the only index is primary key on all three columns in the order listed. There is code_rev which is joined with code_something by cr_repo_id and cr_id.
<vvv> The query on code_something filesorts
Comment 7 Sam Reed (reedy) 2010-12-31 05:53:59 UTC
It's the order by/group by not being on the first table...


mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags`,`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '3' AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
| id | select_type | table           | type   | possible_keys                     | key        | key_len | ref                                                    | rows | Extra                    |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | PRIMARY,ct_repo_id                | PRIMARY    | 4       | const                                                  |    1 | Using where; Using index |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 |                          |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time           | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index              |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
3 rows in set (0.00 sec)

mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags`,`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '3' AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' GROUP BY cr_id ORDER BY cr_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
| id | select_type | table           | type   | possible_keys                     | key        | key_len | ref                                                    | rows | Extra                                                     |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | PRIMARY,ct_repo_id                | PRIMARY    | 4       | const                                                  |    1 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 |                                                           |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time           | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index                                               |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
3 rows in set (0.00 sec)
Comment 8 Sam Reed (reedy) 2010-12-31 06:00:11 UTC
mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags`,`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '3' AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
| id | select_type | table           | type   | possible_keys                     | key        | key_len | ref                                                    | rows | Extra                    |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | PRIMARY,ct_repo_id                | PRIMARY    | 4       | const                                                  |    1 | Using where; Using index |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 |                          |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time           | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index              |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
3 rows in set (0.00 sec)

mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags`,`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '3' AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' GROUP BY cr_id ORDER BY cr_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
| id | select_type | table           | type   | possible_keys                     | key        | key_len | ref                                                    | rows | Extra                                                     |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | PRIMARY,ct_repo_id                | PRIMARY    | 4       | const                                                  |    1 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 |                                                           |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time           | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index                                               |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
3 rows in set (0.00 sec)



mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags`,`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '3' AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' ORDER BY cr_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-------------------------------------------+
| id | select_type | table           | type   | possible_keys                     | key        | key_len | ref                                                    | rows | Extra                                     |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-------------------------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | PRIMARY,ct_repo_id                | PRIMARY    | 4       | const                                                  |    1 | Using where; Using index; Using temporary |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 |                                           |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time           | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index                               |
+----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-------------------------------------------+

Order BY only uses temporary
Group By (With/without order by) uses temporary and file sorts
Comment 9 Sam Reed (reedy) 2010-12-31 06:44:21 UTC
I suppose, also, sorting/group by cr_id isn't much good when it's unindexed directly/as the first column

PK is cr_repo_id, cr_id....

Though, adding the index doesn't help.



The discussion that Roan and I had at my house, seemed to suggest that it might be a case of there's nothing we can do about some of these queries.

And as such, the hard coded limits in place, might just be removed ("fixing" the other bugs), and just have to deal with things maybe being a bit slow.

If this then annoys domas, maybe he can help improve the queries? ;)
Comment 10 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-12-31 18:21:21 UTC
(In reply to comment #7)
> mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS
> comments,cr_path,cr_message,cr_author,cr_timestamp FROM
> `mw_code_tags`,`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id =
> cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '3' AND (ct_repo_id =
> '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' GROUP BY cr_id ORDER BY cr_id
> DESC LIMIT 51;
> +----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
> | id | select_type | table           | type   | possible_keys                  
>   | key        | key_len | ref                                                 
>   | rows | Extra                                                     |
> +----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
> |  1 | SIMPLE      | mw_code_tags    | ref    | PRIMARY,ct_repo_id             
>   | PRIMARY    | 4       | const                                               
>   |    1 | Using where; Using index; Using temporary; Using filesort |
> |  1 | SIMPLE      | mw_code_rev     | eq_ref |
> PRIMARY,cr_repo_id,cr_repo_author | PRIMARY    | 8       |
> const,wikidb.mw_code_tags.ct_rev_id                    |    1 |                
>                                           |
> |  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time        
>   | cc_repo_id | 8       |
> wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index    
>                                           |
> +----+-------------+-----------------+--------+-----------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
> 3 rows in set (0.00 sec)

This is a classic pitfall for the MySQL optimizer.  What version are you using?  On the toolserver, 5.1.53, it avoids the filesort.

The problem is that you have a few separate conditions that you need indexes for, aside for join conditions and assuming MySQL is smart enough to do equality propagation (not always a good bet): ct_tag = 'api', ct_repo_id = '3', and the order by/group by.  No one index will work for all of them, so MySQL has to choose which index it wants to use.  Since there's a LIMIT, the filesort option is probably the worst, but it chooses that option anyway.  Often in this case a FORCE INDEX will fix the problem.  On the toolserver, it's smart enough to select from code_rev first and not do a filesort, and then do extra scanning on code_tags.

In this case, your best bet is probably to add an index on (ct_repo_id, ct_tag, ct_rev_id), if this query is a problem in practice after you stop it from filesorting.  If you're not ever ordering or grouping by ct_tag, you can just change the primary key (but I'd guess you sometimes do group by it).

I'd also clean up the query a bit to avoid relying on equality propagation, changing "cr_repo_id = '3'" to "cr_repo_id = ct_repo_id" and changing "GROUP BY cr_id ORDER BY cr_id DESC" to "GROUP BY ct_rev_id ORDER BY ct_rev_id DESC".  But the latter part is probably not necessary, just makes it clearer how you expect the query to execute.
Comment 11 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-12-31 18:22:29 UTC
(Of course, the query isn't scalable anyway due to COUNT().  It will have to scan arbitrarily many rows.  The only way to fully fix it would be to use a summary table of some kind, but those are a pain to maintain, so better not to do that unless you can't fix it some other way.  It's not like we're going to have 100 million code reviews anytime soon.)
Comment 12 Sam Reed (reedy) 2010-12-31 19:21:15 UTC
(In reply to comment #10)
> This is a classic pitfall for the MySQL optimizer.  What version are you using?
>  On the toolserver, 5.1.53, it avoids the filesort.

reedy@ubuntu64-esxi:~$ mysql --version
mysql  Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (x86_64) using readline 6.1


What comes with Ubuntu 10.10 x64 Server
Comment 13 Sam Reed (reedy) 2010-12-31 19:45:12 UTC
-- Freetext tagging for revisions
CREATE TABLE /*_*/code_tags (
  ct_repo_id int not null,
  ct_rev_id int not null,
  ct_tag varbinary(255) not null,

  primary key (ct_repo_id,ct_rev_id,ct_tag)
) /*$wgDBTableOptions*/;

CREATE INDEX /*i*/ct_repo_id ON /*_*/code_tags (ct_repo_id,ct_tag,ct_rev_id);

That suggested index is seemingly already there...


It seems like on general you're agreeing with the just leave them be... Versions that improve/fix the optimiser will improve queries, and like you said, we're not going to have millions upon millions of revisions...

For the equality propagation, the originals had foo = bar AND bar = 3 (or something similar), i was just poking around and changed for explicit values.

Seemingly, the best thing is maybe getting it to do FORCE INDEX.. But it isn't garunteed to help

Maybe just remove the technical restrictions people have put in, which are causing some annoyances, and close the bugs...?
Comment 14 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-12-31 19:49:18 UTC
Oops, yeah, I missed the index.  Try adding FORCE INDEX (ct_repo_id) and see what that gives you.
Comment 15 Sam Reed (reedy) 2010-12-31 20:01:59 UTC
mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags`,`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = ct_repo_id AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' GROUP BY cr_id ORDER BY cr_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
| id | select_type | table           | type   | possible_keys                           | key        | key_len | ref                                                    | rows | Extra                                                     |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | PRIMARY,ct_repo_id                      | PRIMARY    | 4       | const                                                  |    1 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 | Using where                                               |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time                 | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index                                               |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags` FORCE INDEX (ct_repo_id),`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = ct_repo_id AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' GROUP BY cr_id ORDER BY cr_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
| id | select_type | table           | type   | possible_keys                           | key        | key_len | ref                                                    | rows | Extra                                                     |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | ct_repo_id                              | ct_repo_id | 261     | const,const                                            |    1 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 | Using where                                               |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time                 | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index                                               |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+-----------------------------------------------------------+
3 rows in set (0.00 sec)


No improvement unfortunately :(
Comment 16 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-12-31 20:04:49 UTC
Then it's probably lack of equality propagation.  Change cr_id to ct_rev_id in the ORDER BY and GROUP BY clauses.
Comment 17 Sam Reed (reedy) 2010-12-31 20:07:08 UTC
mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags` FORCE INDEX (ct_repo_id),`mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = ct_repo_id AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' GROUP BY ct_rev_id ORDER BY ct_rev_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
| id | select_type | table           | type   | possible_keys                           | key        | key_len | ref                                                    | rows | Extra                    |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | ct_repo_id                              | ct_repo_id | 261     | const,const                                            |    1 | Using where; Using index |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 | Using where              |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time                 | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index              |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
3 rows in set (0.00 sec)

mysql> describe SELECT cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_tags`, `mw_code_rev` LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = ct_repo_id AND (ct_repo_id = '3') AND (cr_id=ct_rev_id) AND ct_tag = 'api' GROUP BY ct_rev_id ORDER BY ct_rev_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
| id | select_type | table           | type   | possible_keys                           | key        | key_len | ref                                                    | rows | Extra                    |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
|  1 | SIMPLE      | mw_code_tags    | ref    | PRIMARY,ct_repo_id                      | PRIMARY    | 4       | const                                                  |    1 | Using where; Using index |
|  1 | SIMPLE      | mw_code_rev     | eq_ref | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY    | 8       | const,wikidb.mw_code_tags.ct_rev_id                    |    1 | Using where              |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time                 | cc_repo_id | 8       | wikidb.mw_code_rev.cr_repo_id,wikidb.mw_code_rev.cr_id |    1 | Using index              |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+--------------------------------------------------------+------+--------------------------+
3 rows in set (0.00 sec)


Removing the force index, and just changing the order by/group by improves it...

Coool
Comment 18 Roan Kattouw 2010-12-31 20:12:02 UTC
(In reply to comment #11)
> (Of course, the query isn't scalable anyway due to COUNT().  It will have to
> scan arbitrarily many rows.  The only way to fully fix it would be to use a
> summary table of some kind, but those are a pain to maintain, so better not to
> do that unless you can't fix it some other way.  It's not like we're going to
> have 100 million code reviews anytime soon.)

Yes, the COUNT() should be killed. Its results aren't displayed in trunk anymore anyway. The point about the code_rev table being smallish is also true, that's why these queries aren't much of a problem in practice.
Comment 19 Roan Kattouw 2010-12-31 20:17:29 UTC
(In reply to comment #18)
> Yes, the COUNT() should be killed. Its results aren't displayed in trunk
> anymore anyway.
Ignore me, this is a different COUNT
Comment 20 Sam Reed (reedy) 2011-01-01 01:22:55 UTC
So, 2 explicitally reported bugs now fixed.

Improved query for Tags in r79365 based on the above help...

Wonder what else needs fixing up...
Comment 21 Sam Reed (reedy) 2011-01-01 02:20:54 UTC
Filtered by path and status
mysql> describe SELECT /* IndexPager::reallyDoQuery (SvnRevStatusTablePager) Reedy */  cp_rev_id,cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp  FROM `mw_code_paths` INNER JOIN `mw_code_rev` ON ((cr_repo_id = cp_repo_id AND cr_id = cp_rev_id)) LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cp_repo_id AND cc_rev_id = cp_rev_id))  WHERE cp_repo_id = '3' AND cp_path = '/trunk' AND cr_status = 'new'  GROUP BY cp_rev_id ORDER BY cp_rev_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys                           | key        | key_len | ref                                                      | rows  | Extra                                        |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | mw_code_rev     | ref    | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY    | 4       | const                                                    | 45393 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | mw_code_paths   | eq_ref | PRIMARY                                 | PRIMARY    | 265     | const,wikidb.mw_code_rev.cr_id,const                     |     1 | Using where; Using index                     |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time                 | cc_repo_id | 8       | wikidb.mw_code_paths.cp_repo_id,wikidb.mw_code_rev.cr_id |     1 | Using index                                  |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+
3 rows in set (0.00 sec)

Filtered by path

mysql> describe SELECT /* IndexPager::reallyDoQuery (SvnRevTablePager) Reedy */  cp_rev_id,cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp  FROM `mw_code_paths` INNER JOIN `mw_code_rev` ON ((cr_repo_id = cp_repo_id AND cr_id = cp_rev_id)) LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cp_repo_id AND cc_rev_id = cp_rev_id))  WHERE cp_repo_id = '3' AND cp_path = '/trunk'  GROUP BY cp_rev_id ORDER BY cp_rev_id DESC LIMIT 51;
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+
| id | select_type | table           | type   | possible_keys                           | key        | key_len | ref                                                      | rows  | Extra                           |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+
|  1 | SIMPLE      | mw_code_rev     | ref    | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY    | 4       | const                                                    | 45393 | Using temporary; Using filesort |
|  1 | SIMPLE      | mw_code_paths   | eq_ref | PRIMARY                                 | PRIMARY    | 265     | const,wikidb.mw_code_rev.cr_id,const                     |     1 | Using where; Using index        |
|  1 | SIMPLE      | mw_code_comment | ref    | cc_repo_id,cc_repo_time                 | cc_repo_id | 8       | wikidb.mw_code_paths.cp_repo_id,wikidb.mw_code_rev.cr_id |     1 | Using index                     |
+----+-------------+-----------------+--------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+
3 rows in set (0.00 sec)
Comment 22 Sam Reed (reedy) 2011-01-01 02:23:12 UTC
Both of the above can be made "nice" by sorting and grouping by cr_id rather than cp_rev_id
Comment 23 Sam Reed (reedy) 2011-01-01 02:30:46 UTC
Above ones fixed in r79409

Filesort:


mysql> describe SELECT /* IndexPager::reallyDoQuery (CodeStatusChangeTablePager) Reedy */  cpc_timestamp,cpc_user_text,cpc_rev_id,cr_author,cr_message,cpc_removed,cpc_added,cr_status  FROM `mw_code_prop_changes` LEFT JOIN `mw_code_rev` ON ((cpc_repo_id = cr_repo_id AND cpc_rev_id = cr_id))  WHERE cpc_repo_id = '3' AND cpc_attrib = 'status'  ORDER BY cpc_timestamp DESC LIMIT 51;
+----+-------------+----------------------+--------+-----------------------------------------+-------------------+---------+--------------------------------------------------------------------------------+------+-----------------------------+
| id | select_type | table                | type   | possible_keys                           | key               | key_len | ref                                                                            | rows | Extra                       |
+----+-------------+----------------------+--------+-----------------------------------------+-------------------+---------+--------------------------------------------------------------------------------+------+-----------------------------+
|  1 | SIMPLE      | mw_code_prop_changes | ref    | cpc_repo_rev_time,cpc_repo_time         | cpc_repo_rev_time | 4       | const                                                                          |   10 | Using where; Using filesort |
|  1 | SIMPLE      | mw_code_rev          | eq_ref | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY           | 8       | wikidb.mw_code_prop_changes.cpc_repo_id,wikidb.mw_code_prop_changes.cpc_rev_id |    1 |                             |
+----+-------------+----------------------+--------+-----------------------------------------+-------------------+---------+--------------------------------------------------------------------------------+------+-----------------------------+
2 rows in set (0.00 sec)
Comment 24 Sam Reed (reedy) 2011-01-01 02:38:52 UTC
I'm guessing this one might need another index...

It's doing

cpc_repo_id, cpc_rev_id in the LEFT JOIN

And then

cpc_repo_id cpc_attrib cpc_timestamp



Also, this table doesn't have a Primary Key..

--
-- Changes to review metadata for a single code revision.
--
CREATE TABLE /*_*/code_prop_changes (
  -- Repository ID
  cpc_repo_id int not null,
  -- Native ID number of this revision within the repository.
  cpc_rev_id int not null,

  -- The item that was changed
  cpc_attrib enum('status','tags') not null,
  -- How it was changed
  cpc_removed blob,
  cpc_added blob,

  -- Timestamp of the change, in MediaWiki format.
  cpc_timestamp binary(14) not null default '',

  -- User id/name of the commenter
  cpc_user int not null,
  cpc_user_text varchar(255) not null
) /*$wgDBTableOptions*/;

CREATE INDEX /*i*/cpc_repo_rev_time ON /*_*/code_prop_changes (cpc_repo_id, cpc_rev_id, cpc_timestamp);
CREATE INDEX /*i*/cpc_repo_time ON /*_*/code_prop_changes (cpc_repo_id, cpc_timestamp);


code_signoffs also doesn't have a PK... But does have a unique index...
Comment 25 Roan Kattouw 2011-01-01 23:36:12 UTC
(In reply to comment #24)
> Also, this table doesn't have a Primary Key..
> 
I don't see a unique index on this table that makes sense, TBH. The blob fields can't be indexed properly (you can only index the first N chars using cpc_added(123)) and the same change can occur multiple times.

> code_signoffs also doesn't have a PK... But does have a unique index...
Good enough, AFAIK.
Comment 26 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-01-02 00:27:30 UTC
InnoDB tables should always have an explicit primary key.  Otherwise (if there are no unique indexes) it will make a secret invisible primary key anyway, which you can't observe or control.  Best to add an autoincrement primary key if there are no possible unique keys as-is.
Comment 27 Sam Reed (reedy) 2011-01-23 23:32:23 UTC
Need to poke at this and see the state of it...

I think this is probably somewhere near done...
Comment 28 Sam Reed (reedy) 2011-05-22 20:24:06 UTC
Marking as a fixed.. I think they're cleaned up now..

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links