
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are now handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and links may be broken except for those that display bug reports and their history. See T55577, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 53577 - IndexPager::buildQueryInfo (LogPager) query needs tuning

Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Database
Version: 1.23.0
Hardware: All
OS: All
Priority: Normal
Severity: normal
Assigned To: Sean Pringle
Keywords: performance
Reported: 2013-08-30 09:57 UTC by Sean Pringle
Modified: 2014-03-30 04:17 UTC
CC: 4 users

Description Sean Pringle 2013-08-30 09:57:47 UTC
Have seen multiple instances of this one backing up on commonswiki slaves recently:

SELECT /* IndexPager::buildQueryInfo (LogPager) 148.160.132.46 */  log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_comment,log_params,log_deleted,user_id,user_name,user_editcount,ts_tags  FROM `logging` FORCE INDEX (times) LEFT JOIN `user` ON ((log_user=user_id)) LEFT JOIN `tag_summary` ON ((ts_log_id=log_id))  WHERE (log_action != 'revision') AND (log_type != 'suppress') AND log_type IN ('delete','move')  AND log_namespace = '0' AND log_title = '0' AND ((log_deleted & 1) = 0)  ORDER BY log_timestamp DESC LIMIT 11

It runs in excess of 10 minutes.

EXPLAIN says the FORCE INDEX sends MySQL onto a very slow index scan over `times` (log_timestamp), which is hardly better than a table scan in this case.

Removing the FORCE allows the `page_time` index to be used, which reduces execution time to a few seconds.

Suggest either filtering by a range on log_timestamp or removing the FORCE.
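For illustration, a minimal sketch of the range-filter suggestion: keep the timestamp ordering but bound the scan so even the `times` index only touches recent rows. The cutoff value and the trimmed column list are assumptions, not from the logged query:

SELECT /* sketch: bounded variant of the query above */
       log_id, log_type, log_action, log_timestamp, log_user
FROM logging
WHERE log_action != 'revision'
  AND log_type IN ('delete','move')
  AND log_namespace = '0'
  AND log_title = '0'
  AND (log_deleted & 1) = 0
  AND log_timestamp > '20130201000000'  -- assumed bound, e.g. last ~6 months
ORDER BY log_timestamp DESC
LIMIT 11;

Alternatively, simply dropping the FORCE INDEX (times) hint lets the optimizer pick page_time on its own, as noted above.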
Comment 1 Sean Pringle 2013-09-19 19:40:43 UTC
Similar:

SELECT /* IndexPager::buildQueryInfo (contributions page unfiltered) AstroChemist */  rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,user_name,page_namespace,page_title,page_is_new,page_latest,page_is_redirect,page_len,ts_tags  FROM `revision` INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `user_groups` ON ((ug_user = rev_user) AND ug_group = 'bot') LEFT JOIN `tag_summary` ON ((ts_rev_id=rev_id)) INNER JOIN `change_tag` FORCE INDEX (change_tag_tag_id) ON ((ct_rev_id=rev_id))  WHERE (rev_user >19555425) AND (ug_group IS NULL) AND (page_namespace != '0') AND ((rev_deleted & 4) = 0) AND ct_tag = 'visualeditor' AND (rev_timestamp<'20140101000000')  ORDER BY rev_timestamp DESC LIMIT 601;

With force: 10+ mins

Without force: 8s
Comment 2 Nemo 2013-10-11 19:36:58 UTC
Was this removed together with other FORCE INDEX recently?
Comment 3 Sean Pringle 2013-10-17 01:08:37 UTC
Comment 2 is correct.
Comment 4 Sean Pringle 2013-11-14 11:21:39 UTC
Larger WMF wikis are seeing more spikes of LogPager queries; an example follows. These have had some recent tuning, initially by removing forced indexes that no longer suited the data set sizes, which has helped (10min down to ~100s), but we need to do more.

mysql wmf db73 dewiki> explain SELECT /* IndexPager::buildQueryInfo (LogPager) xxx.xxx.xxx.xxx */ log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_comment,log_params,log_deleted,user_id,user_name,user_editcount,ts_tags FROM `logging` LEFT JOIN `user` ON ((log_user=user_id)) LEFT JOIN `tag_summary` ON ((ts_log_id=log_id)) WHERE (log_type NOT IN ('suppress','spamblacklist')) AND log_user = ? AND ((log_deleted & 4) = 0) AND (log_type != 'review') ORDER BY log_timestamp DESC LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: logging
         type: index
possible_keys: type_time,user_time,log_user_type_time
          key: times
      key_len: 16
          ref: NULL
         rows: 6705
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: user
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: dewiki.logging.log_user
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: tag_summary
         type: ref
possible_keys: ts_log_id,tag_summary_log_id
          key: ts_log_id
      key_len: 5
          ref: dewiki.logging.log_id
         rows: 20172                   <-- often much larger
        Extra: Using where
3 rows in set (0.29 sec)

The remaining problems are:

- The queries are fast enough until one hits cold data, leading a slave to bottleneck on disk. The buffer pool churn has a flow-on effect on other normally fast queries, which become false positives in the slow log. This is especially problematic when a crawler like MSNBOT triggers a spike of these queries.

- The tag_summary table often has a relatively low cardinality for ts_log_id (many nulls, few ids).

- Sometimes a slave logs multiple identical copies of a query (always a form filtered by log_user) at intervals of several seconds. Presumably someone refreshing a page that is too slow to respond.

Possibilities:

- Pull out only log_id,log_user first to reduce the impact of the filesort, and grab user and tag data in a follow-up batch query (see the sketch after this list). If the query plan switches to using Index Condition Pushdown ("Using index condition" in EXPLAIN), cold data is less of an issue as fewer table rows are hit and fewer pages read.

- Limit the size of the dataset to be considered, perhaps by a timestamp range (the last X months). For heavier users only?

- Improve the method of retrieving ts_tags using something other than tag_summary. Or remove it entirely.

- Isolate these queries to specific slaves (like watchlist).
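A minimal sketch of the first possibility, splitting the work into an id query plus batched follow-ups (the user id and the IN-list plumbing are placeholders):

-- Step 1: fetch only narrow columns, so the filesort and any cold
-- reads touch as little data as possible:
SELECT log_id, log_user
FROM logging
WHERE log_type NOT IN ('suppress','spamblacklist')
  AND log_user = 12345                -- placeholder user id
  AND (log_deleted & 4) = 0
ORDER BY log_timestamp DESC
LIMIT 51;

-- Step 2: batch-fetch user and tag data for just those rows:
SELECT user_id, user_name, user_editcount
FROM user
WHERE user_id IN (/* log_user values from step 1 */);

SELECT ts_log_id, ts_tags
FROM tag_summary
WHERE ts_log_id IN (/* log_id values from step 1 */);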
Comment 5 Sean Pringle 2013-11-15 00:08:53 UTC
Possibilities after talking with Aaron on IRC, assuming group_concat can be sorted in MW code for PostgreSQL...

Using change_tag instead of tag_summary avoids the poor cardinality of ts_log_id and allows better query plans on `logging` indexes:

v1

enwiki> explain SELECT log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_comment,log_params,log_deleted,user_id,user_name,user_editcount,group_concat(distinct ct_tag separator ' ') as ts_tags FROM `logging`  LEFT JOIN `user` ON ((log_user=user_id)) LEFT JOIN change_tag ON (ct_log_id=log_id) WHERE (log_type NOT IN ('suppress','spamblacklist','review')) AND log_user = ? AND ((log_deleted & 4) = 0) GROUP BY log_id ORDER BY log_timestamp DESC LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: logging
         type: index
possible_keys: type_time,user_time,log_user_type_time
          key: times
      key_len: 16
          ref: NULL
         rows: 2583
        Extra: Using where; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: user
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: change_tag
         type: ref
possible_keys: ct_log_id,change_tag_log_tag
          key: ct_log_id
      key_len: 5
          ref: enwiki.logging.log_id
         rows: 35
        Extra: Using where; Using index
3 rows in set (0.31 sec)

pt-visual-explain

JOIN
+- Filter with WHERE
|  +- Index lookup
|     key            change_tag->ct_log_id
|     possible_keys  ct_log_id,change_tag_log_tag
|     key_len        5
|     ref            enwiki.logging.log_id
|     rows           35
+- JOIN
   +- Bookmark lookup
   |  +- Table
   |  |  table          user
   |  |  possible_keys  PRIMARY
   |  +- Constant index lookup
   |     key            user->PRIMARY
   |     possible_keys  PRIMARY
   |     key_len        4
   |     ref            const
   |     rows           1
   +- Table scan
      +- TEMPORARY
         table          temporary(logging)
         +- Filter with WHERE
            +- Bookmark lookup
               +- Table
               |  table          logging
               |  possible_keys  type_time,user_time,log_user_type_time
               +- Index scan
                  key            logging->times
                  possible_keys  type_time,user_time,log_user_type_time
                  key_len        16
                  rows           2583

v2

enwiki> explain SELECT log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_comment,log_params,log_deleted,user_id,user_name,user_editcount,(select group_concat(distinct ct_tag separator ' ') from change_tag where ct_log_id = log_id) as ts_tags FROM `logging`  LEFT JOIN `user` ON ((log_user=user_id)) WHERE (log_type NOT IN ('suppress','spamblacklist','review')) AND log_user = ? AND ((log_deleted & 4) = 0) ORDER BY log_timestamp DESC LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: logging
         type: range
possible_keys: type_time,user_time,log_user_type_time
          key: log_user_type_time
      key_len: 38
          ref: NULL
         rows: 18675
        Extra: Using index condition; Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: user
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: change_tag
         type: ref
possible_keys: ct_log_id,change_tag_log_tag
          key: ct_log_id
      key_len: 5
          ref: enwiki.logging.log_id
         rows: 35
        Extra: Using where; Using index
3 rows in set (0.32 sec)

pt-visual-explain

DEPENDENT SUBQUERY
+- Filter with WHERE
|  +- Index lookup
|     key            change_tag->ct_log_id
|     possible_keys  ct_log_id,change_tag_log_tag
|     key_len        5
|     ref            enwiki.logging.log_id
|     rows           35
+- JOIN
   +- Bookmark lookup
   |  +- Table
   |  |  table          user
   |  |  possible_keys  PRIMARY
   |  +- Constant index lookup
   |     key            user->PRIMARY
   |     possible_keys  PRIMARY
   |     key_len        4
   |     ref            const
   |     rows           1
   +- Filesort
      +- Filter with WHERE
         +- Index range scan
            key            logging->log_user_type_time
            possible_keys  type_time,user_time,log_user_type_time
            key_len        38
            rows           18675


v2 appears slightly better: the correlated subquery lets the optimizer use a range scan on log_user_type_time with index condition pushdown plus a filesort, instead of v1's full index scan over `times` and a temporary table for the GROUP BY.
Comment 6 Gerrit Notification Bot 2013-11-15 07:26:09 UTC
Change 95584 had a related patch set uploaded by Aaron Schulz:
Changed use of tag_summary to use change_tag with GROUP_CONCAT()

https://gerrit.wikimedia.org/r/95584
Comment 7 Aaron Schulz 2013-11-16 22:21:31 UTC
Though actually doesn't change_tag also have similar poor cardinality problems for the log_id index?
Comment 8 Sean Pringle 2013-11-17 06:35:34 UTC
True, ct_log_id cardinality still isn't brilliant, but it's better than ts_log_id. It is also an index whose cardinality estimates fluctuate quite a bit between slaves, requiring periodic ANALYZE. E.g., today the enwiki slaves manage to report ct_log_id cardinality values between 21 and 290251 :-|

Also, the ct_log_id index covers (ct_log_id,ct_tag), which is useful.
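For reference, an index of that shape looks like the following (the exact definition is an assumption based on the columns named above); the covering property is why the change_tag rows in the plans above show "Using index":

-- Lookups by ct_log_id can return ct_tag straight from the index,
-- without bookmark lookups into the table rows:
CREATE INDEX ct_log_id ON change_tag (ct_log_id, ct_tag);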

I've been trying different values of innodb_stats_sample_pages (=16 and =32) on several slaves to see whether cardinality for a number of flaky indexes can be stabilized without the need for regular ANALYZE runs (because of http://bugs.mysql.com/bug.php?id=33278).
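A sketch of that trial using stock MySQL statements (the table chosen for the ANALYZE is illustrative):

-- Sample more pages when InnoDB estimates index statistics
-- (dynamic global variable; default is 8 in MySQL 5.5):
SET GLOBAL innodb_stats_sample_pages = 32;

-- Force a re-estimate, then inspect the result:
ANALYZE TABLE change_tag;
SHOW INDEX FROM change_tag;  -- compare the Cardinality column for ct_log_id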

After our chat on IRC on Friday I applied the change_tag/subquery version to other examples of slow LogPager queries logged on various wikis. It is generally better overall, but can still be susceptible to hitting too much cold data.

I'd still really like to see consideration of:

- Making the log_user=N clause enforced. The examples of the query that don't filter by user id hit the most rows.

- If log_user=N isn't to be used, then limit exposure to the last N months of data, or change the way the log is viewed entirely.

- Your suggestion to break up tag_summary into three tables, which seems like a good plan given we already denormalize for tag_summary (sketched below).
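A minimal sketch of what that three-way split could look like, assuming tag_summary is keyed today by one of ts_rc_id, ts_rev_id, or ts_log_id per row (all names below are hypothetical):

-- One table per id type, so each key column is fully populated
-- and selective instead of mostly NULL:
CREATE TABLE tag_summary_log (
  tsl_log_id INT UNSIGNED NOT NULL PRIMARY KEY,
  tsl_tags   BLOB NOT NULL  -- denormalized tag list, as ts_tags is today
);
-- ...with analogous tables keyed by rev_id and rc_id.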
Comment 9 Gerrit Notification Bot 2013-11-17 06:51:01 UTC
Change 95584 merged by Springle:
Changed use of tag_summary to use change_tag with GROUP_CONCAT()

https://gerrit.wikimedia.org/r/95584
Comment 10 Aaron Schulz 2013-11-17 07:26:27 UTC
Which slow queries are *not* filtering on user ID? I thought that is where the slow queries still mostly happen.
Comment 11 Sean Pringle 2013-11-17 07:47:33 UTC
You're right, most of the slow queries we've seen in spikes do filter by log_user. But there are others that don't that also run slow.

Ishmael, S3, db1003:

/* IndexPager::buildQueryInfo */ select log_id, log_type, log_action, log_timestamp, log_user, log_user_text, log_namespace, log_title, log_comment, log_params, log_deleted, user_id, user_name, user_editcount, ts_tags from `logging` left join `user` on ((log_user=user_id)) left join `tag_summary` on ((ts_log_id=log_id)) where (log_type not in(?+)) and (log_type != ?) order by log_timestamp desc limit ?

I went to S3 because that's where I saw the query form previously (zhwiktionary at the time). A quick check on S2 slaves doesn't show the same query running slow there, though there are other versions:

/* IndexPager::buildQueryInfo */ select log_id, log_type, log_action, log_timestamp, log_user, log_user_text, log_namespace, log_title, log_comment, log_params, log_deleted, user_id, user_name, user_editcount, ts_tags from `logging` left join `user` on ((log_user=user_id)) left join `tag_summary` on ((ts_log_id=log_id)) where (log_type != ?) and log_type = ? and log_namespace = ? and log_title = ? and ((log_deleted & ?) != ?) order by log_timestamp desc limit ?

It's possible we'll trawl some of these up with the existing patch. Just something to keep in mind.
Comment 12 Andre Klapper 2014-01-09 11:55:32 UTC
Resetting status as per comment 9. Sean: Is this still high prio?
Comment 13 Sean Pringle 2014-01-09 21:57:04 UTC
Much better than it was.

I'm now trialing partitioned logging tables on one slave per production shard and directing these queries to those boxes. Since most of the queries use log_user = N, partitioning on key(log_user) allows the optimizer to use partition pruning.

Still have to track down and sort out the ones without log_user = N. They're infrequent and low priority.
Comment 14 Aaron Schulz 2014-01-09 22:08:03 UTC
How many partitions per table? So the Special:Log (no filter) case will have to UNION/resort/limit queries from each partition?
Comment 15 Sean Pringle 2014-01-10 00:06:59 UTC
16 partitions, a number chosen after some rough back-of-the-napkin calculations for the larger wikis.

The effect of accessing multiple partitions is closest to a UNION. I don't anticipate excessive sorting or limiting penalties for our other queries, though I do want to see how range access and index scans perform.

Monitoring for queries adversely affected is definitely necessary and one of the reasons for only partitioning one slave per shard.
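For illustration, a minimal sketch of that layout (the DDL is an assumption; among other things, MySQL requires the partitioning column to appear in every unique key, including the primary key, so the real change involves more than this one statement):

-- Hash-partition logging by user so log_user = N predicates prune
-- to a single partition:
ALTER TABLE logging PARTITION BY KEY (log_user) PARTITIONS 16;

-- EXPLAIN PARTITIONS should then show a single partition probed
-- for the common filtered case:
EXPLAIN PARTITIONS
SELECT log_id, log_timestamp
FROM logging
WHERE log_user = 12345             -- placeholder user id
ORDER BY log_timestamp DESC
LIMIT 51;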
Comment 16 Aaron Schulz 2014-01-10 00:11:13 UTC
16 looks reasonable, I just wouldn't want something crazy like 256 ;)
Comment 17 Sean Pringle 2014-03-30 04:17:27 UTC
With partitioned slaves, LogPager queries are still not super fast, but they now average <10s and no longer dominate the slow query log.

Closing this.
