Last modified: 2014-09-23 23:31:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T16786, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 14786 - Special:WantedPages should use simpler query (so that it can be enabled on WMF wikis)
Special:WantedPages should use simpler query (so that it can be enabled on WM...
Status: NEW
Product: MediaWiki
Classification: Unclassified
Special pages (Other open bugs)
All All
: Low enhancement with 3 votes (vote)
: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 39661 1861
  Show dependency treegraph
Reported: 2008-07-10 22:53 UTC by darklama
Modified: 2014-09-23 23:31 UTC (History)
9 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

simplify query for faster performance (3.01 KB, patch)
2008-07-10 22:53 UTC, darklama

Description darklama 2008-07-10 22:53:24 UTC
Created attachment 5066 [details]
simplify query for faster performance

Updates for Special:WantedPage have been disabled on en.wikibooks. I tried asking on #mediawiki-tech, yesterday and was told it was probably due to being inefficient. Since updates to Special:WantedCategories are still being allowed currently, I propose simplifying Special:WantedPages since that would appear to be the reason.

I have included an attached patch which I believe does simplify the query and reduces the performance impact, without reducing functionality.
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-07-11 16:49:52 UTC
The patch 1) functionally changes the query, apparently unintentionally; and 2) includes tons of completely unrelated changes that I'm not even going to look at.  It might improve performance, but only by accident and by introducing bugs.

This query should be okay with correct join order.  On MySQL 5.x I'm seeing that with STRAIGHT_JOIN, it's 

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: pagelinks
         type: index
possible_keys: pl_from,pl_namespace
          key: pl_namespace
      key_len: 265
          ref: NULL
         rows: 254989417
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: name_title
          key: name_title
      key_len: 261
          ref: enwiki.pagelinks.pl_namespace,enwiki.pagelinks.pl_title
         rows: 1
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY,name_title
          key: PRIMARY
      key_len: 4
          ref: enwiki.pagelinks.pl_from
         rows: 1
        Extra: Using where

So just a table scan of pagelinks, not a filesort.  This should run in under 15 minutes even on enwiki, hopefully, and surely on enwikibooks.  IIRC, that was the criterion Tim used for shutting off queries.  I'm trying it out on the toolserver now.

Without the STRAIGHT_JOIN, as now, MySQL 5 tries to put pg2 first in the join order, for some ungodly reason.
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-07-11 17:01:50 UTC
Oops, I forgot about the ORDER BY.  That adds the filesort again.  If update_specialpages.php could be rewritten to write an unsorted result to disk, close the transaction, sort the result on disk, and cache that, it wouldn't have to keep open horribly long database transactions and might be more useful.

Even with the order by and without the straight join, actually, this runs in less than a minute on the toolserver for enwikibooks.  Was this really taking a long time on the real servers?  This query takes 16.90s on toolserver's enwikibooks_p:

EXPLAIN SELECT 'Wantedpages' as type, pl_namespace AS namespace,  pl_title AS title, COUNT(*) AS value FROM pagelinks LEFT JOIN page AS pg1 ON pl_namespace = pg1.page_namespace AND pl_title = pg1.page_title LEFT JOIN page AS pg2 ON pl_from = pg2.page_id WHERE pg1.page_namespace IS NULL AND pl_namespace NOT IN ( 2, 3 ) AND pg2.page_namespace != 8 GROUP BY 1,2,3 HAVING COUNT(*) > 0 ORDER BY value DESC LIMIT 5000;

Maybe MySQL 4 is optimizing it poorly?
Comment 3 darklama 2008-07-11 18:15:58 UTC
Isn't MediaWiki suppose to work with both MySQL and PostgreSQL? I don't think PostgreSQL supports STRAIGHT_JOIN, which is why I didn't use that.
Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-07-11 18:59:11 UTC
MySQL conditional comments can be used to hide MySQL-specific things:

The abstraction layer can also be used.  Either way, it's not a problem.
Comment 5 Chad H. 2008-12-16 16:13:34 UTC
Fwiw: Removing the second JOIN to remove links from MW messages seems to get rid of the filesort.

I know that was added (bug 5723) to prevent listing of $1, $2 in messages (which showed up as wanted). However, removing the join doesn't seem to be adding these fake links to the results. Has something else changed to keep them out of link tables, rendering this extra join useless?
Comment 6 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-12-16 16:34:33 UTC
Removing the extra join would serve more or less the same purpose as using STRAIGHT_JOIN.  Both should remove the filesort, if what I said above is correct.
Comment 7 Danny B. 2009-09-23 22:25:18 UTC
Bug 15434 related.
Comment 8 Sumana Harihareswara 2011-11-10 06:02:28 UTC
-patch since there is currently no patch.

Note You need to log in before you can comment on or make changes to this bug.