Last modified: 2011-05-10 23:35:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T21640, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 19640 - Categorymembers namespace filtering is inefficient, uses ugly hack in miser mode
Categorymembers namespace filtering is inefficient, uses ugly hack in miser mode
Status: RESOLVED LATER
Product: MediaWiki
Classification: Unclassified
API (Other open bugs)
1.16.x
All All
: Normal normal with 1 vote (vote)
: ---
Assigned To: Roan Kattouw
:
: 19816 20072 24354 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-07-10 14:09 UTC by Platonides
Modified: 2011-05-10 23:35 UTC (History)
13 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Patch against r53286 to apply cmnamespace in php rather than sql (and rather than just ignoring it with a warning) (2.43 KB, patch)
2009-07-15 04:02 UTC, Brad Jorsch
Details

Description Platonides 2009-07-10 14:09:52 UTC
On r53052 domas disabled querying specific namespace if miser mode is enabled.

The query now produce erroneus results.
*It silently skips the page_namespace where clause instead of giving an error.
*It's not documented on api.php nor www.mediawiki.org

Either 'fix' the above issues or implement whatever domas told about how to have efficient subcategory access.
Comment 1 CBM 2009-07-10 19:19:05 UTC
This change is now live on the WMF servers, giving results that conflict with the documentation  dynamically generated by api.php. 
Comment 2 Brad Jorsch 2009-07-10 22:02:44 UTC
(In reply to comment #0)
> Either 'fix' the above issues or implement whatever domas told about how to
> have efficient subcategory access.

It would be nice if any of the rest of us could even '''know''' whatever domas told.
Comment 3 Platonides 2009-07-11 00:15:08 UTC
> It would be nice if any of the rest of us could even '''know''' whatever domas
> told.

I only know what is in r53052 comment.
Comment 4 Alex Z. 2009-07-11 00:29:39 UTC
I made a couple tweaks to the disabling in r53087 so that it would die and give an error, rather than silently ignoring it, but it hasn't yet been synced to Wikimedia.
Comment 5 Splarka 2009-07-11 00:39:11 UTC
(In reply to comment #3)
> > It would be nice if any of the rest of us could even '''know''' whatever domas
> > told.
> 
> I only know what is in r53052 comment.

From http://toolserver.org/~amidaniel/chanlogs/%23mediawiki/20090710.txt :

[19:42:35] <domas>	that was ages ago
[19:42:49] <domas>	there're few ways how to provide subcategories
[19:43:04] <domas>	it is simple
[19:43:09] <domas>	you just structure data that way

(more in the log, well... more of domas' brand of humor, heh)
Comment 6 Brad Jorsch 2009-07-11 03:17:43 UTC
(In reply to comment #5)
> From http://toolserver.org/~amidaniel/chanlogs/%23mediawiki/20090710.txt :
> 
> [19:42:35] <domas>      that was ages ago
> [19:42:49] <domas>      there're few ways how to provide subcategories
> [19:43:04] <domas>      it is simple
> [19:43:09] <domas>      you just structure data that way
> 
> (more in the log, well... more of domas' brand of humor, heh)

Isn't the data already structured "that way", i.e. each page is specifically tagged with a namespace number? And in that log, when asked about it, he says the nothing quoted above and then starts complaining about multi-language templates on commons, which seems to have nothing to do with the API's cmnamespace.

I must be missing something big in there, hopefully someone who understands this will come by to enlighten me. And I still have to wonder how something that's been in the code since '''June 2007''' is suddenly such a problem that requires breaking things in the hastiest possible way.
Comment 7 Brad Jorsch 2009-07-15 04:02:47 UTC
Created attachment 6341 [details]
Patch against r53286 to apply cmnamespace in php rather than sql (and rather than just ignoring it with a warning)

Rather than just ignoring cmnamespace (even with a warning), why not filter by cmnamespace in the PHP code that builds the result dataset when $wgMiserMode is enabled?

While this could result in the categorymembers node being empty if the requested namespaces are particularly sparse in the category, that sort of thing can happen anyway since r46845 so it's much less of a breaking change. It also would save bandwidth in the same circumstances, by not returning 5000 results the client doesn't want; when categorymembers is being used as a generator, it would also save having the generated modules querying those unwanted pages.
Comment 8 Domas Mituzas 2009-07-15 06:32:32 UTC
well, proper fix would be prefixing categories sortkey with something what sorts high up. 
brion thought it was too dirty, so the change didn't get much traction, and we have current situation.

Brad's PHP filtering is... oh... well... I won't comment ;-)
Comment 9 Roan Kattouw 2009-07-15 10:07:53 UTC
(In reply to comment #8)
> well, proper fix would be prefixing categories sortkey with something what
> sorts high up. 
> brion thought it was too dirty, so the change didn't get much traction, and we
> have current situation.
> 
> Brad's PHP filtering is... oh... well... I won't comment ;-)
> 

...a dirty, temporary hack, but good enough for now. Committed in r53304. Of course this filtering should be moved to the database side soon.
Comment 10 Brad Jorsch 2009-07-19 03:06:02 UTC
*** Bug 19816 has been marked as a duplicate of this bug. ***
Comment 11 Ilmari Karonen 2009-08-02 11:36:35 UTC
Okay, this is just FUBAR.  If we don't currently have an efficient way of getting the subcategories of a category, then one should be created.  Unlike Domas, I'm no DB expert, but I'd suggest adding a cl_fromns (?) field to the categorylinks table and creating new indexes to use it.  (Unfortunately, I don't think MySQL is smart enough to use those indexes for queries that don't involve the namespace, so we'd either have to keep some redundant indexes or use the UNION hack.  Domas can probably say which one he thinks would be the lesser evil.)  Or just use the sortkey hack that Domas originally suggested.
Comment 12 Ilmari Karonen 2009-08-02 11:42:24 UTC
By the way, doesn't the CategoryTree extension have the exact same performance problem, or is the fact that it uses memcached enough to avoid any issues there?
Comment 13 Splarka 2009-08-05 05:29:52 UTC
*** Bug 20072 has been marked as a duplicate of this bug. ***
Comment 14 Raimond Spekking 2009-08-05 13:27:11 UTC
*** Bug 20072 has been marked as a duplicate of this bug. ***
Comment 15 Ilmari Karonen 2009-08-24 23:41:43 UTC
Anyway, I'm reopening this (with a slightly changed summary) because the temporary hack applied in r53304 really is ugly and inefficient and should be replaced by a better fix (e.g. adding new fields and indexes to the categorylinks table, or using the sortkey hack suggested by Domas).
Comment 16 db [inactive,noenotif] 2010-07-13 17:56:57 UTC
*** Bug 24354 has been marked as a duplicate of this bug. ***
Comment 17 Sam Reed (reedy) 2011-01-07 02:21:23 UTC
mysql> DESCRIBE SELECT /* ApiQueryCategoryMembers::run  */  cl_from,cl_sortkey,page_namespace,page_title,page_id  FROM `mw_page`,`mw_categorylinks` FORCE INDEX (cl_sortkey) WHERE (cl_from=page_id) AND cl_to = 'Oh_rly'  ORDER BY cl_sortkey, cl_from LIMIT 11\G
+----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+
| id | select_type | table            | type   | possible_keys | key        | key_len | ref                             | rows | Extra                                    |
+----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+
|  1 | SIMPLE      | mw_categorylinks | ref    | cl_sortkey    | cl_sortkey | 257     | const                           |    2 | Using where; Using index; Using filesort |
|  1 | SIMPLE      | mw_page          | eq_ref | PRIMARY       | PRIMARY    | 4       | wikidb.mw_categorylinks.cl_from |    1 |                                          |
+----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+
2 rows in set (0.00 sec)

mysql>

Was wondering.

Just removing the order by removes the filesort ;)
Comment 18 Bryan Tong Minh 2011-01-07 08:05:21 UTC
The proper order by for the cl_sortkey index would be cl_to,cl_type,cl_sortkey
Comment 19 Sam Reed (reedy) 2011-01-07 18:21:37 UTC
IF that is all that's required to fix it... It's worth adding an index (though, I've got a feeling adding the index might be quite slow especially on wmf et al. And is it still right post the sorting changes?), and backporting...
Comment 20 Roan Kattouw 2011-01-07 19:43:44 UTC
(In reply to comment #19)
> IF that is all that's required to fix it... It's worth adding an index (though,
> I've got a feeling adding the index might be quite slow especially on wmf et
> al. And is it still right post the sorting changes?), and backporting...
That's not the problem. The problem is that you have to join in the page table to filter by namespace, because the namespace isn't in the categorylinks table.
Comment 21 p858snake 2011-04-30 00:10:14 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 22 Sam Reed (reedy) 2011-05-10 23:35:02 UTC
RESOLVED LATER

There's nothing actionable by the API at the moment

If there does become something in future, feel free to reopen the bug at that point

Do we have a tracking bug for that or something?

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links