Last modified: 2011-05-10 23:35:02 UTC
On r53052 domas disabled querying specific namespace if miser mode is enabled. The query now produce erroneus results. *It silently skips the page_namespace where clause instead of giving an error. *It's not documented on api.php nor www.mediawiki.org Either 'fix' the above issues or implement whatever domas told about how to have efficient subcategory access.
This change is now live on the WMF servers, giving results that conflict with the documentation dynamically generated by api.php.
(In reply to comment #0) > Either 'fix' the above issues or implement whatever domas told about how to > have efficient subcategory access. It would be nice if any of the rest of us could even '''know''' whatever domas told.
> It would be nice if any of the rest of us could even '''know''' whatever domas > told. I only know what is in r53052 comment.
I made a couple tweaks to the disabling in r53087 so that it would die and give an error, rather than silently ignoring it, but it hasn't yet been synced to Wikimedia.
(In reply to comment #3) > > It would be nice if any of the rest of us could even '''know''' whatever domas > > told. > > I only know what is in r53052 comment. From http://toolserver.org/~amidaniel/chanlogs/%23mediawiki/20090710.txt : [19:42:35] <domas> that was ages ago [19:42:49] <domas> there're few ways how to provide subcategories [19:43:04] <domas> it is simple [19:43:09] <domas> you just structure data that way (more in the log, well... more of domas' brand of humor, heh)
(In reply to comment #5) > From http://toolserver.org/~amidaniel/chanlogs/%23mediawiki/20090710.txt : > > [19:42:35] <domas> that was ages ago > [19:42:49] <domas> there're few ways how to provide subcategories > [19:43:04] <domas> it is simple > [19:43:09] <domas> you just structure data that way > > (more in the log, well... more of domas' brand of humor, heh) Isn't the data already structured "that way", i.e. each page is specifically tagged with a namespace number? And in that log, when asked about it, he says the nothing quoted above and then starts complaining about multi-language templates on commons, which seems to have nothing to do with the API's cmnamespace. I must be missing something big in there, hopefully someone who understands this will come by to enlighten me. And I still have to wonder how something that's been in the code since '''June 2007''' is suddenly such a problem that requires breaking things in the hastiest possible way.
Created attachment 6341 [details] Patch against r53286 to apply cmnamespace in php rather than sql (and rather than just ignoring it with a warning) Rather than just ignoring cmnamespace (even with a warning), why not filter by cmnamespace in the PHP code that builds the result dataset when $wgMiserMode is enabled? While this could result in the categorymembers node being empty if the requested namespaces are particularly sparse in the category, that sort of thing can happen anyway since r46845 so it's much less of a breaking change. It also would save bandwidth in the same circumstances, by not returning 5000 results the client doesn't want; when categorymembers is being used as a generator, it would also save having the generated modules querying those unwanted pages.
well, proper fix would be prefixing categories sortkey with something what sorts high up. brion thought it was too dirty, so the change didn't get much traction, and we have current situation. Brad's PHP filtering is... oh... well... I won't comment ;-)
(In reply to comment #8) > well, proper fix would be prefixing categories sortkey with something what > sorts high up. > brion thought it was too dirty, so the change didn't get much traction, and we > have current situation. > > Brad's PHP filtering is... oh... well... I won't comment ;-) > ...a dirty, temporary hack, but good enough for now. Committed in r53304. Of course this filtering should be moved to the database side soon.
*** Bug 19816 has been marked as a duplicate of this bug. ***
Okay, this is just FUBAR. If we don't currently have an efficient way of getting the subcategories of a category, then one should be created. Unlike Domas, I'm no DB expert, but I'd suggest adding a cl_fromns (?) field to the categorylinks table and creating new indexes to use it. (Unfortunately, I don't think MySQL is smart enough to use those indexes for queries that don't involve the namespace, so we'd either have to keep some redundant indexes or use the UNION hack. Domas can probably say which one he thinks would be the lesser evil.) Or just use the sortkey hack that Domas originally suggested.
By the way, doesn't the CategoryTree extension have the exact same performance problem, or is the fact that it uses memcached enough to avoid any issues there?
*** Bug 20072 has been marked as a duplicate of this bug. ***
Anyway, I'm reopening this (with a slightly changed summary) because the temporary hack applied in r53304 really is ugly and inefficient and should be replaced by a better fix (e.g. adding new fields and indexes to the categorylinks table, or using the sortkey hack suggested by Domas).
*** Bug 24354 has been marked as a duplicate of this bug. ***
mysql> DESCRIBE SELECT /* ApiQueryCategoryMembers::run */ cl_from,cl_sortkey,page_namespace,page_title,page_id FROM `mw_page`,`mw_categorylinks` FORCE INDEX (cl_sortkey) WHERE (cl_from=page_id) AND cl_to = 'Oh_rly' ORDER BY cl_sortkey, cl_from LIMIT 11\G +----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+ | 1 | SIMPLE | mw_categorylinks | ref | cl_sortkey | cl_sortkey | 257 | const | 2 | Using where; Using index; Using filesort | | 1 | SIMPLE | mw_page | eq_ref | PRIMARY | PRIMARY | 4 | wikidb.mw_categorylinks.cl_from | 1 | | +----+-------------+------------------+--------+---------------+------------+---------+---------------------------------+------+------------------------------------------+ 2 rows in set (0.00 sec) mysql> Was wondering. Just removing the order by removes the filesort ;)
The proper order by for the cl_sortkey index would be cl_to,cl_type,cl_sortkey
IF that is all that's required to fix it... It's worth adding an index (though, I've got a feeling adding the index might be quite slow especially on wmf et al. And is it still right post the sorting changes?), and backporting...
(In reply to comment #19) > IF that is all that's required to fix it... It's worth adding an index (though, > I've got a feeling adding the index might be quite slow especially on wmf et > al. And is it still right post the sorting changes?), and backporting... That's not the problem. The problem is that you have to join in the page table to filter by namespace, because the namespace isn't in the categorylinks table.
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
RESOLVED LATER There's nothing actionable by the API at the moment If there does become something in future, feel free to reopen the bug at that point Do we have a tracking bug for that or something?