Last modified: 2011-02-08 21:56:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in and, except for displaying bug reports and their history, links might be broken. See T25682, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 23682 - CategoryTree is inefficient
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CategoryTree
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 5 votes
Target Milestone: ---
Assigned To: Daniel Kinzler
Depends on: 1211
Blocks:
Reported: 2010-05-27 11:55 UTC by Domas Mituzas
Modified: 2011-02-08 21:56 UTC (History)

Description Domas Mituzas 2010-05-27 11:55:55 UTC
categorytree forces scans of millions of rows (for large categories) to return 0 row datasets quite often. 

this has been discussed in the past a lot, I'll write it down yet again.

there are a few fixes. one is prefixing cl_sortkey with a character not used in category names and filtering by it - this would need rewriting the sort key only for a subset of rows.

another is creating separate columns, creating or replacing indexes, etc - in the categorylinks table

another is maintaining a subcategories table per category
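
A minimal sketch of the first option (the sort key prefix), using SQLite from Python rather than MySQL so that it runs standalone. The marker character, the index name, and the simplified three-column table are assumptions for illustration, not the actual MediaWiki schema. The idea is that subcategory rows share a prefix, so they form one contiguous range in an index on (cl_to, cl_sortkey) and can be fetched with a range condition instead of a scan over every member of the category.

```python
import sqlite3

# Hypothetical marker: assumed never to start an ordinary sort key.
SUBCAT_PREFIX = "\x01"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT)"
)
# Simplified stand-in for an index covering category and sort key.
conn.execute("CREATE INDEX cl_sortkey_idx ON categorylinks (cl_to, cl_sortkey)")

conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?, ?)",
    [
        (1, "Test", "Apple"),                     # ordinary page
        (2, "Test", SUBCAT_PREFIX + "Fruit"),     # subcategory
        (3, "Test", "Banana"),                    # ordinary page
        (4, "Test", SUBCAT_PREFIX + "Berries"),   # subcategory
        (5, "Other", SUBCAT_PREFIX + "Zebra"),    # subcategory elsewhere
    ],
)


def subcategories(conn, category, limit=200):
    """Return cl_from for rows whose sort key starts with the marker."""
    lower = SUBCAT_PREFIX
    upper = chr(ord(SUBCAT_PREFIX) + 1)  # first key past the prefixed range
    cur = conn.execute(
        "SELECT cl_from FROM categorylinks "
        "WHERE cl_to = ? AND cl_sortkey >= ? AND cl_sortkey < ? "
        "ORDER BY cl_sortkey LIMIT ?",
        (category, lower, upper, limit),
    )
    return [row[0] for row in cur]


subcat_ids = subcategories(conn, "Test")
```

Only the subcategory rows need their sort key rewritten, which is why this option is described above as touching a subset of rows.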
Comment 1 Stepro 2010-05-27 14:35:18 UTC
In fact this feature has been disabled today. Please check whether it's really necessary to disable this in all projects. Maybe this is useful for large projects such as the Wikipedias, but I don't believe this feature must be switched off in the small ones.
Comment 2 Krinkle 2010-05-27 14:39:55 UTC
On the wikis where it is disabled, it would be nice for the extension to contain some dummy code or return an empty string, to prevent errors all over the place.

This extension is/was widely used. To discourage it from being unlinked everywhere (since we optimists all assume it might be efficient one day), a blank return on the wikis where it's disabled is better than an unexpected red error message or the raw <categorytree /> syntax.
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-05-27 14:44:06 UTC
We also want to mark subcategories specially in categorylinks so that we can page subcategories separately from actual category members.  Currently in a large category, you'd never find any subcategories, because they're on page 236 under "D" or whatever.  People hack around this by setting manual sort keys like " ".
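
A minimal sketch of paging subcategories separately from other members, assuming a type column on categorylinks. The column name cl_type is borrowed from the query quoted later in this thread; the index layout, the sample data, and the members() helper are hypothetical, and SQLite stands in for MySQL so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks ("
    "cl_from INTEGER, cl_to TEXT, cl_type TEXT, cl_sortkey TEXT)"
)
# Assumed index: type sits between category and sort key, so each
# member type is its own contiguous, sorted range.
conn.execute(
    "CREATE INDEX cl_type_sortkey ON categorylinks (cl_to, cl_type, cl_sortkey)"
)

# 1000 ordinary pages plus three subcategories in one large category.
links = [(i, "Big", "page", "P%04d" % i) for i in range(1000)]
links += [
    (2000 + i, "Big", "subcat", key)
    for i, key in enumerate(["Delta", "Alpha", "Mike"])
]
conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?, ?)", links)


def members(conn, category, member_type, after="", limit=2):
    """Return one page of sort keys for a single member type."""
    cur = conn.execute(
        "SELECT cl_sortkey FROM categorylinks "
        "WHERE cl_to = ? AND cl_type = ? AND cl_sortkey > ? "
        "ORDER BY cl_sortkey LIMIT ?",
        (category, member_type, after, limit),
    )
    return [row[0] for row in cur]


first_page = members(conn, "Big", "subcat")
second_page = members(conn, "Big", "subcat", after=first_page[-1])
```

With this layout the subcategory "Delta" shows up on the first subcategory page instead of being buried among the ordinary pages sorted under "D".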
Comment 4 Domas Mituzas 2010-05-27 14:47:27 UTC
Stepro,
small projects doing inefficient stuff means there are a thousand projects doing inefficient stuff :(

Aryeh, 
sure, we could just use automatic sort key hacks too - or have separate column for this or ... (I prefer sortkey hacks :-)

Krinkle,
good idea - do you have any placeholder code for filling in extension parser tags with ""?
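
A rough sketch of the placeholder idea, written in Python purely for illustration. The handler registry and render() function are hypothetical and are not MediaWiki's parser API; they only show the behaviour being asked for, where a disabled tag renders as an empty string instead of an error or the raw tag text.

```python
import re


def empty_stub(body, attrs):
    """Placeholder handler: swallow the tag and render nothing."""
    return ""


# Hypothetical registry mapping lowercase tag names to handlers.
STUB_HANDLERS = {"categorytree": empty_stub}


def render(text, handlers):
    """Replace registered tags with handler output; leave the rest alone."""
    if not handlers:
        return text
    names = "|".join(re.escape(name) for name in handlers)
    pattern = re.compile(
        r"<(?P<name>%s)\b(?P<attrs>[^<>]*?)"
        r"(?:/>|>(?P<body>.*?)</(?P=name)\s*>)" % names,
        re.DOTALL | re.IGNORECASE,
    )

    def replace(match):
        handler = handlers[match.group("name").lower()]
        return handler(match.group("body") or "", match.group("attrs").strip())

    return pattern.sub(replace, text)


page = "Intro <categorytree mode=pages>Foo</categorytree> outro"
rendered = render(page, STUB_HANDLERS)
```

Unregistered tags pass through untouched, so the stub only affects pages that actually use the disabled extension.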
Comment 5 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-05-27 15:35:50 UTC
I also prefer sortkey hacks.  Much easier to deploy.
Comment 6 Tim Starling 2010-05-28 01:21:36 UTC
Thanks Domas, your point is taken. I've re-enabled CategoryTree so that we can have functional wikis while a solution is being developed.
Comment 7 Daniel Mlot 2010-05-28 01:36:14 UTC
Thanks Tim for putting common sense first and re-enabling the extension until a real solution is provided.
Comment 8 Tim Starling 2010-05-28 02:18:31 UTC
Proposed fix for the problem of expensive queries returning zero rows in r66987.
Comment 9 Domas Mituzas 2010-05-28 05:16:25 UTC
I knew it!!! :-)

zero rows is just one of cases, there are other expensive queries that return 10, 20, 30 rows - all with low selectivity. I have no idea why anyone thinks that is a feasible fix.
Comment 10 Philippe Beaudette 2010-05-28 05:29:49 UTC
I don't know the technical issues here, but from a user perspective, disabling this without notification when it's widely used across many wikis is less than ideal.  If there are performance issues, wouldn't it be best to work them out and put a change in... not just turn it off, unless there is critical reason to do so.
Comment 11 Domas Mituzas 2010-05-28 05:40:30 UTC
Philippe, developers should've known that subcategory fetches are absolutely inefficient (and features like that have been disabled quite a few times, starting with simple categories listing, then API/ajax, then this). 

Instead of fixing the root problem, some people prefer to hide this functionality in 'extensions', probably expecting to have different roll-out and less review/attention (just like with DPL :) 

OTOH, users don't announce their million-page-sized actions beforehand, do they?
Comment 12 Tim Starling 2010-05-28 07:33:36 UTC
(In reply to comment #9)
> I knew it!!! :-)
> 
> zero rows is just one of cases, there are other expensive queries that return
> 10, 20, 30 rows - all with low selectivity. I have no idea why anyone thinks
> that is a feasible fix.

I didn't say it was a feasible fix for the whole bug, just that it was a feasible fix for the case where there are zero subcategories.
Comment 13 Domas Mituzas 2010-05-28 07:44:08 UTC
makes sense! :) Thanks Tim! This would cover some edge cases like 'Living people', until, of course, there's a single subcategory there.
Comment 14 Ilmari Karonen 2010-05-28 19:44:37 UTC
(In reply to comment #11)
I remember asking before at bug 19640 if this was an issue with CategoryTree too, but never got a response there.  Apparently it is, then.  Maybe now that Brion is no longer CTO, someone could finally implement the sortkey hack you proposed? ;)
Comment 15 Tim Starling 2010-06-01 09:09:50 UTC
I've committed r67179 now, which comes closer to a complete workaround. I'll deploy it shortly.

For a final fix, we should aim to have a carefully considered design, possibly including solutions for bug 164, bug 1211, etc.
Comment 16 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-07-21 21:04:49 UTC
I'll be working on this bug.  I hope to have a (proper) solution coded up within a couple of weeks.  I wrote a post to wikitech-l about it, and encourage people to respond there rather than here (since this involves several bugs):

http://lists.wikimedia.org/pipermail/wikitech-l/2010-July/048399.html
Comment 17 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-08-16 22:06:07 UTC
This should be fixed in r71174, although I don't understand CategoryTree well enough to claim that it's fixed for all uses of the extension.  Running the query that MediaWiki now produces, on a category with 1000 subcategories and 1000 other pages in the category:

mysql> SHOW STATUS LIKE 'Handler\_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 62    |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 60    |
+----------------------------+-------+
15 rows in set (0.00 sec)


mysql> SELECT  page_id,page_namespace,page_title,page_is_redirect,page_len,page_latest,cl_to,cl_from,cat_id,cat_title,cat_subcats,cat_pages,cat_files  FROM `page` JOIN `categorylinks` ON ((cl_from = page_id)) LEFT JOIN `category` ON ((cat_title = page_title AND page_namespace = 14))  WHERE cl_to = 'Test' AND cl_type = 'subcat'  ORDER BY cl_type, cl_sortkey LIMIT 200;
...snip...


mysql> SHOW STATUS LIKE 'Handler\_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 404   |
| Handler_read_next          | 199   |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 62    |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 60    |
+----------------------------+-------+
15 rows in set (0.00 sec)
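
The two status snapshots above are easier to compare as a difference. A small sketch in Python, with the counter values copied from the output above (counters that are zero in both snapshots are omitted); the handler_delta() helper is hypothetical:

```python
# Values copied from the two SHOW STATUS outputs above.
BEFORE = {
    "Handler_commit": 0,
    "Handler_read_key": 0,
    "Handler_read_next": 0,
    "Handler_read_rnd_next": 62,
    "Handler_write": 60,
}
AFTER = {
    "Handler_commit": 1,
    "Handler_read_key": 404,
    "Handler_read_next": 199,
    "Handler_read_rnd_next": 62,
    "Handler_write": 60,
}


def handler_delta(before, after):
    """Return only the counters that changed, with their increase."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


delta = handler_delta(BEFORE, AFTER)
# Total handler read requests attributable to the query.
reads = sum(value for name, value in delta.items() if "_read_" in name)
```

One reading of the result, offered as an interpretation rather than something stated in the thread: the query costs 404 key lookups and 199 index-order reads, 603 handler reads in total for a LIMIT 200 result, so the work appears to track the limit rather than the 2000 members of the category.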
