Last modified: 2012-08-07 15:07:52 UTC

Bug 14237 - PAGESINCATEGORY should differentiate between pages and subcategories
Product: MediaWiki
Classification: Unclassified
Component: Categories
Hardware: All All
Importance: Low enhancement with 5 votes
Target Milestone: ---
Assigned To: Umherirrender
Duplicates: 13691 15645 25376
Depends on:
Reported: 2008-05-23 15:21 UTC by Peter van Londen
Modified: 2012-08-07 15:07 UTC
CC List: 12 users



Description Peter van Londen 2008-05-23 15:21:53 UTC
This magic word PAGESINCATEGORY, which is very useful, counts the number of articles and subcategories in a category.

I don't know whether this is desired functionality for this magic word, but it would help a lot if either:

a) {{PAGESINCATEGORY:category}} did not count the subcategories of a category, or
b) a new magic word were created, alongside the existing one, that counts only the articles in a category.

I did not find an existing bug or feature request about this; if there already is one, sorry for the duplicate.

Comment 1 Melancholie 2008-05-23 16:26:25 UTC
*** Bug 13691 has been marked as a duplicate of this bug. ***
Comment 2 Peter van Londen 2008-05-23 22:00:17 UTC
It seems that bug 13691 is not an exact duplicate of this bug, although it is about the same magic word. That bug says it has been resolved, so this bug is turned into a feature request for a new magic word, i.e. option b) in the description above.

Comment 3 Mark Redekop 2008-06-08 18:48:33 UTC
What about a new magic word {{ARTICLESINCATEGORY}} that would only display the number of mainspace pages in a category? This would be similar to the difference between NUMBEROFPAGES and NUMBEROFARTICLES.
Comment 4 Siebrand Mazeland 2008-08-17 18:17:05 UTC
I think ARTICLES should never be used in magic words. PAGES should be used instead.

I think the current behaviour is confusing. I could imagine PAGESINCATEGORY only reporting the number of pages in a category, excluding FILES and CATEGORIES. This implies there would be 4 magic words, reporting respectively on all category members (MEMBERSINCATEGORY), files in category (FILESINCATEGORY), categories in category (CATEGORIESINCATEGORY), and pages in category (PAGESINCATEGORY).
Comment 5 Niklas Laxström 2008-10-06 07:03:42 UTC
*** Bug 15645 has been marked as a duplicate of this bug. ***
Comment 6 Philippe Verdy 2010-08-25 23:04:40 UTC
I'm opposed to Siebrand's view. Pages for me are any pages, including subcategories, files, talk pages, articles, project pages. They only differ by the namespace in which they reside, and there are possibly many other namespaces (don't assume that all wikis will behave like Wikipedia).

If you want to have counts by namespace, then what would be needed is a two-parameter syntax like:


to make the restriction (the same magic keyword can be used, to provide separate counts for each namespace).

or even possibly like
if you want to include a list of several namespaces to include in the count.

The existing difference between NUMBEROFPAGES and NUMBEROFARTICLES does not rely on namespace differentiation but on statistical parameters (notably the page size, excluding included templates).

Introducing the term "member" will just add more confusion.
Comment 7 Chad H. 2010-09-30 15:19:12 UTC
*** Bug 21822 has been marked as a duplicate of this bug. ***
Comment 8 Chad H. 2010-09-30 15:19:21 UTC
*** Bug 25376 has been marked as a duplicate of this bug. ***
Comment 9 Chad H. 2010-09-30 15:19:59 UTC
Duping both of those bugs to this. Implementation per comment 6 (or similar) would solve all of these bugs at once.
Comment 10 Philippe Verdy 2010-09-30 22:42:08 UTC
Note that the multiple parameters of the syntax I propose may be reduced to just one:
where restriction may be:
- "" : no namespace id at all, useful to add namespaces
- "*" : all namespace ids (the default), useful to remove namespaces
followed by one or more of:
- "+id" : add this namespace id to the current list
- "-id" : remove this namespace id from the list

If the restriction does not start with "*" or "+" or "-", then "+" is implied.
The namespace id could be either the numeric id, or a selector like "talk" to select all talk namespaces, and "subject" to select all subject namespaces.

The namespace id can then take the forms:

- an integer, the raw namespace number

- a name, a namespace name (converted to a namespace id, should recognize the synonyms, notably localized names or English names, or site-specific names)

- "odd": all odd namespace ids (i.e. "talk" namespaces associated to any subject namespace)

- "even": all even namespace ids (i.e. "subject" namespaces)

For example:

- {{PAGESINCATEGORY:categoryname|*}} : equivalent to {{PAGESINCATEGORY:categoryname|}} and to {{PAGESINCATEGORY:categoryname}} (existing syntax)

- {{PAGESINCATEGORY:categoryname|:}} : count only pages of the main namespace, that are members of the specified category name

- {{PAGESINCATEGORY:categoryname|0}} : count only pages of the main namespace, that are members of the specified category name ; equivalent to {{PAGESINCATEGORY:categoryname|+0}}

- {{PAGESINCATEGORY:categoryname|+project:+talk}} : count only pages of the "project:" namespace or of any talk namespace, that are members of the specified category name ; by the implied "+" rule above, equivalent to {{PAGESINCATEGORY:categoryname|project:+talk}}

- {{PAGESINCATEGORY:categoryname|-talk}} : count all pages of any namespace excluding the talk namespaces (odd ids) that are members of the specified category name; equivalent to {{PAGESINCATEGORY:categoryname|*-talk}}
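
The restriction rules and examples above can be sketched in Python as follows. This is only an illustration of the proposed syntax, not MediaWiki code: the namespace table (ALL_IDS, NAMES) and the function names resolve and parse_restriction are hypothetical, and the sketch assumes that namespace ids are non-negative and that namespace names contain no "+" or "-".

```python
import re

# Hypothetical namespace table for illustration; a real wiki defines its own.
ALL_IDS = frozenset({0, 1, 2, 3, 4, 5, 6, 7, 14, 15})
NAMES = {"user": 2, "project": 4, "file": 6, "category": 14}


def resolve(token):
    """Map one namespace token (id, name or selector) to a set of ids."""
    t = token.strip().lower()
    if t in ("talk", "odd"):        # all talk namespaces (odd ids)
        return {i for i in ALL_IDS if i % 2 == 1}
    if t in ("subject", "even"):    # all subject namespaces (even ids)
        return {i for i in ALL_IDS if i % 2 == 0}
    if t == ":":                    # main namespace
        return {0}
    if t.isdigit():                 # raw namespace number
        return {int(t)} & ALL_IDS
    name = t.rstrip(":")            # accept "project" and "project:"
    if name in NAMES:
        return {NAMES[name]}
    raise ValueError(f"unknown namespace: {token!r}")


def parse_restriction(restriction):
    """Return the set of namespace ids selected by a restriction string."""
    r = restriction.strip()
    if r in ("", "*"):              # existing behaviour: every namespace
        return set(ALL_IDS)
    if r.startswith("*"):
        selected, r = set(ALL_IDS), r[1:]
    elif r.startswith("-"):         # "-talk" is treated like "*-talk"
        selected = set(ALL_IDS)
    else:
        selected = set()
        if not r.startswith("+"):   # "+" is implied
            r = "+" + r
    for sign, token in re.findall(r"([+-])([^+-]+)", r):
        if sign == "+":
            selected |= resolve(token)
        else:
            selected -= resolve(token)
    return selected
```

With this table, parse_restriction("-talk") and parse_restriction("*-talk") both select the even ids only, and parse_restriction("0") and parse_restriction(":") both select only the main namespace.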

The restriction can easily be implemented as WHERE clauses in the SQL select that will match the specified namespace ids, combined as a parenthetic list of 'OR id=value' (positive selections), followed by a list of exclusions with 'AND NOT id=value' (negative selections), and possibly with the "IN" operator if sets are available in the SQL syntax.
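
A minimal Python sketch of such a WHERE-clause construction is given here. It assumes the placeholder column name member_namespaceid used in the queries of this comment (not the real MediaWiki schema); the helper name namespace_where is hypothetical, and it uses the IN / NOT IN form with bound parameters instead of chained 'OR id=value' and 'AND NOT id=value' terms.

```python
def namespace_where(include, exclude, column="member_namespaceid"):
    """Build a parameterized SQL condition from positive and negative
    namespace selections. Returns (sql_fragment, parameters)."""
    clauses, params = [], []
    if include:
        marks = ", ".join(["?"] * len(include))
        clauses.append(f"{column} IN ({marks})")
        params.extend(sorted(include))
    if exclude:
        marks = ", ".join(["?"] * len(exclude))
        clauses.append(f"{column} NOT IN ({marks})")
        params.extend(sorted(exclude))
    # With no restriction at all, fall back to a condition that is always true.
    return " AND ".join(clauses) or "1=1", params
```

The returned fragment can be appended to the category condition with AND, and the parameter list is passed to the database driver so that no value is interpolated into the SQL text.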

Some ideas about the SQL server-side cost of counting members in a specific category:

The SQL cost with the restrictions above should be the same as, or better than, performing a select without the namespace restriction (because this is just a restriction of the existing syntax, and it should never reduce the selectivity of the SQL query, but may in fact help to improve it).

However, this means that the existing restriction (for costly parser functions) should remain, because counting pages that are members of a category, independently of which namespace they belong to, may be costly in very populated categories, depending on how members of categories are indexed.

As this cost is effectively the cost of a:

 SELECT COUNT(*) from categorymembers
 WHERE category_pageid = $CATEGORYPAGEID
 AND member_namespaceid = $CATEGORYNAMESPACEID

aggregate (note: I don't know the exact schema implementation, which varies across MediaWiki versions, so replace the table names and column names appropriately), one way to solve it would be to use:

 SELECT 1 from categorymembers
 WHERE category_pageid = $CATEGORYPAGEID
 AND member_namespaceid = $CATEGORYNAMESPACEID
 LIMIT 50

and then let the PHP code count the returned "1" rows. If there are 50 rows, the category is heavily populated and COUNT(*) may take time, so the function can be considered costly. If the cost limit is reached, just return this limit value to the page calling the function; otherwise perform the same select, replacing "SELECT 1" by "SELECT COUNT(*)" (without the LIMIT clause), to return the exact value. Alternatively, return the last known estimate from a separate caching aggregate table that is updated separately (using a maximum timestamp of validity). This avoids recomputing the same aggregate repeatedly when templated pages use this function and many users view or edit various pages.

The value specified in the "LIMIT" clause above (here "50") may be tuned. This first check (for performance) may also be removed completely, in particular if the SQL schema includes an index that precomputes aggregates for counting members in each specific category. In that case there is no need to perform a SELECT COUNT(*) aggregate, because the count is retrieved directly from a precomputed aggregate caching table. That table should be updated asynchronously, either as a batch or when the selective SELECT in the cache detects that the stored value is out of date; in the latter case it performs the SELECT COUNT(*) on the non-cached table, just to update the caching table and its timestamp.
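
The probe-then-count idea described above can be sketched as follows, using Python and an in-memory SQLite database only for illustration. The table and column names are the placeholders from this comment, not the real MediaWiki schema, and the function name count_members is hypothetical. The caching aggregate table is not modelled here.

```python
import sqlite3


def count_members(conn, category_id, namespace_id, limit=50):
    """Return (count, is_exact). A cheap LIMIT-ed probe runs first; the
    full COUNT(*) is issued only when the probe stays under the limit."""
    where = "category_pageid = ? AND member_namespaceid = ?"
    probe = conn.execute(
        f"SELECT 1 FROM categorymembers WHERE {where} LIMIT ?",
        (category_id, namespace_id, limit),
    ).fetchall()
    if len(probe) >= limit:
        # Cost limit reached: report the limit itself instead of counting.
        return limit, False
    (exact,) = conn.execute(
        f"SELECT COUNT(*) FROM categorymembers WHERE {where}",
        (category_id, namespace_id),
    ).fetchone()
    return exact, True


# Small demonstration data set: category 1 has 3 members, category 2 has 80.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorymembers "
    "(category_pageid INTEGER, member_namespaceid INTEGER)"
)
conn.executemany(
    "INSERT INTO categorymembers VALUES (?, ?)",
    [(1, 0)] * 3 + [(2, 0)] * 80,
)
```

Note that when the probe returns fewer rows than the limit, its row count is already the exact answer, so an implementation could skip the second query; the sketch keeps it only to follow the steps as described.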
Comment 11 philippe.vigneau 2010-10-01 07:24:16 UTC
I don't know if an index on the two columns (category_pageid, member_namespaceid) exists on the table categorymembers, but it seems to me that this is the only thing that may need to be added in the database. Performance can only be better.

So when will this improvement be done?
Comment 12 Jarek Tuszynski 2012-02-23 12:50:17 UTC
I would love to see this one implemented. I was just looking up how to count files in a directory (excluding sub-directories) when I learned that you cannot. I was hoping to use that to allow the Commons template [[commons:Template:MetaCat]] to list metacategories (categories which should contain only other categories) that contain files.
Comment 13 User:Docu 2012-06-23 07:36:59 UTC
For files, see Bug 21822
Comment 14 Umherirrender 2012-06-24 17:56:26 UTC
A patch was committed as Gerrit change #12790.
Comment 15 Umherirrender 2012-07-26 20:49:40 UTC
Successfully merged.

You can use {{PAGESINCATEGORY:catname|subcats}} or {{PAGESINCATEGORY:catname|subcats|R}} or {{PAGESINCATEGORY:catname|R|subcats}}
to get the count of subcats in the category, or use 'pages' in place of 'subcats' to get the count of pages.
Comment 16 Derk-Jan Hartman 2012-08-07 09:55:42 UTC
I updated the documentation, since this had not been done yet.
