Last modified: 2008-04-11 07:20:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T8943, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 6943 - Magic word for number of items in a category
Magic word for number of items in a category
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Categories (Other open bugs)
unspecified
All All
: Normal enhancement with 5 votes (vote)
: ---
Assigned To: Nobody - You can work on this!
: patch, patch-need-review
: 4540 (view as bug list)
Depends on:
Blocks: 13326
  Show dependency treegraph
 
Reported: 2006-08-08 00:11 UTC by Ernest Cline
Modified: 2008-04-11 07:20 UTC (History)
8 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
implemented PAGESINCATEGORY magicword (3.25 KB, patch)
2006-12-22 02:32 UTC, Jmax
Details
non-recursive PAGESINCATEGORY magicword (2.93 KB, patch)
2006-12-22 02:54 UTC, Jmax
Details
PAGESINCATEGORY using category table (3.05 KB, patch)
2008-03-21 02:46 UTC, Alex Z.
Details
PAGESINCATEGORY using category table (3.06 KB, patch)
2008-03-21 16:47 UTC, Alex Z.
Details
Yet another patch (2.92 KB, patch)
2008-03-22 17:46 UTC, Alex Z.
Details
Patch with limit (3.23 KB, patch)
2008-03-23 02:32 UTC, Alex Z.
Details
Updated patch (9.33 KB, patch)
2008-03-26 19:11 UTC, Alex Z.
Details
Slight fix (9.27 KB, patch)
2008-03-26 19:15 UTC, Alex Z.
Details
Per Brion (9.33 KB, patch)
2008-04-07 21:42 UTC, Alex Z.
Details

Description Ernest Cline 2006-08-08 00:11:25 UTC
It would be very handy for things like Tables of Contents, if there was a way
for them to be rendered only if the number of articles and subcategories was
sufficent to cause the the category to span more than one page.  If there was a
variable for the number of items (and or pages) in a category, then the existing
#if functionality would enable this to be done without any other changes.
Comment 1 Jmax 2006-12-22 02:30:51 UTC
For what it's worth, this feature could be used in many applications, including
keeping the count of featured articles on the main page (on en wikipedia -- this
is currently implemented using a bot I coded, en:User:Jmax-bot).

I will attach a patch that implements this.  It is a bit slow, as it is
recursive.  But, it seems to work fine on my machine (p4-class celeron, 1G ram,
low load).  I'm not sure that I've implemented it perfectly, as I'm still
getting used to the mediawiki framework.  Please be gentle!
Comment 2 Jmax 2006-12-22 02:32:08 UTC
Created attachment 2918 [details]
implemented PAGESINCATEGORY magicword
Comment 3 Jmax 2006-12-22 02:54:50 UTC
Created attachment 2919 [details]
non-recursive PAGESINCATEGORY magicword

Updated patch -- this one does *not* recurse (and actually works, heh)
Comment 4 Alessandro Argentini 2007-01-08 17:30:09 UTC
*** Bug 4540 has been marked as a duplicate of this bug. ***
Comment 5 Alessandro Argentini 2007-01-08 17:50:27 UTC
I have a question. Does this magicword sum the number of articles in the
subcategories of the selected category?

For example in this case
10 pages with ONLY the category [[Category:Coats of arms]]
10 pages with ONLY the category [[Category:Blasons]]
[[Category:Blasons]] is a subcategory of [[Category:Coats of arms]]

the {{PAGESINCATEGORY}} magicword value is 10 or 20?

If articles in subcategories are not counted, it would have some sense to
implement another magicworld with the total sum, something like
{{ALL_PAGES_IN_CATEGORY}}.
Comment 6 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-01-08 18:00:01 UTC
"Non-recursive" = "doesn't count subcategories".
Comment 7 Ernest Cline 2007-01-11 05:47:00 UTC
The problem with a recursive version would be what to do with a category loop? 
Detecting loops would likely be expensive.
Comment 8 Dan Bolser 2007-05-31 12:09:14 UTC
> The problem with a recursive version would be what to do with a category loop? 
> Detecting loops would likely be expensive.

Hash map?
Comment 9 Alex Z. 2008-03-20 17:41:22 UTC
This is much easier now that there is a category table that includes a count of all the pages in each category.
Comment 10 Alex Z. 2008-03-21 02:46:15 UTC
Created attachment 4741 [details]
PAGESINCATEGORY using category table

New patch that uses the cat_pages field in the category table.
Comment 11 Alex Z. 2008-03-21 16:47:54 UTC
Created attachment 4743 [details]
 PAGESINCATEGORY using category table

Slight fix to previous patch for pages defined in category table but not containing any pages, now correctly returns 0 instead of the nosuchcategory message.
Comment 12 Alex Z. 2008-03-22 17:46:47 UTC
Created attachment 4748 [details]
Yet another patch

This time using existing functions in the Category class to get the page count, possibly more efficient than previous patches that used the selectField wrapper to query the database (if it isn't, use patch number 4).
Comment 13 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-03-23 01:28:04 UTC
They should be the same, basically, just the Category wrapper makes it more readable.  A couple of comments:

1) This is another one of those nasty parser tags that does a DB query for every time it's included in the page.  If no system is devised for batching this sort of thing so it's one query per page render, or at least one query per preprocessor pass, I'd be really leery about committing more stuff like this.  We have hacky limits on the #ifexists parser functions or whatever it's called; this should at least have similar to avoid someone making a template that uses it 4500 times and does 4500 queries (albeit very fast queries).

2) Use hyphens to separate words in new message names: no-such-category, not nosuchcategory.
Comment 14 Alex Z. 2008-03-23 02:32:57 UTC
Created attachment 4750 [details]
Patch with limit

Sets a limit on PAGESINCATEGORY calls.
Comment 15 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-03-23 14:40:43 UTC
+		global $wgPICcount, $wgMaxPICcount;

What are these globals supposed to do?  They should be added to DefaultSettings.php with documentation and default values.  Don't try to use variables without initializing them.  You should be using error_reporting( E_ALL | E_STRICT ) when doing development, and fixing any printouts you get.

For $wgPICcount, I'd go more in line with r27946 and make it a Parser member variable that's cleared on clearState().  Currently it's per-request, so for instance it would be incorrectly high if a job is being run along with the current request.  r27977 also showcases a good thing to add here, as do some later commits to that.

Actually, what I would do is make the current ifexist-specific-limit into a global parser variable like mExpensiveFunctionCount, and have both ifexist and this increment it independently.  Otherwise the limit of 100 simple queries that should really be combined has now crept up to 200, and then 300, 400, ...  The code for ifexist limit tracking is more complicated but much more complete and useful: it provides warnings on preview if you go over the limit, things like that.  It would be best if that could be moved into some central Parser methods that all expensive functions can share.

+		if (!$wgMaxPICcount) {
+			$wgMaxPICcount = 100;
+		}

Never do this in PHP < 6.  Initialization needs to be unconditional.  Otherwise you're opening yourself up to register_globals vulnerabilities: if register_globals in on and $wgMaxPICcount is uninitialized, remote parties can initialize it by just setting &wgMaxPICcount=1000000000 in the query URL.  (And if you are going to do it, use isset() rather than checking the variable directly, to avoid an E_NOTICE.  This also allows people to set the limit to 0 if they want.)

+			$count = $category->getPageCount();
+			if ($count == '') {
+				return wfMsgForContent('no-such-category');
+			}

Checking for $count == '' is a little odd, since $count will possibly be false and possibly be 0, but never the empty string.  !$count would be a more sensible way to write that.  Also, is this the best way to handle the error?  It seems like returning 0 would make the most sense in both cases (there are, after all, 0 members of the category).  So perhaps

			if( $count === false ) {
				$count = 0;
			}

(This seems to highlight a flaw in the design of the Category methods: you have no way to tell the difference between a category that happens not to exist, and one that cannot possibly exist because the name is invalid.  I should fix that.)

+		return wfMsgForContent('too-many-pagesincategory', $wgMaxPICcount);

I was going to say this should be 'too-many-pages-in-category', but of course you're correct, it shouldn't be.  :)


Overall it looks okay with these fixed.  I'd be willing to commit it, barring disapproval from some Higher Power.
Comment 16 Alex Z. 2008-03-23 17:55:34 UTC
I'll do the fixes later, probably tomorrow. If I recall correctly, I used "if ($count == '')" to differentiate between categories that currently have no pages in them but otherwise exist (or are at least defined in the category table) and categories don't actually exist. if ($count === false) will give the same results, so I'll use that.
Comment 17 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-03-23 18:30:12 UTC
(In reply to comment #16)
> I'll do the fixes later, probably tomorrow. If I recall correctly, I used "if
> ($count == '')" to differentiate between categories that currently have no
> pages in them but otherwise exist (or are at least defined in the category
> table) and categories don't actually exist.

That doesn't differentiate between them.  Remember that in PHP, 0 == '' and '' == false, so it will be true for both scenarios.

I wouldn't distinguish between the two cases, personally.  The distinction is a little arcane.  A category with 0 pages in it should be treated as nonexistent for most purposes; this is what we've historically done.  The category row gets left there instead of being deleted because otherwise you'd be giving the same category different ID's for no real reason, as members are removed and re-added, instead of preserving the ID.
Comment 18 Alex Z. 2008-03-26 19:11:34 UTC
Created attachment 4769 [details]
Updated patch

*Now uses Parser member variable mExpensiveFunctionCount

*Changes also to ParserFunctions extension to use mExpensiveFunctionCount instead of ifexist-specific limit.

*Global $wgExpensiveParserFunctionLimit, set by default at 100

*Moves the on-preview over-limit warning, category, and limit report out of the extension and into Parser.php and MessagesEn (changed to say "expensive parser functions" instead of #ifexist).

*PAGESINCATEGORY now returns 0 if no title is passed, a non-existent category is given, or the limit is exceeded. This is close to how ifexist works (returns the "else" text if the limit is reached and could help if its used with other parser functions - {{#ifexpr:{{PAGESINCATEGORY:{{{1|}}} > ... }} won't give a big red expression error if the limit is exceeded or an incorrect title is used.
Comment 19 Alex Z. 2008-03-26 19:15:54 UTC
Created attachment 4770 [details]
Slight fix

Slight cosmetic fix from previous patch (had a line commented out from testing I forgot to remove).
Comment 20 Alex Z. 2008-04-07 21:42:35 UTC
Created attachment 4793 [details]
Per Brion

Per Brion in IRC:
if (!$category) {
  return 0;

will return 0 for Category:0
Comment 21 Brion Vibber 2008-04-07 22:11:58 UTC
Looking good! Applied on r32932.
Comment 22 Melancholie 2008-04-11 07:20:05 UTC
Please see [[Bug 13691]].
Comment 23 Melancholie 2008-04-11 07:20:39 UTC
Bug 13691 (this way?)

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links