Last modified: 2008-07-30 03:25:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T3411, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 1411 - An extension to create list of most recent articles in 1-3 categories.
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Extensions requests
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 11 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://www.ilya.us/wiki/index.php?tit...
Keywords: patch
Depends on:
Blocks:
Reported: 2005-01-26 20:49 UTC by Amgine
Modified: 2008-07-30 03:25 UTC (History)

Attachments
* The extension, updated 1-jan-05. (2.45 KB, patch), 2005-01-27 03:22 UTC, Amgine
* Cleaned up, simplified, renamed to avoid conflict (4.48 KB, patch), 2005-02-04 17:43 UTC, Amgine
* Updated with Amgine's optimizations merged in (4.54 KB, text/plain), 2005-02-06 01:04 UTC, IlyaHaykinson
* Updated with some fixes (4.97 KB, patch), 2005-02-09 01:30 UTC, Brion Vibber
* Moved variables to LocalSettings.php (5.21 KB, patch), 2005-02-09 05:11 UTC, Amgine
* Correct version - with the changes implemented and not just doodling... (5.25 KB, patch), 2005-02-09 05:20 UTC, Amgine
* version 2.0 (16.92 KB, text/plain), 2005-12-02 21:35 UTC, Amgine
* Diff version 1.9, 2.0 (32.17 KB, patch), 2005-12-02 22:14 UTC, Amgine

Description Amgine 2005-01-26 20:49:13 UTC
An extension to list the n most recent articles which are in
category x [AND category y [AND category z]].

The extension is vital to automate the main page of Wikinews, to allow
contributors to concentrate on producing articles rather than maintaining a very
rapidly aging news list. It can allow the most current versions of articles to
appear while they are news. Additional applications exist for Wikipedias.

This version has been tested on small-scale installations of MediaWiki 1.4b3
and 1.4b5.

Source:

<pre>
<?php
/*

 Contributors:  n:User:Amgine, n:User:IlyaHaykinson

 To install:    add following to LocalSettings.php
   include("extensions/dynamicpagelist.php");

*/

$wgExtensionFunctions[] = "wfDynamicPageList";

function wfDynamicPageList() {
    global $wgParser;

    $wgParser->setHook( "DynamicPageList", "DynamicPageList" );
}

# The callback function for converting the input text to HTML output
function DynamicPageList( $input ) {
    global $wgScriptPath, $wgServer, $wgUser;

    $aParams = array();

    $sTok = strtok($input, "\n");
    while ($sTok)
    {
      $aParams[] = $sTok;
      $sTok = strtok("\n");
    }

    foreach($aParams as $sParam)
    {
      $aParam = explode("=", $sParam);
      $sType = $aParam[0];
      $sArg = $aParam[1];
      if ($sType == 'category')
      {
        $sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);
        $sCatName = str_replace('\'','\\\'',$sCatName);
        $aCategories[] = $sCatName;
      }
      else if ('count' == $sType)
      {
        $iCount = (1 * $sArg);
      }
    }

    $iCatCount = count($aCategories);
    if ($iCatCount < 1)
      return "!!too few categories!!";
    if ($iCatCount > 3)
      return "!!too many categories!!";
    if ($iCount < 1)
      $iCount = 1;
    if ($iCount > 50)
      $iCount = 50;

    $sSql = 'SELECT cur_namespace, cur_title FROM cur';
    for ($i = 0; $i < $iCatCount; $i++) {
      $sSql .= ', categorylinks AS c' . ($i+1);
    }

    $sSql .= ' WHERE 1=1 ';

    for ($i = 0; $i < $iCatCount; $i++) {
      if ($i > 0)
        $sSql .= ' AND c1.cl_from = c'.($i+1).'.cl_from';
      $sSql .= ' AND c'.($i+1).'.cl_to = \''.$aCategories[$i].'\'';
    }

    $sSql .= ' AND cur_id = c1.cl_from ORDER BY cur_timestamp DESC LIMIT 0,' . $iCount;

    //$output .= $sSql . "<br />";

    # process the query
    $res = wfQuery($sSql, DB_READ);

    $sk =& $wgUser->getSkin();

    $output .= "<ul>\n";

    # process results of query
    while ($row = wfFetchObject( $res ) ) {
        $title = Title::makeTitle( $row->cur_namespace, $row->cur_title);
        $output .= '<li>' . $sk->makeLinkObj($title) . '</li>' . "\n";
    }
    $output .= "</ul>\n";

    return $output;
}
?>
</pre>
Comment 1 Amgine 2005-01-27 03:22:52 UTC
Created attachment 230 [details]
The extension, updated 1-jan-05.

The extension has been tested on MediaWiki 1.4b3 and 1.4b5.
Comment 2 JeLuF 2005-01-27 05:52:13 UTC
I think this line

  $sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);

would break in UTF-8 wikis. PHP considers e.g. 0xA0 as whitespace, but 0xA0 is
also the second byte of the Cyrillic character Р (0xD0 0xA0). Do you really need
"any whitespace" here, or just "blank"?
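
A minimal sketch of the byte-level concern, assuming the category name arrives as UTF-8 bytes. The helper name `dplNormalizeCategory` is hypothetical and is not part of the extension. It replaces only the literal space (0x20), which never occurs inside a multi-byte UTF-8 sequence, so the 0xA0 byte of a character such as Cyrillic Р (0xD0 0xA0) is left untouched:

```php
<?php
// Hypothetical helper: turn blanks into underscores without touching
// bytes that belong to multi-byte UTF-8 characters.
function dplNormalizeCategory(string $name): string {
    // str_replace() matches exact byte sequences; in valid UTF-8 the byte
    // 0x20 is always a standalone ASCII space, never part of another character.
    return str_replace(' ', '_', trim($name));
}

// Cyrillic capital Er (U+0420) is encoded in UTF-8 as 0xD0 0xA0.
$er = "\xD0\xA0";
echo bin2hex($er), "\n";                                   // d0a0
echo bin2hex(dplNormalizeCategory($er . ' ' . $er)), "\n"; // d0a05fd0a0
```

The only byte that changes is the space (0x20 becomes 0x5F); both 0xA0 bytes survive.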
Comment 3 IlyaHaykinson 2005-01-27 08:14:30 UTC
(In reply to comment #2)
> I think this line
> 
>   $sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);
> 
> would break in UTF8 wikis. PHP considers e.g. 0xA0 as whitespace, but 0xA0 is
> also the second byte
> of the cyrillic character P. Do you really need "any whitespace" here, or just
> "blank"?

Good point. Both preg_replace lines are now instead:

        $title = Title::makeTitle('',$sArg);
        $sCatName = wfStrencode($title->getDbKey(), DB_READ);

updated version at http://www.ilya.us/wiki/index.php?title=DynamicPageList_extension
Comment 4 JeLuF 2005-01-27 10:51:59 UTC
Some more remarks:

  if ($sType == 'category')
   ...
  else if ('count' == $sType)

Could you use MagicWord's for these? Some languages prefer to use localized
versions of those tags, e.g. when categories aren't named "Category"  (Perhaps
we should extend the extension framework to allow localized versions of tags, too??)

  return "!!too few categories!!";

Use wfMsg for any output strings, so that the messages can be translated.

  if ($iCatCount > 3)

I don't like that hardcoded limit. I'm not sure we need a limit, will benchmark
this tonight. If we need it, it should be an option.

PS: I changed the severity to "enhancement", since this is a new feature.
Comment 5 IlyaHaykinson 2005-02-01 20:26:08 UTC
(In reply to comment #4)
> Some more remarks:
> 
>   if ($sType == 'category')
>    ...
>   else if ('count' == $sType)
> 
> Could you use MagicWord's for these? Some languages prefer to use localized
> versions of those tags, e.g. when categories aren't named "Category"  (Perhaps
> we should extend the extension framework to allow localized versions of tags,
> too??)

I disagree for three reasons. 

1. The current magic word architecture isn't meant for extensions as much as
pages and the Parser.
2. "category" and "count" are internal commands for the extension only. The
namespace for categories can still be localized in the wiki (i.e. category=blah
will map to the category named blah no matter what the namespace is called).
3. Other extensions (i.e. easytimeline, which is deployed on Wikipedia) also do
not localize their internal commands.

>   return "!!too few categories!!";
> 
> Use wfMsg for any output strings, so that the messages can be translated.
> 

Done.

>   if ($iCatCount > 3)
> 
> I don't like that hardcoded limit. I'm not sure we need a limit, will benchmark
> this tonight. If we need it, it should be an option.

Done. Now uses parameters near the top of the function. I still think it's
better to have a limit, otherwise we run the risk of DOS via queries with
immense amounts of joins.

I've updated the code on the web site. I urge those with proper powers to please
deploy this asap if there are no more objections.
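
As a rough illustration of the configurable limits discussed above, the following standalone sketch caps the number of categories (and therefore joins) and clamps the requested result count. The `$dplLimits` array and both function names are hypothetical; the posted code keeps the same values in local variables such as `$iMaxCategories` and `$iMaxResultCount`.

```php
<?php
// Hypothetical settings array; the values mirror the defaults in the thread.
$dplLimits = [
    'maxCategories' => 3,
    'minResults'    => 1,
    'maxResults'    => 50,
];

// Refuse requests that would need more joins than the configured cap.
function dplTooManyCategories(array $categories, array $limits): bool {
    return count($categories) > $limits['maxCategories'];
}

// Force the requested count into the configured range.
function dplClampCount(int $requested, array $limits): int {
    return max($limits['minResults'], min($limits['maxResults'], $requested));
}

var_dump(dplTooManyCategories(['A', 'B', 'C', 'D'], $dplLimits)); // bool(true)
echo dplClampCount(500, $dplLimits), "\n"; // 50
echo dplClampCount(0, $dplLimits), "\n";   // 1
```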
Comment 6 IlyaHaykinson 2005-02-01 20:27:58 UTC
Comment on attachment 230 [details]
The extension, updated 1-jan-05.

<?php
/*									        

 Purpose:	outputs a bulleted list of most recent			        
		items residing in a category, or a union		        
		of several categories.					        

 Contributors:	n:User:Amgine, n:User:IlyaHaykinson			        

 To install:	add following to LocalSettings.php			        
   include("extensions/dynamicpagelist.php");				        

*/

$wgExtensionFunctions[] = "wfDynamicPageList";


function wfDynamicPageList() {
    global $wgParser, $wgMessageCache;

    $wgMessageCache->addMessages( array(
        'dynamicpagelist_toomanycats' => 'DynamicPageList: Too many categories!',
        'dynamicpagelist_toofewcats' => 'DynamicPageList: Too few categories!'
    ) );

    $wgParser->setHook( "DynamicPageList", "DynamicPageList" );
}

// The callback function for converting the input text to HTML output	        
function DynamicPageList( $input ) {
    global $wgScriptPath, $wgServer, $wgUser;

    // parameters							        

    //minimum and maximum number of category unions			        
    $iMinCategories = 1;
    $iMaxCategories = 3;

    //minimum and maximum number of results allowed.			        
    $iMinResultCount = 1;
    $iMaxResultCount = 50;

    //whether unlimited results are allowed (when count is omitted)
    $bAllowUnlimitedResults = true;

    // end params							        


    $aParams = array();
    $bCountSet = false;

    $sTok = strtok($input, "\n");
    while ($sTok)
    {
      $aParams[] = $sTok;
      $sTok = strtok("\n");
    }

    foreach($aParams as $sParam)
    {
      $aParam = explode("=", $sParam);
      $sType = $aParam[0];
      $sArg = $aParam[1];
      if ($sType == 'category')
      {
	$title = Title::makeTitle('',$sArg);
	$sCatName = wfStrencode($title->getDbKey(), DB_READ);
	$aCategories[] = $sCatName;
      }
      else if ('count' == $sType)
      {
	//ensure that $iCount is a number;				        
	$iCount = (1 * $sArg);
	$bCountSet = true;
      }
    }

    $iCatCount = count($aCategories);
    if ($iCatCount < $iMinCategories)
      return wfMsg( 'dynamicpagelist_toofewcats' ); // "!!too few categories!!";
    if ($iCatCount > $iMaxCategories)
      return wfMsg( 'dynamicpagelist_toomanycats' ); // "!!too many categories!!";

    if (true == $bCountSet)
    {
      if ($iCount < $iMinResultCount)
	$iCount = $iMinResultCount;
      if ($iCount > $iMaxResultCount)
	$iCount = $iMaxResultCount;
    }
    else
    {
      if (false == $bAllowUnlimitedResults)
      {
	$iCount = $iMaxResultCount;
	$bCountSet = true;
      }
    }


    //build the SQL query						        

    $sSql = 'SELECT cur_namespace, cur_title FROM cur';
    for ($i = 0; $i < $iCatCount; $i++) {
      $sSql .= ', categorylinks AS c' . ($i+1);
    }

    $sSql .= ' WHERE 1=1 ';

    for ($i = 0; $i < $iCatCount; $i++) {
      if ($i > 0)
	$sSql .= ' AND c1.cl_from = c'.($i+1).'.cl_from';
      $sSql .= ' AND c'.($i+1).'.cl_to = \''.$aCategories[$i].'\'';
    }

    $sSql .= ' AND cur_id = c1.cl_from ORDER BY cur_timestamp DESC';

    if (true == $bCountSet)
    {
      $sSql .= ' LIMIT 0,' . $iCount;
    }

    //DEBUG: output SQL query						        
    //$output .= $sSql . "<br />";					        

    // process the query						        
    $res = wfQuery($sSql, DB_READ);

    $sk =& $wgUser->getSkin();

    //start unordered list						        
    $output .= "<ul>\n";

    //process results of query, outputting equivalent of <li>[[Article]]</li> for each result
    while ($row = wfFetchObject( $res ) ) {
	$title = Title::makeTitle( $row->cur_namespace, $row->cur_title);
	$output .= '<li>' . $sk->makeLinkObj($title) . '</li>' . "\n";
    }

    //end unordered list						        
    $output .= "</ul>\n";

    return $output;
}
?>
Comment 7 Brion Vibber 2005-02-02 07:10:25 UTC
Please don't paste large amounts of code into comments. It's hard to read,
clutters up the page, and doesn't get formatted correctly. Use the 'Create
attachment' link.
Comment 8 Amgine 2005-02-04 17:43:54 UTC
Created attachment 259 [details]
Cleaned up, simplified, renamed to avoid conflict

This is a rewrite to simplify and speed up the previous version. The extension
is renamed as a kluge so I could compare it with the previous on my test
installation.

This version includes:

* configurable maximum category searches (may be configured for unlimited,
default 3)
* configurable maximum return articles (may be configured to allow unlimited,
default 5)
Comment 9 IlyaHaykinson 2005-02-06 01:04:00 UTC
Created attachment 267 [details]
Updated with Amgine's optimizations merged in

Incorporated Amgine's optimizations. Kept greater configurability (min/max
categories, min/max results, and unlimited categories and results are all
parameters).
Comment 10 Brion Vibber 2005-02-09 01:30:31 UTC
Created attachment 275 [details]
Updated with some fixes

I've made a few fixes:

* If a line doesn't contain a "=", avoid triggering a notice error for unset
variables in $aParam[1]
* Title::makeTitle is not safe for user-provided data as it does no sanitizing;
use Title::newFromText. (This also will allow people to write 'Category:Foo'
and get the expected thing.)
* If invalid, $title will be null and the getDbKey() call can fail, so skip the
line.
* Avoid encoding the dbkey so early, it makes the code harder to follow.
* Use IntVal instead of 1 * $sArg. (Note that multiplication will allow float
values; this may be harmless here but is not really what we meant.)
* Fixed some tabs -- when using spaces for indentation, try to avoid mixing
tabs as the tab size may not be the same for all editors and it uglifies the
code.
* Extensions return raw HTML; avoid returning a raw message as this allows
careless or rogue sysops or hijacked sysop accounts to break the wiki (invalid
HTML) or create security risks (JavaScript exploits etc). Use
htmlspecialchars() to force plaintext, or run the message through the wiki
parser. (Used htmlspecialchars for now.)
* Use booleans as booleans rather than false == and true ==, for readability
and consistency.
* Switched database calls to the new OO functions.
* In MediaWiki 1.4 tables may have a configurable prefix; get the canonical
name with $dbr->tableName().
* For readability, move the 'cur_id=cl.cl_from' clause to the top and eliminate
the 1=1 clause.
* LIMIT N,N is a MySQL-ism. Since no offset is needed, just use LIMIT N for
portability (use $dbr->limitResults() if needed).
* If there are no results, the <ul></ul> produced doesn't validate (a list must
contain at least one item). Return an error message instead.
* Initialize $output before using it, or a notice is thrown when error
reporting is put up high.
* Use $sk->makeKnownLinkObj() instead of $sk->makeLinkObj(). We know the pages
exist, so we can avoid hitting the database again to check.

You might also consider making the configuration options settable from
LocalSettings.php, so the extension code doesn't have to be altered.
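
Several of the points above (lines without "=", integer conversion, escaped messages, no empty list, initializing the output) can be sketched outside MediaWiki as follows. This is not the attached patch: the function names are hypothetical, and titles are treated as plain strings here instead of being turned into links through the skin.

```php
<?php
// Hypothetical standalone parser for the tag body: one "key=value" per line.
function dplParseInput(string $input): array {
    $options = ['categories' => [], 'count' => null];
    foreach (explode("\n", $input) as $line) {
        // Split into at most 2 parts so a value may itself contain "=".
        $parts = explode('=', $line, 2);
        if (count($parts) < 2) {
            continue; // no "=": skip instead of reading an unset index
        }
        $key = trim($parts[0]);
        $value = trim($parts[1]);
        if ($key === 'category' && $value !== '') {
            $options['categories'][] = $value;
        } elseif ($key === 'count') {
            $options['count'] = intval($value); // integers only, no floats
        }
    }
    return $options;
}

// Build the list HTML; never emit an empty <ul>, and escape all text.
function dplRenderList(array $titles, string $emptyMessage): string {
    if (count($titles) === 0) {
        return htmlspecialchars($emptyMessage);
    }
    $output = "<ul>\n"; // initialized before any ".=" is applied
    foreach ($titles as $title) {
        $output .= '<li>' . htmlspecialchars($title) . "</li>\n";
    }
    return $output . "</ul>\n";
}

$options = dplParseInput("category=News\nbogus line\ncount=2.9");
echo json_encode($options), "\n"; // {"categories":["News"],"count":2}
echo dplRenderList([], '<b>No results</b>'), "\n";
```

The second call prints the message with its angle brackets escaped, so a message edited to contain markup cannot inject HTML into the page.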
Comment 11 Amgine 2005-02-09 05:11:26 UTC
Created attachment 277 [details]
Moved variables to LocalSettings.php

Moves configuration variables to LocalSettings.php
Comment 12 Amgine 2005-02-09 05:20:09 UTC
Created attachment 278 [details]
Correct version - with the changes implemented and not just doodling...

Correctly implemented the parameter variables.
Comment 13 Jamesday 2005-02-09 21:51:33 UTC
Not good enough on the database side. Try EXPLAIN SELECT for a set of three
categories. You'll find that the explain output has the magic words "Using filesort".
That translates as "every record in the category will be retrieved, I'll sort
them, then I'll return the number matching the limit". That is, it scales very
badly. 

To avoid this there are several approaches you can use. First, most important,
is to arrange to have the key matching the order by. If you get records from,
say, recent changes, you can eliminate records not matching the category pretty
quickly and recent changes is usually very well cached, so it's not too painful
to scan back in timestamp order to find matching entries. Getting distinct hits
might be an issue - need to see what makes the limit effective.

Next, try a union with each select in the union subject to the limit and a final
overall limit. Because of the way MySQL before version 5 handles indexes this
can be substantially faster when a multipart index is used and you're using
different values of the leading key parts. 
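
One possible reading of the union suggestion, sketched as a standalone query builder. The function name is hypothetical, the table and column names are the 1.4-schema names used earlier in this thread, and the "?" placeholders assume a database layer that supports bound parameters. Note that a UNION returns pages in any of the listed categories, not only pages in all of them, and whether it actually avoids the filesort depends on the available indexes; nothing here has been benchmarked.

```php
<?php
// Hypothetical builder: returns the SQL text plus the values to bind.
// Each branch is limited on its own, then the combined result is limited again.
function dplBuildUnionQuery(array $categories, int $limit): array {
    $selects = [];
    $params = [];
    foreach ($categories as $category) {
        $selects[] = '(SELECT cur_namespace, cur_title, cur_timestamp'
            . ' FROM cur, categorylinks'
            . ' WHERE cur_id = cl_from AND cl_to = ?'
            . ' ORDER BY cur_timestamp DESC LIMIT ' . $limit . ')';
        $params[] = $category; // bound later, never concatenated into the SQL
    }
    // UNION (without ALL) also removes duplicate rows across branches.
    $sql = implode(' UNION ', $selects)
        . ' ORDER BY cur_timestamp DESC LIMIT ' . $limit;
    return [$sql, $params];
}

list($sql, $params) = dplBuildUnionQuery(['Politics', 'Europe'], 5);
echo $sql, "\n";
print_r($params);
```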
Comment 14 Ævar Arnfjörð Bjarmason 2005-06-10 18:24:40 UTC
Comments (note that I've only read the code, not executed it):

* You're using extension input as parameters; don't. Write it for 1.5, where we
have them in the arguments (like <DynamicPageList category="foo"
count="5"></DynamicPageList>)
* It'll only work with the 1.4 schema
* if there's no row returned at line 158 the output will be <ul>\n</ul> which is
invalid XHTML
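
A small sketch of the attribute-based approach from the first point, assuming the tag hook callback receives the attributes as an associative array (called `$argv` here). The function name is hypothetical, and because an attribute name can appear only once per tag, this shape handles a single category unless some separator convention is added.

```php
<?php
// Hypothetical: read options from tag attributes rather than the tag body.
function dplOptionsFromAttributes(array $argv): array {
    $category = isset($argv['category']) ? trim($argv['category']) : '';
    $count = isset($argv['count']) ? intval($argv['count']) : null;
    return ['category' => $category, 'count' => $count];
}

// Attributes as they would arrive for
// <DynamicPageList category="foo" count="5"></DynamicPageList>
print_r(dplOptionsFromAttributes(['category' => 'foo', 'count' => '5']));
```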
Comment 15 Amgine 2005-12-02 21:35:52 UTC
Created attachment 1128 [details]
version 2.0

* Add unset cache
* break out parameter parsing, query build, output build
* Add category OR
* Add namespace OR
* expand error handling

Most new content adapted from DPL2 by w:de:Benutzer:Unendlich (Fabian)
Comment 16 Amgine 2005-12-02 21:42:40 UTC
Comment on attachment 1128 [details]
version 2.0

<grumble>
Comment 17 Amgine 2005-12-02 22:14:02 UTC
Created attachment 1129 [details]
Diff version 1.9, 2.0

Diff as per IRC suggestion
Comment 18 Brion Vibber 2005-12-03 02:01:13 UTC
Latest attachment is very awkwardly written; use an object instead of passing 
this associative array.
Comment 19 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-09-09 23:21:04 UTC
Wasn't this long ago enabled on Wikinews?
Comment 20 Chad H. 2008-07-30 03:25:42 UTC
Long since incorporated into the features of DPL.
