Last modified: 2008-07-30 03:25:42 UTC
An extension that lists the n most recent articles which are in category x [AND category y [AND category z]]. The extension is vital to automating the main page of Wikinews, so that contributors can concentrate on producing articles rather than maintaining a very rapidly aging news list. It allows the most current versions of articles to appear while they are still news. Additional applications exist for Wikipedias.

This version has been tested on small-scale installations of MediaWiki 1.4b3 and 1.4b5.

Source:
<pre>
<?php
/*
 Contributors: n:User:Amgine, n:User:IlyaHaykinson

 To install: add following to LocalSettings.php
   include("extensions/dynamicpagelist.php");
*/

$wgExtensionFunctions[] = "wfDynamicPageList";

function wfDynamicPageList() {
    global $wgParser;
    $wgParser->setHook( "DynamicPageList", "DynamicPageList" );
}

# The callback function for converting the input text to HTML output
function DynamicPageList( $input ) {
    global $wgScriptPath, $wgServer, $wgUser;

    $aParams = array();
    $sTok = strtok($input, "\n");
    while ($sTok) {
        $aParams[] = $sTok;
        $sTok = strtok("\n");
    }

    foreach($aParams as $sParam) {
        $aParam = explode("=", $sParam);
        $sType = $aParam[0];
        $sArg = $aParam[1];
        if ($sType == 'category') {
            $sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);
            $sCatName = str_replace('\'','\\\'',$sCatName);
            $aCategories[] = $sCatName;
        } else if ('count' == $sType) {
            $iCount = (1 * $sArg);
        }
    }

    $iCatCount = count($aCategories);
    if ($iCatCount < 1)
        return "!!too few categories!!";
    if ($iCatCount > 3)
        return "!!too many categories!!";
    if ($iCount < 1)
        $iCount = 1;
    if ($iCount > 50)
        $iCount = 50;

    $sSql = 'SELECT cur_namespace, cur_title FROM cur';
    for ($i = 0; $i < $iCatCount; $i++) {
        $sSql .= ', categorylinks AS c' . ($i+1);
    }
    $sSql .= ' WHERE 1=1 ';
    for ($i = 0; $i < $iCatCount; $i++) {
        if ($i > 0)
            $sSql .= ' AND c1.cl_from = c'.($i+1).'.cl_from';
        $sSql .= ' AND c'.($i+1).'.cl_to = \''.$aCategories[$i].'\'';
    }
    $sSql .= ' AND cur_id = c1.cl_from ORDER BY cur_timestamp DESC LIMIT 0,' . $iCount;
    //$output .= $sSql . "<br />";

    # process the query
    $res = wfQuery($sSql, DB_READ);
    $sk =& $wgUser->getSkin();
    $output .= "<ul>\n";

    # process results of query
    while ($row = wfFetchObject( $res ) ) {
        $title = Title::makeTitle( $row->cur_namespace, $row->cur_title);
        $output .= '<li>' . $sk->makeLinkObj($title) . '</li>' . "\n";
    }
    $output .= "</ul>\n";
    return $output;
}
?>
</pre>
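For readers following the query logic: the source above joins categorylinks once per category and requires every alias to point at the same page. The following is only a rough standalone sketch of that shape, not the extension's code. The function name buildIntersectionQuery is hypothetical, and it assumes category names are handed to the database as bound parameters rather than quoted inline.

```php
<?php
// Hypothetical helper (not part of the extension): builds the category
// intersection query with one categorylinks alias per category.
// Table and column names follow the 1.4 schema used in the source above.
function buildIntersectionQuery(array $categories, int $count): array {
    if (count($categories) < 1) {
        // the WHERE clause refers to alias c1, so at least one category is needed
        throw new InvalidArgumentException('at least one category is required');
    }
    $from   = 'cur';
    $where  = array('cur_id = c1.cl_from');
    $params = array();
    foreach (array_values($categories) as $i => $category) {
        $alias = 'c' . ($i + 1);
        $from .= ', categorylinks AS ' . $alias;
        if ($i > 0) {
            // every further alias must refer to the same page as c1
            $where[] = 'c1.cl_from = ' . $alias . '.cl_from';
        }
        $where[]  = $alias . '.cl_to = ?';
        $params[] = $category;
    }
    $sql = 'SELECT cur_namespace, cur_title FROM ' . $from
         . ' WHERE ' . implode(' AND ', $where)
         . ' ORDER BY cur_timestamp DESC LIMIT ' . $count;
    return array($sql, $params);
}

list($sql, $params) = buildIntersectionQuery(array('Politics', 'Published'), 5);
echo $sql, "\n";
```

Each extra category adds one more join, which is why the source caps the number of categories.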
Created attachment 230 [details] The extension, updated 1-jan-05.

The extension has been tested on MediaWiki 1.4b3 and 1.4b5.
I think this line

  $sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);

would break in UTF-8 wikis. PHP considers e.g. 0xA0 as whitespace, but 0xA0 is also the second byte of the Cyrillic character Р (U+0420, encoded as 0xD0 0xA0). Do you really need "any whitespace" here, or just "blank"?
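If only blanks need replacing, one possible shape is a byte-wise replacement of ASCII characters. The sketch below is illustrative only: normalizeCategoryName is a hypothetical name, and it assumes that space and backslash are the only characters that must become underscores. Bytes below 0x80 never occur inside a multi-byte UTF-8 sequence, so this cannot split a character such as Р.

```php
<?php
// Hypothetical helper: converts only ASCII space and backslash to "_".
// No regular expression (and therefore no \s class) is involved, so a
// 0xA0 byte inside a UTF-8 sequence is left alone.
function normalizeCategoryName(string $name): string {
    return strtr($name, array(' ' => '_', '\\' => '_'));
}

// "Россия news" written as raw UTF-8 bytes; the first character is 0xD0 0xA0.
$name = "\xD0\xA0\xD0\xBE\xD1\x81\xD1\x81\xD0\xB8\xD1\x8F news";
echo normalizeCategoryName($name), "\n";
```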
(In reply to comment #2)
> I think this line
>
> $sCatName = preg_replace('/[\\\\_\\s]/S','_',$sArg);
>
> would break in UTF8 wikis. PHP considers e.g. 0xA0 as whitespace, but 0xA0 is
> also the second byte
> of the cyrillic character P. Do you really need "any whitespace" here, or just
> "blank"?

Good point. Both preg_replace lines are now instead:

  $title = Title::makeTitle('',$sArg);
  $sCatName = wfStrencode($title->getDbKey(), DB_READ);

updated version at http://www.ilya.us/wiki/index.php?title=DynamicPageList_extension
Some more remarks:

  if ($sType == 'category')
  ...
  else if ('count' == $sType)

Could you use MagicWord's for these? Some languages prefer to use localized versions of those tags, e.g. when categories aren't named "Category" (Perhaps we should extend the extension framework to allow localized versions of tags, too??)

  return "!!too few categories!!";

Use wfMsg for any output strings, so that the messages can be translated.

  if ($iCatCount > 3)

I don't like that hardcoded limit. I'm not sure we need a limit, will benchmark this tonight. If we need it, it should be an option.

PS: I changed the severity to "enhancement", since this is a new feature.
(In reply to comment #4)
> Some more remarks:
>
> if ($sType == 'category')
> ...
> else if ('count' == $sType)
>
> Could you use MagicWord's for these? Some languages prefer to use localized
> versions of those tags, e.g. when categories aren't named "Category" (Perhaps
> we should extend the extension framework to allow localized versions of tags, too??)

I disagree for three reasons.
1. The current magic word architecture isn't meant for extensions as much as for pages and the Parser.
2. "category" and "count" are internal commands for the extension only. The namespace for categories can still be localized in the wiki (i.e. category=blah will map to the category named blah no matter what the namespace is called).
3. Other extensions (e.g. easytimeline, which is deployed on Wikipedia) also do not localize their internal commands.

> return "!!too few categories!!";
>
> Use wfMsg for any output strings, so that the messages can be translated.
>

Done.

> if ($iCatCount > 3)
>
> I don't like that hardcoded limit. I'm not sure we need a limit, will benchmark
> this tonight. If we need it, it should be an option.

Done. It now uses parameters near the top of the function. I still think it's better to have a limit; otherwise we run the risk of DoS via queries with an immense number of joins.

I've updated the code on the web site. I urge those with the proper powers to please deploy this asap if there are no more objections.
Comment on attachment 230 [details] The extension, updated 1-jan-05.

<?php
/*
 Purpose:      outputs a bulleted list of most recent
               items residing in a category, or a union
               of several categories.

 Contributors: n:User:Amgine, n:User:IlyaHaykinson

 To install:   add following to LocalSettings.php
   include("extensions/dynamicpagelist.php");
*/

$wgExtensionFunctions[] = "wfDynamicPageList";

function wfDynamicPageList() {
    global $wgParser, $wgMessageCache;

    $wgMessageCache->addMessages( array(
        'dynamicpagelist_toomanycats' => 'DynamicPageList: Too many categories!',
        'dynamicpagelist_toofewcats' => 'DynamicPageList: Too few categories!'
    ) );

    $wgParser->setHook( "DynamicPageList", "DynamicPageList" );
}

// The callback function for converting the input text to HTML output
function DynamicPageList( $input ) {
    global $wgScriptPath, $wgServer, $wgUser;

    // parameters
    //minimum and maximum number of category unions
    $iMinCategories = 1;
    $iMaxCategories = 3;
    //minimum and maximum number of results allowed.
    $iMinResultCount = 1;
    $iMaxResultCount = 50;
    //whether unlimited results are allowed (when count is omitted)
    $bAllowUnlimitedResults = true;
    // end params

    $aParams = array();
    $bCountSet = false;

    $sTok = strtok($input, "\n");
    while ($sTok) {
        $aParams[] = $sTok;
        $sTok = strtok("\n");
    }

    foreach($aParams as $sParam) {
        $aParam = explode("=", $sParam);
        $sType = $aParam[0];
        $sArg = $aParam[1];
        if ($sType == 'category') {
            $title = Title::makeTitle('',$sArg);
            $sCatName = wfStrencode($title->getDbKey(), DB_READ);
            $aCategories[] = $sCatName;
        } else if ('count' == $sType) {
            //ensure that $iCount is a number;
            $iCount = (1 * $sArg);
            $bCountSet = true;
        }
    }

    $iCatCount = count($aCategories);
    if ($iCatCount < $iMinCategories)
        return wfMsg( 'dynamicpagelist_toofewcats' ); // "!!too few categories!!";
    if ($iCatCount > $iMaxCategories)
        return wfMsg( 'dynamicpagelist_toomanycats' ); // "!!too many categories!!";

    if (true == $bCountSet) {
        if ($iCount < $iMinResultCount)
            $iCount = $iMinResultCount;
        if ($iCount > $iMaxResultCount)
            $iCount = $iMaxResultCount;
    } else {
        if (false == $bAllowUnlimitedResults) {
            $iCount = $iMaxResultCount;
            $bCountSet = true;
        }
    }

    //build the SQL query
    $sSql = 'SELECT cur_namespace, cur_title FROM cur';
    for ($i = 0; $i < $iCatCount; $i++) {
        $sSql .= ', categorylinks AS c' . ($i+1);
    }
    $sSql .= ' WHERE 1=1 ';
    for ($i = 0; $i < $iCatCount; $i++) {
        if ($i > 0)
            $sSql .= ' AND c1.cl_from = c'.($i+1).'.cl_from';
        $sSql .= ' AND c'.($i+1).'.cl_to = \''.$aCategories[$i].'\'';
    }
    $sSql .= ' AND cur_id = c1.cl_from ORDER BY cur_timestamp DESC';
    if (true == $bCountSet) {
        $sSql .= ' LIMIT 0,' . $iCount;
    }

    //DEBUG: output SQL query
    //$output .= $sSql . "<br />";

    // process the query
    $res = wfQuery($sSql, DB_READ);
    $sk =& $wgUser->getSkin();

    //start unordered list
    $output .= "<ul>\n";

    //process results of query, outputting equivalent of <li>[[Article]]</li> for each result
    while ($row = wfFetchObject( $res ) ) {
        $title = Title::makeTitle( $row->cur_namespace, $row->cur_title);
        $output .= '<li>' . $sk->makeLinkObj($title) . '</li>' . "\n";
    }

    //end unordered list
    $output .= "</ul>\n";

    return $output;
}
?>
Please don't paste large amounts of code into comments. It's hard to read, clutters up the page, and doesn't get formatted correctly. Use the 'Create attachment' link.
Created attachment 259 [details] Cleaned up, simplified, renamed to avoid conflict

This is a rewrite to simplify and speed up the previous version. The extension is renamed as a kluge so I could compare it with the previous on my test installation.

This version includes:
* configurable maximum category searches (may be configured for unlimited, default 3)
* configurable maximum return articles (may be configured to allow unlimited, default 5)
Created attachment 267 [details] Updated with Amgine's optimizations merged in Incorporated Amgine's optimizations. Kept greater configurability (min/max categories, min/max results, and unlimited categories and results are all parameters).
Created attachment 275 [details] Updated with some fixes

I've made a few fixes:
* If a line doesn't contain a "=", avoid triggering a notice error for unset variables in $aParam[1]
* Title::makeTitle is not safe for user-provided data as it does no sanitizing; use Title::newFromText. (This also will allow people to write 'Category:Foo' and get the expected thing.)
* If invalid, $title will be null and the getDbKey() call can fail, so skip the line.
* Avoid encoding the dbkey so early, it makes the code harder to follow.
* Use IntVal instead of 1 * $sArg. (Note that multiplication will allow float values; this may be harmless here but is not really what we meant.)
* Fixed some tabs -- when using spaces for indentation, try to avoid mixing tabs as the tab size may not be the same for all editors and it uglifies the code.
* Extensions return raw HTML; avoid returning a raw message as this allows careless or rogue sysops or hijacked sysop accounts to break the wiki (invalid HTML) or create security risks (JavaScript exploits etc). Use htmlspecialchars() to force plaintext, or run the message through the wiki parser. (Used htmlspecialchars for now.)
* Use booleans as booleans rather than false == and true ==, for readability and consistency.
* Switched database calls to the new OO functions.
* In MediaWiki 1.4 tables may have a configurable prefix; get the canonical name with $dbr->tableName().
* For readability, move the 'cur_id=cl.cl_from' clause to the top and eliminate the 1=1 clause.
* LIMIT N,N is a MySQL-ism. Since no offset is needed, just use LIMIT N for portability (use $dbr->limitResults() if needed).
* If there are no results, the <ul></ul> produced doesn't validate (a list must contain at least one item). Return an error message instead.
* Initialize $output before using it, or a notice is thrown when error reporting is put up high.
* Use $sk->makeKnownLinkObj() instead of $sk->makeLinkObj(). We know the pages exist, so we can avoid hitting the database again to check.

You might also consider making the configuration options settable from LocalSettings.php, so the extension code doesn't have to be altered.
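The input-handling items in the list above (lines without "=", intval instead of multiplication) can be illustrated without MediaWiki. This is only a sketch: parseDplInput and its return layout are hypothetical. The attachment itself resolves category names through Title::newFromText, which this standalone example cannot do.

```php
<?php
// Hypothetical standalone parser illustrating the input-handling fixes.
// It only splits and validates lines; it does not sanitize titles.
function parseDplInput(string $input): array {
    $categories = array();
    $count = null; // null means "no count line was given"
    foreach (explode("\n", $input) as $line) {
        $parts = explode('=', $line, 2);
        if (count($parts) < 2) {
            continue; // no "=" on this line: skip it instead of reading $parts[1]
        }
        $type = trim($parts[0]);
        $arg  = trim($parts[1]);
        if ($type === 'category' && $arg !== '') {
            $categories[] = $arg;
        } elseif ($type === 'count') {
            // intval() truncates "2.9" to 2, whereas 1 * "2.9" would yield the float 2.9
            $count = intval($arg);
        }
    }
    return array('categories' => $categories, 'count' => $count);
}

$parsed = parseDplInput("category=Politics\nthis line has no equals sign\ncount=2.9\ncategory=Published");
print_r($parsed);
```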
Created attachment 277 [details] Moved variables to LocalSettings.php Moves configuration variables to LocalSettings.php
Created attachment 278 [details] Correct version - with the changes implemented and not just doodling... Correctly implemented the parameter variables.
Not good enough on the database side. Try explain select for a set of three categories. You'll find that the explain has the magic words "using filesort". That translates as "every record in the category will be retrieved, I'll sort them, then I'll return the number matching the limit". That is, it scales very badly.

To avoid this there are several approaches you can use. First, and most important, is to arrange to have the key match the order by. If you get records from, say, recent changes, you can eliminate records not matching the category pretty quickly, and recent changes is usually very well cached, so it's not too painful to scan back in timestamp order to find matching entries. Getting distinct hits might be an issue - need to see what makes the limit effective.

Next, try a union with each select in the union subject to the limit and a final overall limit. Because of the way MySQL before version 5 handles indexes, this can be substantially faster when a multipart index is used and you're using different values of the leading key parts.
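As a rough illustration of the second suggestion, the sketch below shows one way a union with per-branch limits and a final overall limit could be assembled. It rests on several assumptions: buildUnionQuery is a hypothetical name, the query matches pages in any of the given categories (the extension at this point intersects them), and nothing here has been benchmarked or checked with EXPLAIN.

```php
<?php
// Hypothetical sketch of the "UNION with per-branch LIMIT" shape.
// Each branch fixes one cl_to value and is limited on its own; the outer
// ORDER BY / LIMIT then picks the newest rows across all branches.
// Table and column names follow the 1.4 schema used in this thread.
function buildUnionQuery(array $categories, int $count): array {
    if (count($categories) < 1) {
        throw new InvalidArgumentException('at least one category is required');
    }
    $branches = array();
    $params   = array();
    foreach ($categories as $category) {
        $branches[] = '(SELECT cur_namespace, cur_title, cur_timestamp'
            . ' FROM cur, categorylinks'
            . ' WHERE cur_id = cl_from AND cl_to = ?'
            . ' ORDER BY cur_timestamp DESC LIMIT ' . $count . ')';
        $params[] = $category;
    }
    // plain UNION (not UNION ALL) so a page in several categories appears once
    $sql = implode(' UNION ', $branches)
         . ' ORDER BY cur_timestamp DESC LIMIT ' . $count;
    return array($sql, $params);
}

list($sql, $params) = buildUnionQuery(array('Politics', 'Science'), 3);
echo $sql, "\n";
```

Whether any branch still needs a filesort depends on the available indexes, which is the first point the comment raises.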
Comments (note that I've only read the code, not executed it):
* You're using the extension input as parameters. Don't; write it for 1.5, where we have them in the arguments (like <DynamicPageList category="foo" count="5"></DynamicPageList>).
* It'll only work with the 1.4 schema.
* If there's no row returned at line 158, the output will be <ul>\n</ul>, which is invalid XHTML.
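The first and third points can be sketched in isolation. The helpers below are hypothetical and run without MediaWiki. $argv stands for the attribute array a tag hook would receive, assumed here to look like array('category' => 'foo', 'count' => '5'). The list items are assumed to be already-safe HTML such as skin-generated links.

```php
<?php
// Hypothetical helper: reads tag attributes instead of parsing the tag body.
function readDplAttributes(array $argv, int $maxCount = 50): array {
    $category = isset($argv['category']) ? trim($argv['category']) : '';
    $count    = isset($argv['count']) ? intval($argv['count']) : $maxCount;
    $count    = max(1, min($maxCount, $count)); // clamp to 1..$maxCount
    return array('category' => $category, 'count' => $count);
}

// Hypothetical helper: never emits an empty <ul></ul>.
function renderDplList(array $links, string $emptyMessage): string {
    if (count($links) === 0) {
        // a list needs at least one item, so return escaped plain text instead
        return htmlspecialchars($emptyMessage);
    }
    $output = "<ul>\n";
    foreach ($links as $link) {
        $output .= '<li>' . $link . "</li>\n";
    }
    return $output . "</ul>\n";
}

$args = readDplAttributes(array('category' => 'foo', 'count' => '5'));
echo renderDplList(array(), 'No pages found in ' . $args['category']), "\n";
```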
Created attachment 1128 [details] version 2.0

* Add unset cache
* break out parameter parsing, query build, output build
* Add category OR
* Add namespace OR
* expand error handling

Most new content adapted from DPL2 by w:de:Benutzer:Unendlich (Fabian)
Comment on attachment 1128 [details] version 2.0 <grumble>
Created attachment 1129 [details] Diff version 1.9, 2.0 Diff as per IRC suggestion
Latest attachment is very awkwardly written; use an object instead of passing this associative array.
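A minimal sketch of what "an object instead of an associative array" could look like follows. The class name DplOptions and its properties are purely illustrative and are not taken from the attachment under review.

```php
<?php
// Hypothetical options object: validation lives in one place instead of
// being repeated wherever the associative array is read.
class DplOptions {
    public $categories = array();
    public $namespaces = array();
    public $count = null; // null: no explicit count requested

    public function addCategory(string $name): void {
        // ignore empty names and duplicates
        if ($name !== '' && !in_array($name, $this->categories, true)) {
            $this->categories[] = $name;
        }
    }

    public function setCount(int $count, int $min = 1, int $max = 50): void {
        $this->count = max($min, min($max, $count)); // clamp to $min..$max
    }
}

$options = new DplOptions();
$options->addCategory('Politics');
$options->addCategory('Politics'); // duplicate, ignored
$options->setCount(500);           // clamped to the default maximum
print_r($options);
```

Parsing, query building and output functions would then accept a DplOptions instance, so a mistyped key becomes an error rather than a silently missing array entry.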
Wasn't this long ago enabled on Wikinews?
Long since incorporated into the features of DPL.