Last modified: 2014-09-24 01:22:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T4415, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 2415 - Add pagination to Special:Lonelypages
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: unspecified
Hardware: All
OS: All
Priority: Low
Severity: enhancement (8 votes)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: https://meta.wikimedia.org/w/index.ph...
Keywords: patch, patch-need-review
Depends on: 4699
Blocks:

Reported: 2005-06-14 21:37 UTC by Mark Pellegrini
Modified: 2014-09-24 01:22 UTC (History)
CC: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this) (5.66 KB, patch)
2007-01-21 16:39 UTC, Riley Lynch
Details
Querycache enhancement (serial range scans for paging), version 2 (6.73 KB, patch)
2007-03-17 09:37 UTC, Riley Lynch
Details
Rewrite of previous patch using branch REL1_17 (104384) (9.15 KB, patch)
2011-11-28 05:51 UTC, Riley Lynch
Details

Description Mark Pellegrini 2005-06-14 21:37:54 UTC
[[Special:Lonelypages]] (which is linked from the en main page to increase
the visibility of orphaned articles) cannot display more than the first 1,000
orphaned pages (in alphabetical order).
Comment 1 Antoine "hashar" Musso (WMF) 2005-07-13 17:37:44 UTC
The limit of 1000 pages is set to make it faster. Resolve
some of those lonely pages and you will get the next ones.
Comment 2 Zigger 2005-07-14 02:04:54 UTC
(In reply to comment #1)
> The limit of 1000 pages is set to make it faster. Resolve
> some of those lonely pages and you will get the next ones.

The combination of the limit, the caching, and the lack of any visual
indication of which orphans have been resolved makes this difficult.  See bug 2599.
Comment 3 SJ 2006-01-21 02:06:46 UTC
Static dumps of these special-page outputs once a [month? for starters] -- the entire output, even if it's 20k article titles -- would be helpful.  Likewise for ancientpages, &c.
Comment 4 Ævar Arnfjörð Bjarmason 2006-01-21 02:23:02 UTC
This bug depends on bug 4699 to be able to enable this without bogging down the
servers; adding dependency. Changing component since this is a Wikimedia
configuration issue.
Comment 5 Riley Lynch 2007-01-21 16:39:06 UTC
Created attachment 3106 [details]
Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this)

Instead of using LIMIT to restrict the number of rows fetched from the query
cache, SELECT an indexed range. This should make reads from the query cache
trivially cheap, so that the limits can be disabled in the $wgQueryPages
array. (Loading the cache without a LIMIT is only a little more expensive,
since the bulk of the expense of these queries is in assembling the results
rather than returning them.)
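
To illustrate the difference (an example, not part of the patch itself; the
row numbers are invented): an OFFSET-style read of the cache has to scan and
discard every skipped row,

  SELECT qc_namespace, qc_title, qc_value
    FROM querycache
   WHERE qc_type = 'Lonelypages'
   LIMIT 500 OFFSET 35000;

whereas a range scan over the new (qc_type, qc_serial) unique key seeks
straight to the first requested row:

  SELECT qc_namespace, qc_title, qc_value
    FROM querycache
   WHERE qc_type = 'Lonelypages'
     AND qc_serial BETWEEN 35001 AND 35500
   ORDER BY qc_serial;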

For the purpose of this patch, I only turned off the cache limit on
SpecialLonelypages, but I recommend that the rest of the cache limits be
likewise disabled.

Tested with a build of the SimpleEnglish data.
Comment 6 Riley Lynch 2007-01-21 16:40:03 UTC
Comment on attachment 3106 [details]
Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this)

Might also be relevant to bug 4699:
http://bugzilla.wikimedia.org/show_bug.cgi?id=4699
Comment 7 Riley Lynch 2007-01-22 03:02:55 UTC
Comment on attachment 3106 [details]
Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this)

Please note schema changes -- before deploying code:

alter table querycache add column qc_serial int(5) unsigned NOT NULL default '0';
alter table querycache add unique key (qc_type,qc_serial);
Comment 8 Riley Lynch 2007-01-22 23:32:34 UTC
Wish I could edit previous comments! Anyway, with deeper study of MySQL's LIMIT
optimizations, it seems that some queries may benefit from LIMITs more than
others, particularly ones where the results can be compiled during a range scan.
But queries like this one (and its evil twin, Deadendpages) seem to do most of
their work before the LIMIT is applied. Would it be possible to test this one
time -- disabling the LIMIT for Lonelypages and comparing run time and row count
to the LIMITed version? 
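
A sketch of such a test (the WHERE clause is abbreviated from the query in the
patch, and SQL_NO_CACHE is added here for benchmarking; exact clauses may
differ by version):

  -- LIMITed version, as currently configured
  SELECT SQL_NO_CACHE page_namespace, page_title
    FROM page
    LEFT JOIN pagelinks
      ON page_namespace = pl_namespace AND page_title = pl_title
   WHERE pl_namespace IS NULL
     AND page_is_redirect = 0
   LIMIT 1000;

  -- unLIMITed version: the same query with the LIMIT removed. Compare
  -- wall-clock time and row count to see how much of the work happens
  -- before the LIMIT is applied.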


Comment 9 Riley Lynch 2007-03-17 09:32:03 UTC
Comment on attachment 3106 [details]
Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this)

Index: includes/QueryPage.php
===================================================================
--- includes/QueryPage.php	(revision 19480)
+++ includes/QueryPage.php	(working copy)
@@ -19,7 +19,7 @@
 	array( 'DoubleRedirectsPage',		'DoubleRedirects' ),
 	array( 'ListUsersPage', 		'Listusers' ),
 	array( 'ListredirectsPage', 'Listredirects' ),
-	array( 'LonelyPagesPage',		'Lonelypages' ),
+	array( 'LonelyPagesPage',		'Lonelypages',		false ),
 	array( 'LongPagesPage', 		'Longpages' ),
 	array( 'MostcategoriesPage',		'Mostcategories' ),
 	array( 'MostimagesPage',		'Mostimages' ),
@@ -114,6 +114,15 @@
 		return "SELECT 'sample' as type, 0 as namespace, 'Sample result' as title, 42 as value";
 	}
 
+	// ASSUMPTION: arguments are SQL clean
+	function getCacheSQL($querycache,$type,$limit,$offset) {
+		$first = $offset + 1;
+		$last = $offset + $limit;
+		$sql = "SELECT qc_type as type, qc_serial as serial_id, qc_namespace as namespace, qc_title as title, qc_value as value FROM $querycache WHERE qc_type='$type' AND qc_serial BETWEEN $first AND $last ORDER BY qc_serial";
+		// Cache results are assumed to be sorted on insert
+		return $sql;
+	}
+
 	/**
 	 * Override to sort by increasing values
 	 */
@@ -221,8 +230,9 @@
 		if ( $res ) {
 			$num = $dbr->numRows( $res );
 			# Fetch results
-			$insertSql = "INSERT INTO $querycache (qc_type,qc_namespace,qc_title,qc_value) VALUES ";
+			$insertSql = "INSERT INTO $querycache (qc_type,qc_serial,qc_namespace,qc_title,qc_value) VALUES ";
 			$first = true;
+			$serial_id = 0;
 			while ( $res && $row = $dbr->fetchObject( $res ) ) {
 				if ( $first ) {
 					$first = false;
@@ -237,6 +247,7 @@
 
 				$insertSql .= '(' .
 					$dbw->addQuotes( $row->type ) . ',' .
+					$dbw->addQuotes( ++$serial_id ) . ',' .
 					$dbw->addQuotes( $row->namespace ) . ',' .
 					$dbw->addQuotes( $row->title ) . ',' .
 					$dbw->addQuotes( $value ) . ')';
@@ -288,13 +299,13 @@
 
 		if ( !$this->isCached() ) {
 			$sql = $this->getSQL();
+			$sql .= $this->getOrder();
+			$sql = $dbr->limitResult($sql, $limit, $offset);
 		} else {
 			# Get the cached result
 			$querycache = $dbr->tableName( 'querycache' );
 			$type = $dbr->strencode( $sname );
-			$sql =
-				"SELECT qc_type as type, qc_namespace as namespace,qc_title as title, qc_value as value
-				 FROM $querycache WHERE qc_type='$type'";
+			$sql = $this->getCacheSQL($querycache,$type,$limit,$offset);
 
 			if( !$this->listoutput ) {

@@ -324,8 +335,6 @@

		}

-		$sql .= $this->getOrder();
-		$sql = $dbr->limitResult($sql, $limit, $offset);
		$res = $dbr->query( $sql );
		$num = $dbr->numRows($res);

Index: includes/SpecialLonelypages.php
===================================================================
--- includes/SpecialLonelypages.php	(revision 19480)
+++ includes/SpecialLonelypages.php	(working copy)
@@ -35,8 +35,7 @@
		return
		  "SELECT 'Lonelypages'  AS type,
			  page_namespace AS namespace,
-			  page_title	 AS title,
-			  page_title	 AS value
+			  page_title	 AS title
		     FROM $page
		LEFT JOIN $pagelinks
		       ON page_namespace=pl_namespace AND page_title=pl_title
@@ -45,6 +44,11 @@
		      AND page_is_redirect=0";

	}
+
+	function getOrder() {
+		return ' ORDER BY title '; // no value column for this query
+	}
+
 }

 /**
Index: maintenance/archives/patch-querycache-serial.sql
===================================================================
--- maintenance/archives/patch-querycache-serial.sql	(revision 0)
+++ maintenance/archives/patch-querycache-serial.sql	(revision 0)
@@ -0,0 +1,8 @@
+-- Add a serial column to store an id for each row according to qc_type.
+-- Add an index (qc_type, qc_serial) to facilitate paged sequential reads.
+
+ALTER TABLE querycache
+	 ADD (qc_serial int(5) unsigned);
+
+ALTER TABLE querycache
+	 ADD UNIQUE KEY(qc_type, qc_serial);
Index: maintenance/mysql5/tables-binary.sql
===================================================================
--- maintenance/mysql5/tables-binary.sql	(revision 19480)
+++ maintenance/mysql5/tables-binary.sql	(working copy)
@@ -946,6 +946,9 @@
   -- A key name, generally the base name of of the special page.
   qc_type char(32) NOT NULL,

+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+  
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',

Index: maintenance/mysql5/tables.sql
===================================================================
--- maintenance/mysql5/tables.sql	(revision 19480)
+++ maintenance/mysql5/tables.sql	(working copy)
@@ -937,6 +937,9 @@
   -- A key name, generally the base name of of the special page.
   qc_type char(32) NOT NULL,

+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+  
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',

Index: maintenance/postgres/tables.sql
===================================================================
--- maintenance/postgres/tables.sql	(revision 19480)
+++ maintenance/postgres/tables.sql	(working copy)
@@ -362,10 +362,12 @@

 CREATE TABLE querycache (
   qc_type	 TEXT	   NOT NULL,
+  qc_serial	 SMALLINT,
   qc_value	 SMALLINT  NOT NULL,
   qc_namespace  SMALLINT  NOT NULL,
   qc_title	 TEXT	   NOT NULL
 );
+CREATE UNIQUE INDEX querycache_type_serial ON querycache (qc_type, qc_serial);
 CREATE INDEX querycache_type_value ON querycache (qc_type, qc_value);

 CREATE TABLE querycache_info (
Index: maintenance/tables.sql
===================================================================
--- maintenance/tables.sql	(revision 19480)
+++ maintenance/tables.sql	(working copy)
@@ -925,6 +925,9 @@
   -- A key name, generally the base name of of the special page.
   qc_type char(32) NOT NULL,

+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+  
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',

@@ -932,6 +935,7 @@
   qc_namespace int NOT NULL default '0',
   qc_title char(255) binary NOT NULL default '',

+  UNIQUE KEY (qc_type,qc_serial),
   KEY (qc_type,qc_value)

 ) TYPE=InnoDB;
Index: maintenance/updaters.inc
===================================================================
--- maintenance/updaters.inc	(revision 19480)
+++ maintenance/updaters.inc	(working copy)
@@ -40,6 +40,7 @@

 $wgNewFields = array(
 #	     table	      field		patch file (in maintenance/archives)
+	array( 'querycache',	'qc_serial',	    'patch-querycache-serial.sql' ),
 	array( 'ipblocks',	'ipb_id',	    'patch-ipblocks.sql' ),
 	array( 'ipblocks',	'ipb_expiry',	    'patch-ipb_expiry.sql' ),
 	array( 'recentchanges', 'rc_type',	    'patch-rc_type.sql' ),
Comment 10 Riley Lynch 2007-03-17 09:37:05 UTC
Created attachment 3353 [details]
Querycache enhancement (serial range scans for paging), version 2

Update to the previous patch to include updaters.inc/archives. qc_serial is now
nullable with no default, in order to support updating that table without having
to empty the legacy querycache.
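
A nullable qc_serial works under the new unique key because MySQL permits
multiple NULLs in a UNIQUE index, so legacy cache rows (which have no serial
yet) can coexist until their query type is next refreshed. For example (rows
invented for illustration):

  -- legacy rows keep qc_serial = NULL; (qc_type, NULL) may repeat freely
  INSERT INTO querycache (qc_type, qc_namespace, qc_title, qc_value)
  VALUES ('Lonelypages', 0, 'Legacy_row_1', 0),
         ('Lonelypages', 0, 'Legacy_row_2', 0);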
Comment 11 Sumana Harihareswara 2011-11-09 16:59:37 UTC
The bug is still reproducible. https://meta.wikimedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=500 

Riley, I'm sorry it's taken so long for you to get a response to your patch.  I am trying to get developers to respond soon.
Comment 12 Riley Lynch 2011-11-09 19:15:05 UTC
Thanks, Sumana. I'd be glad to look into the details if the points of contact for this patch have changed since (wow!) 2007.
Comment 13 Roan Kattouw 2011-11-20 18:39:54 UTC
Marking patch as obsolete since it doesn't apply anymore, at all. The QueryPage subsystem saw major architectural changes in December 2010. For one thing, there is now no raw SQL left, which means that most of the patch will need to be rewritten.
Comment 14 Sumana Harihareswara 2011-11-26 20:49:20 UTC
Removed "patch" and "need-review" keywords.

Riley, it does look like -- as you put it -- the points of contact for the patch have changed rather a lot since 2007.  You can look around on mediawiki.org with a search for "QueryPage", visit the MediaWiki development community in IRC -- https://www.mediawiki.org/wiki/MediaWiki_on_IRC  --  or in the mailing list -- https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- to learn more about the current state of this component and talk about approach.  Thank you for your interest; hope we can help!
Comment 15 Riley Lynch 2011-11-28 05:51:52 UTC
Created attachment 9568 [details]
Rewrite of previous patch using branch REL1_17 (104384)

Rewrote the patch to conform to changes since it was originally written.
Comment 16 Riley Lynch 2011-11-28 07:05:30 UTC
With a querycache of 250,000 rows, including 50,000 lonelypages rows, a query with a large offset without the patch:

  select SQL_NO_CACHE qc_type, qc_namespace, qc_title, qc_value from querycache
  where qc_type = 'lonelypages'
  order by qc_value ASC
  limit 500 offset 35000;

  --> about 0.191 seconds

The same offset with the patch:

  select SQL_NO_CACHE qc_type, qc_namespace, qc_title, qc_value from querycache
  where qc_type = 'lonelypages'
  and qc_seq between 35001 and 35500
  order by qc_value ASC;

  --> about 0.004 seconds
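
Those timings are consistent with what EXPLAIN should report for the two forms
(a sketch; the reported key name depends on how the index was created):

  EXPLAIN SELECT qc_type, qc_namespace, qc_title, qc_value
    FROM querycache
   WHERE qc_type = 'lonelypages' AND qc_seq BETWEEN 35001 AND 35500;

  -- expected: access type 'range' on the (qc_type, qc_seq) unique key,
  -- touching roughly 500 rows, versus ~35,500 rows read and mostly
  -- discarded by the LIMIT ... OFFSET form.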
Comment 17 Sumana Harihareswara 2011-11-28 15:49:54 UTC
Riley, you based your current patch off branch REL1_17 -- for better freshness & reviewability, you should base it off SVN trunk.
Comment 18 Sumana Harihareswara 2011-11-28 18:54:52 UTC
Riley noted to me:

"You linked Lonelypages on meta with a note that the bug is still reproducible:

https://meta.wikimedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=500 

But when I click through, I don't see the problem; which is to say, I'm able to click through to the following page of 500 results, and the next, with the last result being the 1948th. Am I failing to notice something?

On the other hand, I do see the problem in the Spanish wikipedia:

http://es.wikipedia.org/w/index.php?title=Especial:P%C3%A1ginasHu%C3%A9rfanas&limit=500&offset=4500

Here you can see that the results are limited to 5,000, which falls somewhat short of the letter "C"."

I thought the bug was reproducible on meta when I posted that comment on the 9th, but now I cannot reproduce it there.  However, I can, like Riley, reproduce it on Spanish Wikipedia.
Comment 19 Riley Lynch 2011-11-28 19:39:55 UTC
Ok, will rewrite patch for trunk. Thanks!
Comment 20 Sumana Harihareswara 2012-05-16 19:57:01 UTC
Riley, when you rewrite the patch, please submit it directly into Git, our source control system. You can do that with developer access: https://www.mediawiki.org/wiki/Developer_access

Thanks!
Comment 21 Sumana Harihareswara 2012-10-12 01:32:44 UTC
Riley, were you able to rewrite for trunk?
Comment 22 Andre Klapper 2012-11-26 13:46:38 UTC
Riley: Did you manage to rewrite the patch? If not, how could we help?
