Last modified: 2010-12-22 21:26:22 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 8130 - Query pages should limit to content namespaces, not just main namespace
Query pages should limit to content namespaces, not just main namespace
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Special pages (Other open bugs)
unspecified
All All
: Normal enhancement (vote)
: ---
Assigned To: Roan Kattouw
:
: 18804 (view as bug list)
Depends on: 14869
Blocks: 4204
  Show dependency treegraph
 
Reported: 2006-12-03 07:33 UTC by Rob Church
Modified: 2010-12-22 21:26 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Implementation of Namespace::isContentQuery() (8.95 KB, patch)
2009-03-11 21:50 UTC, nephele
Details
Add/implement MWNamespace::getContentNamespaces(), patch for querypage-work branch (7.84 KB, patch)
2009-04-25 19:19 UTC, nephele
Details

Description Rob Church 2006-12-03 07:33:11 UTC
Per bug 7991 comment 3, certain query pages should probably have their SQL
tweaked such that all valid content namespaces, as defined in
$wgContentNamespaces, are included, rather than just the main namespace.

This might still present problems, where pages are laid out with the assumption
that all list items reflect a single namespace, but it's probably a viable
consideration for the future.
Comment 1 nephele 2009-03-11 21:50:05 UTC
Created attachment 5916 [details]
Implementation of Namespace::isContentQuery()

Makes various special pages check all specified content namespaces instead of just NS_MAIN.
Comment 2 nephele 2009-03-11 21:53:24 UTC
This bug has been a problem on the wiki where I'm an admin (www.uesp.net), because our site makes extensive use of content namespaces other than NS_MAIN.  I've implemented a code modification on our wiki that fixes this bug, and I think it would be useful if the same fixes could be implemented in the core mediawiki code.  In my previous comment I uploaded a patch file that implements all of the suggested changes.

I've added a new static function, Namespace::isContentQuery(), to Namespace.php (basically just moving existing code in SpecialPopularPages.php into a centralized function).
That function has then been added into the SQL queries of all relevant special pages, basically just replacing "page_namespace=".NS_MAIN with "page_namespace ".Namespace::isContentQuery() .  A few special cases where more than just one line needed to be changed in the Specialxxx.php are:
* On SpecialPopularPages.php and SpecialShortpages.php existing redundant code was removed.
* On SpecialUncategorizedpages.php, the default request was changed to be "all content pages" (requested using NULL) instead of NS_MAIN (Uncategorizedtemplates and Uncategorizedcategories still work properly)

These changes have been in place at UESP for a year now, so I'm confident that they do work on a live wiki.  If anyone would like to see the resulting code in action, you can check some of UESP's special pages, i.e.
http://www.uesp.net/wiki/Special:Disambiguations
http://www.uesp.net/wiki/Special:Lonelypages
http://www.uesp.net/wiki/Special:Deadendpages
http://www.uesp.net/wiki/Special:DoubleRedirects
The only caveat is that UESP is still running Mediawiki 1.10.0 (in part because we have numerous code modifications like this that make upgrading unbearably difficult, thus the interest in getting some of the modifications added to mediawiki).  However, the relevant code hasn't changed much since 1.10.  The attached patch was created for 1.14, and I did some basic tests on the 1.14 configuration to check that the changes work.
Comment 3 Roan Kattouw 2009-03-12 08:33:41 UTC
Argh, ugliness!

All this code should use the Database class functions instead of building a raw SQL query. When they do that, you no longer need the ugly-as-hell MWNamespace::isContentQuery() (the Namespace class was renamed to MWNamespace a while ago) that spits out weird raw SQL fragments, but just a function that returns an array of content namespaces.
Comment 4 nephele 2009-04-24 19:31:34 UTC
Yes, the code may be ugly.  But the patch is not the source of the ugliness -- the existing SpecialPage classes all pass around raw SQL fragments and make minimal use of the Database class functions.  In a nutshell, trying to refactor the code to make it pretty would be a major code revamp -- with no performance or functionality benefits -- and refactoring the code would by itself still do nothing to fix this bug.  I'm not trying to provide the ultimate, perfect code; I'm just trying to provide some progress towards fixing a two-year old bug.

Some of the fundamental obstacles complicating a conversion of all this code to use the Database class functions include:
* Some 32 special pages are based on QueryPage.  Any comprehensive fix requires revamping all of those pages, plus QueryPage.php and PageQueryPage.php.  Even worse, a comprehensive fix is likely to break any extension special pages that create special pages derived from QueryPage -- possibly 235 extensions based on Mediawiki-registered extensions alone.
* QueryPage::getSQL is supposed to return raw SQL.  If getSQL is changed to instead return an array of ($table, $var, $cond, $fname, $options), then every use of getSQL is made uglier, because it's now passing around 5+ parameters instead of just one.  The other option is that getSQL still returns raw SQL, but that the various implementations of getSQL use Database functions to generate that raw SQL -- however, that's not currently possible, because the Database class does not have functions that construct a raw SQL query and return it -- without actually executing the SQL query.
* Many of the existing SQL queries cannot be handled by the Database functions.  See SpecialDisambiguations, for example, where the page table is accessed twice within a single query, and therefore it is aliased ("FROM {$page} AS pb, {$pagelinks} AS la, {$page} AS pa").  The database functions are unable to handle table aliases.  Without major revisions to Database, it is impossible to change all of the special pages to use the database functions -- and changing half of the special pages to use one syntax and leaving half using the old syntax is even uglier, in my opinion.

After all of that is done, it's still necessary to figure out how to fix this bug.
* The "prettiest" way to do it would seem to be adding it as a flag recognized by Database::makeSelectOptions.  However, burying it that deep within the Database class then means it's unavailable to queries that are not entirely constructed by the Database functions -- such as SpecialDisambiguations (unless the database functions are made fully alias-aware).  So then a separate function is needed outside of makeSelectOptions that spits out raw SQL fragments -- so why not just introduce that separate function right now instead of forcing a code revamp that won't even get rid of the separate function?
* I added isContentQuery to the MWNamespace class because the necessary information is ''not'' part of the database: the list of content namespaces is defined solely through a global variable, $wgContentNamespaces.  With my patch, all uses of $wgContentNamespaces are centralized in Namespace.php -- only MWNamespace needs to know the inner details of the content namespaces are defined.

I'm willing to update my patch if provided with specific suggestions about what could be done differently within the scope of this approach.  But I'm not prepared to take on a huge project to fix something that's basically unrelated to this bug.  Especially since with my superficial knowledge of the MW codebase, my efforts to tackle such a huge revamp would inevitably add a whole new stack of bugs.

Comment 5 Roan Kattouw 2009-04-24 19:34:06 UTC
(In reply to comment #4)
> Yes, the code may be ugly.  But the patch is not the source of the ugliness --
> the existing SpecialPage classes all pass around raw SQL fragments and make
> minimal use of the Database class functions.  In a nutshell, trying to refactor
> the code to make it pretty would be a major code revamp -- with no performance
> or functionality benefits -- and refactoring the code would by itself still do
> nothing to fix this bug.  I'm not trying to provide the ultimate, perfect code;
> I'm just trying to provide some progress towards fixing a two-year old bug.
> 
> Some of the fundamental obstacles complicating a conversion of all this code to
> use the Database class functions include:
> * Some 32 special pages are based on QueryPage.  Any comprehensive fix requires
> revamping all of those pages, plus QueryPage.php and PageQueryPage.php.  Even
> worse, a comprehensive fix is likely to break any extension special pages that
> create special pages derived from QueryPage -- possibly 235 extensions based on
> Mediawiki-registered extensions alone.
> * QueryPage::getSQL is supposed to return raw SQL.  If getSQL is changed to
> instead return an array of ($table, $var, $cond, $fname, $options), then every
> use of getSQL is made uglier, because it's now passing around 5+ parameters
> instead of just one.  The other option is that getSQL still returns raw SQL,
> but that the various implementations of getSQL use Database functions to
> generate that raw SQL -- however, that's not currently possible, because the
> Database class does not have functions that construct a raw SQL query and
> return it -- without actually executing the SQL query.
> * Many of the existing SQL queries cannot be handled by the Database functions.
>  See SpecialDisambiguations, for example, where the page table is accessed
> twice within a single query, and therefore it is aliased ("FROM {$page} AS pb,
> {$pagelinks} AS la, {$page} AS pa").  The database functions are unable to
> handle table aliases.  Without major revisions to Database, it is impossible to
> change all of the special pages to use the database functions -- and changing
> half of the special pages to use one syntax and leaving half using the old
> syntax is even uglier, in my opinion.
> 
> After all of that is done, it's still necessary to figure out how to fix this
> bug.
> * The "prettiest" way to do it would seem to be adding it as a flag recognized
> by Database::makeSelectOptions.  However, burying it that deep within the
> Database class then means it's unavailable to queries that are not entirely
> constructed by the Database functions -- such as SpecialDisambiguations (unless
> the database functions are made fully alias-aware).  So then a separate
> function is needed outside of makeSelectOptions that spits out raw SQL
> fragments -- so why not just introduce that separate function right now instead
> of forcing a code revamp that won't even get rid of the separate function?
> * I added isContentQuery to the MWNamespace class because the necessary
> information is ''not'' part of the database: the list of content namespaces is
> defined solely through a global variable, $wgContentNamespaces.  With my patch,
> all uses of $wgContentNamespaces are centralized in Namespace.php -- only
> MWNamespace needs to know the inner details of the content namespaces are
> defined.
> 
> I'm willing to update my patch if provided with specific suggestions about what
> could be done differently within the scope of this approach.  But I'm not
> prepared to take on a huge project to fix something that's basically unrelated
> to this bug.  Especially since with my superficial knowledge of the MW
> codebase, my efforts to tackle such a huge revamp would inevitably add a whole
> new stack of bugs.
> 

I'm working on doing all this in the querypage-work branch; and yes, it is possible to prettify stuff and move away from raw SQL.

Comment 6 nephele 2009-04-24 23:42:10 UTC
Aha, that makes it alot easier (especially for a newb like me) to see the direction where you're heading with the special page queries.

So, should I wait until the querypage-work branch is complete to proceed?

Or could I help by providing a patch for querypage-work? In which case, it looks like what you're suggesting is that the entry in $conds would be changed from
  page_namespace => NS_MAIN
to something like
  page_namespace => MWNamespace::getContentNamespaces()
where getContentNamespaces could return a single value or an array of values (returning a single-value as a non-array seems slightly more efficient for Database::makeList).

Comment 7 Roan Kattouw 2009-04-25 06:57:34 UTC
(In reply to comment #6)
> Aha, that makes it alot easier (especially for a newb like me) to see the
> direction where you're heading with the special page queries.
> 
> So, should I wait until the querypage-work branch is complete to proceed?
> 
> Or could I help by providing a patch for querypage-work? In which case, it
> looks like what you're suggesting is that the entry in $conds would be changed
> from
>   page_namespace => NS_MAIN
> to something like
>   page_namespace => MWNamespace::getContentNamespaces()
Yeah, that's what will be needed. I'll look into that on a per-query basis (such a condition might hurt efficiency for some queries).

> where getContentNamespaces could return a single value or an array of values
> (returning a single-value as a non-array seems slightly more efficient for
> Database::makeList).
> 
Doesn't really matter.
Comment 8 nephele 2009-04-25 19:19:35 UTC
Created attachment 6062 [details]
Add/implement MWNamespace::getContentNamespaces(), patch for querypage-work branch

Here's a new patch, written specifically for the querypage-work branch.  The new function is now MWNamespace::getContentNamespaces().  I went ahead and added it to every query where I think it should be used.  In the process, I took a stab at converting SpecialPopularPages over to the new syntax.

In terms of efficiency, using getContentNamespaces() instead of just NS_MAIN shouldn't have any impact on wikipedia, because the two end up producing identical SQL.  And as someone who uses a wiki where the content namespaces have been customized, I'd argue getContentNamespaces() needs to be used in all possible queries -- because otherwise the corresponding special page ends up being completely useless.  If performance really is a limiting factors in some of these queries, it might make more sense to just drop the namespace condition completely (or at least drop it if the content namespaces have been customized).

There are a couple of other minor fixes included in the patch that I came across while doing a quick test-run of the code; I left them in place because it seemed likely you'd want to make those fixes eventually.  (Specifically, abstract-ing PageQueryPage, fixing a missing ' in SpecialDeadendpages)
Comment 9 Roan Kattouw 2009-04-27 13:32:00 UTC
(In reply to comment #8)
> Created an attachment (id=6062) [details]
> Add/implement MWNamespace::getContentNamespaces(), patch for querypage-work
> branch
> 
> Here's a new patch, written specifically for the querypage-work branch.
Thanks, committed in r49951.

> The
> new function is now MWNamespace::getContentNamespaces().  I went ahead and
> added it to every query where I think it should be used.  In the process, I
> took a stab at converting SpecialPopularPages over to the new syntax.
> 
Thanks; I thought I had converted all of them, but I seem to have missed that one somehow.

> In terms of efficiency, using getContentNamespaces() instead of just NS_MAIN
> shouldn't have any impact on wikipedia, because the two end up producing
> identical SQL.  And as someone who uses a wiki where the content namespaces
> have been customized, I'd argue getContentNamespaces() needs to be used in all
> possible queries -- because otherwise the corresponding special page ends up
> being completely useless.  If performance really is a limiting factors in some
> of these queries, it might make more sense to just drop the namespace condition
> completely (or at least drop it if the content namespaces have been
> customized).
> 
True. In all of the queries you modified, it looks like the WHERE clause for page_namespace is not indexed anyway, so whether it limits to one or more values doesn't matter.

> There are a couple of other minor fixes included in the patch that I came
> across while doing a quick test-run of the code; I left them in place because
> it seemed likely you'd want to make those fixes eventually.  (Specifically,
> abstract-ing PageQueryPage, fixing a missing ' in SpecialDeadendpages)
> 
Good catch.
Comment 10 Roan Kattouw 2009-05-15 11:06:57 UTC
*** Bug 18804 has been marked as a duplicate of this bug. ***
Comment 11 Roan Kattouw 2010-12-22 21:26:22 UTC
Branch was merged in r78786.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links