Last modified: 2011-03-13 18:06:08 UTC
This URL shows 1000 entries: http://en.wikipedia.org/w/index.php?title=Special:Uncategorizedpages&limit=1000&offset=0
This URL shows 0 entries: http://en.wikipedia.org/w/index.php?title=Special:Uncategorizedpages&limit=1000&offset=1000
The same happens on Wikibooks.
See also bug <a href="http://bugzilla.wikipedia.org/show_bug.cgi?id=2415">2415</a> regarding Lonelypages. The same is also true for Ancientpages. Static dumps of the full lists, along with a count of the number of matches for each query, would be very helpful; the counts could go on the Special:Statistics page.
On the Lonelypages bug, Ashar says that the 1000-page limit is "set to make it faster", which doesn't make sense to me. How is retrieving records 1-1000 from a single table faster than retrieving records 1001-2000? And even if it is faster, I am willing to wait for a special page if that helps me get actual work done. Also, for Wikibooks the situation is worse than for Wikipedia. On Wikibooks you work on a specific subject area or book, and you should not be categorizing other books' pages without knowing their conventions. So if I want to find the _cookbook_ pages that are uncategorized, I simply can't, because they fall beyond entry 1000 in the list.
Is anybody going to take this one on, or at least comment on the bug? This seems like something that would be easy to fix.
The issue is about raising the default limit on cached special page queries to increase the size of the cached result set. While it's trivial to tweak, the questions are whether we want to and how large a performance hit we'd take (remember, the queries have to be re-run periodically, and they take time). So someone involved in Wikimedia server administration has to make the decision.
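To make the symptom above concrete: the special page is served from a periodically rebuilt cache table that stores only the top N rows, so an offset past N returns nothing even though more matching pages exist. A minimal sketch with sqlite3; the table and column names here are hypothetical illustrations, not MediaWiki's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical page table with 1500 uncategorized pages.
cur.execute("CREATE TABLE page (title TEXT)")
cur.executemany("INSERT INTO page VALUES (?)",
                [(f"Page_{i:04d}",) for i in range(1500)])

# The periodic cache-building query stores only the first 1000 rows.
cur.execute("CREATE TABLE querycache (title TEXT)")
cur.execute("INSERT INTO querycache "
            "SELECT title FROM page ORDER BY title LIMIT 1000")

# Reading the special page with offset=0 returns 1000 entries...
first = cur.execute(
    "SELECT title FROM querycache ORDER BY title LIMIT 1000 OFFSET 0"
).fetchall()
# ...but offset=1000 returns nothing: rows past the limit were never cached.
second = cur.execute(
    "SELECT title FROM querycache ORDER BY title LIMIT 1000 OFFSET 1000"
).fetchall()
print(len(first), len(second))  # 1000 0
```

Under this model, raising the cached-set size means a larger INSERT during each periodic rebuild, which is the performance trade-off being discussed.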
For Wikibooks, could we just turn off the limiting completely? The special pages aren't heavily used, and we only have ~15,000 modules. Alternatively, could we turn off caching? With the relatively small number of modules and categories, I doubt it would be a big performance hit. Right now Special:Uncategorizedpages is basically useless for Wikibooks.
*** Bug 8450 has been marked as a duplicate of this bug. ***
This is also somewhat important for images: since the toolserver is down, we have no way of knowing which images are completely untagged, and the uncategorized list can usually act as a proxy for that. We haven't really been keeping up with this, so the number is probably quite high now, seeing as 1000 entries barely gets through the B's. For this application, though, we wouldn't need the cache rebuilt biweekly; bimonthly or even monthly would be fine.
I submitted a patch for Bug 2415 which could also fix this problem if the LIMIT is turned off for this query. The LIMIT may not substantially reduce the cost of the cache-building queries, since it is only applied after the "heavy lifting" (full table scans, joins, etc.) is done. This could be confirmed by comparing the stats after running the queries with and without the LIMITs. (Note that the cache-building LIMIT does indirectly make reads on the querycache less expensive, because it keeps the table small.) (See also Bug 4699 for a discussion of the problems of using LIMIT.)
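The point about LIMIT coming after the heavy lifting can be illustrated: an uncategorized-pages query is essentially an anti-join, and the join has to be evaluated over every page row before LIMIT discards anything, so the expensive part runs either way. A sqlite3 sketch with a hypothetical simplified schema (real MediaWiki tables differ):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical tables: all pages, plus category links for some of them.
cur.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("CREATE TABLE categorylinks (cl_from INTEGER)")
cur.executemany("INSERT INTO page VALUES (?, ?)",
                [(i, f"Page_{i:04d}") for i in range(3000)])
# Categorize only the even-numbered pages, leaving 1500 uncategorized.
cur.executemany("INSERT INTO categorylinks VALUES (?)",
                [(i,) for i in range(0, 3000, 2)])

# Anti-join: the LEFT JOIN is evaluated for every page row; LIMIT only
# truncates the finished result set afterwards.
sql = ("SELECT p.title FROM page p "
       "LEFT JOIN categorylinks c ON c.cl_from = p.page_id "
       "WHERE c.cl_from IS NULL ORDER BY p.title")

full = cur.execute(sql).fetchall()
limited = cur.execute(sql + " LIMIT 1000").fetchall()
print(len(full), len(limited))  # 1500 1000
```

The limited query returns exactly the first 1000 rows of the full result, which is why the LIMIT mainly helps by keeping the querycache table small rather than by making the cache-building query itself cheaper.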
No, the size can't be increased. It's not only the query that's expensive; the insert into the cache table is expensive, too. Use the toolserver for such requests.