Last modified: 2013-01-08 19:13:19 UTC
There are a number of bugs in which small wikis are unfairly impacted by the performance constraints of large wikis. For example, many Special pages have been disabled across all Wikimedia wikis (cf. bug 15434). A small wiki such as ch.wikipedia.org, with 151 content pages, is treated the same as a wiki with over four million content pages. This doesn't make any sense, and the situation is unacceptable: a small wiki should not see a reduced user experience because of the existence of (almost entirely unrelated) wikis that have millions of content pages.

We know the approximate sizes involved, so we should be able to safely and sanely tier these wikis (and then periodically check those tiers for accuracy and appropriateness). While we all wish that every wiki could be treated equally, it doesn't make any sense to punish small wikis indefinitely due to circumstances over which they have no control or involvement (i.e., an explosion in growth on a sibling project).

Some stats are available at <https://wiki.toolserver.org/view/Wiki_server_assignments>. There are other lists at Meta-Wiki, I believe, and I can query the *links tables for size if that's deemed necessary.

As far as I understand it, step one would be to make a set of groupings and then create individual wiki lists. Or perhaps just have a small.dblist or a large.dblist and add conditional statements based on that? It looks like a small.dblist may already exist, even. Is that a list of small wikis (<https://noc.wikimedia.org/conf/small.dblist> doesn't load for me)?
This looks useful: <http://meta.wikimedia.org/wiki/List_of_Wikimedia_projects_by_size>

Where should the line be between a large and a small wiki?
(In reply to comment #1)
> Where should the line be between a large and a small wiki?

Any number is going to be arbitrary. Maybe the actual first step is to write a maintenance script that can evaluate the size of the wikis in the cluster and then output a file based on their sizes (with a --size flag or something). So it'd be something like "php measureWikis.php --size=10000 > large.dblist"?

Measuring the number of content pages is probably easiest, as it's a stored value (in site_stats) and it gives a decent comparison between wikis (or it should in theory, at least).
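The idea above could be sketched roughly as follows. This is a Python stand-in for illustration only; the real script would be a PHP maintenance script, and the wiki names, page counts, and the 10000 threshold here are all made up:

```python
# Sketch: partition wikis into "small" and "large" dblists by content-page
# count. The sizes dict stands in for per-wiki values read from site_stats;
# all names and numbers below are hypothetical.
def partition_wikis(sizes, threshold=10000):
    """Return (small, large) lists of wiki db names, sorted alphabetically."""
    small = sorted(db for db, pages in sizes.items() if pages < threshold)
    large = sorted(db for db, pages in sizes.items() if pages >= threshold)
    return small, large

sizes = {"chwiki": 151, "enwiki": 4_100_000, "napwiki": 9_000}
small, large = partition_wikis(sizes)
print("\n".join(small))  # contents of a would-be small.dblist
```

The dblist file format itself is just one database name per line, so the output of such a script can be redirected straight into small.dblist or large.dblist.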
(In reply to comment #1)
> This looks useful:
> http://meta.wikimedia.org/wiki/List_of_Wikimedia_projects_by_size
>
> Where should the line be between a large and a small wiki?

That Meta page is auto-generated from Special:Statistics, which in turn just reads the site_stats database table. So (not to be nitpicky), just to be clear: if and when we write a server-side script to create dblist[1] groups by page count, it can simply use the database directly; there's no need to scrape that wiki page.

[1] https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=tree
By the way, for technical aspects we should probably use the total page count rather than the article count. That way file pages, categories, and user pages are also taken into account: as far as the database is concerned, pages and revisions are all the same, whether they are articles or not. Fortunately, both the total page count and the article count are tracked in site_stats.
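To illustrate the difference between the two counters, here is a minimal sketch using an in-memory SQLite stand-in for MediaWiki's site_stats table (the row values are invented; the real table lives in each wiki's MySQL database):

```python
import sqlite3

# Minimal stand-in for MediaWiki's site_stats table, which keeps both
# counters in a single row. Values below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_stats (ss_good_articles INTEGER, ss_total_pages INTEGER)"
)
# ss_total_pages also counts files, categories, user pages, talk pages, etc.,
# so it is always >= ss_good_articles (the "content page" count).
conn.execute("INSERT INTO site_stats VALUES (151, 1200)")
good, total = conn.execute(
    "SELECT ss_good_articles, ss_total_pages FROM site_stats"
).fetchone()
print(good, total)
```

Either column works for tiering; the point above is just that ss_total_pages better reflects the total database load a wiki generates.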
Marking this as easy. Writing a maintenance script to query the cluster and output the dblist(s) should be trivial.
# Disable all the query pages that take more than about 15 minutes to update
# wgDisableQueryPageUpdate @{
'wgDisableQueryPageUpdate' => array(
    'enwiki' => array(
        'Ancientpages',
        // 'CrossNamespaceLinks', # disabled by hashar - bug 16878
        'Deadendpages',
        'Lonelypages',
        'Mostcategories',
        'Mostlinked',
        'Mostlinkedcategories',
        'Mostlinkedtemplates',
        'Mostrevisions',
        'Fewestrevisions',
        'Uncategorizedcategories',
        'Wantedtemplates',
        'Wantedpages',
    ),
    'default' => array(
        'Ancientpages',
        'Deadendpages',
        'Mostlinked',
        'Mostrevisions',
        'Wantedpages',
        'Fewestrevisions',
        // 'CrossNamespaceLinks', # disabled by hashar - bug 16878
    ),
),
# @} end of wgDisableQueryPageUpdate

Source: <http://noc.wikimedia.org/conf/InitialiseSettings.php.txt>. Just pasting this here so I don't lose it.
(In reply to comment #5)
> Marking this as easy. Writing a maintenance script to query the cluster and
> output the dblist(s) should be trivial.

I've actually just restored small.dblist from the history books. It's VERY out of date:
https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=blob;f=small.dblist;h=5b0a78abf7fe1018576518382cae7a4f5342e422;hb=HEAD
(In reply to comment #7)
> (In reply to comment #5)
>> Marking this as easy. Writing a maintenance script to query the cluster and
>> output the dblist(s) should be trivial.
>
> I've actually just restored small.dblist from the history books.

I'm not sure what value that provides other than nostalgia. It's a very out-of-date list that needs a maintenance script of some kind to re-generate (update) it. If you want to keep the name "small.dblist" for the list of small databases for nostalgia's sake (and continuity's sake as well, I suppose), that's fine, I guess. But we're really nowhere closer to resolving this bug.
Created attachment 11366 [details] Sizes!
(In reply to comment #9)
> Created attachment 11366 [details]
> Sizes!

That's using the value of SELECT ss_good_articles FROM site_stats.
Basic script (work in progress!) to dump all the wikis sorted by ss_good_articles in https://gerrit.wikimedia.org/r/#/c/33694
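The dump that script produces could look something like the following. This is a Python sketch of the same idea, not the PHP maintenance script in the Gerrit change; the wiki names and counts are invented:

```python
# Sketch: dump all wikis sorted by their ss_good_articles value,
# largest first. The dict stands in for a query against site_stats
# on each wiki database; all values are illustrative.
wikis = {"enwiki": 4_100_000, "napwiki": 9_000, "chwiki": 151}
lines = [
    f"{db}\t{count}"
    for db, count in sorted(wikis.items(), key=lambda kv: kv[1], reverse=True)
]
print("\n".join(lines))
```

Sorting the full dump by count makes it easy to eyeball where a sensible small/large cutoff might fall before committing to a threshold.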
Created attachment 11379 [details] ss_total_pages
Updated https://gerrit.wikimedia.org/r/#/c/33694 moar and added the dblists to noc conf etc
(In reply to comment #13) > Updated https://gerrit.wikimedia.org/r/#/c/33694 moar and added the dblists > to noc conf etc This change has now been merged. I wonder what more is needed to resolve this bug.
(In reply to comment #13 by Reedy) > Updated https://gerrit.wikimedia.org/r/#/c/33694 moar and added the dblists > to noc conf etc Reedy: Any idea what else is needed to resolve this request completely?
Personally (but let Max chime in), I would've thought that this was enough. We've now got a script to generate size-related dblists (the parameters might want changing at a later date, but that's trivial), and those dblists have been created and are exposed via noc. The next task is to potentially do something for bug 15434 using those new lists.
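Using the new lists for bug 15434 would amount to branching on dblist membership. A minimal Python sketch of that logic (the file contents and the query_pages_disabled helper are hypothetical; the real branching lives in the wmf-config PHP, keyed on the dblist files):

```python
# Sketch: branch per-wiki behaviour on membership in small.dblist.
def load_dblist(text):
    """Parse dblist file contents: one db name per line, blanks ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}

# Stand-in for reading small.dblist from disk; entries are made up.
small_dblist = load_dblist("aawiki\nchwiki\n\nnapwiki\n")

def query_pages_disabled(db):
    # Small wikis would keep the expensive query pages enabled,
    # since updating them there is cheap.
    return db not in small_dblist

print(query_pages_disabled("chwiki"))  # small wiki: pages stay enabled
print(query_pages_disabled("enwiki"))  # large wiki: pages stay disabled
```

The same membership test generalizes to any per-tier setting, not just wgDisableQueryPageUpdate.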
Marking this bug resolved/fixed now that bug 43668 ("Re-enable disabled Special pages on small wikis (wikis in small.dblist)") exists. Thanks again, Reedy!