Last modified: 2011-04-25 19:31:16 UTC
site_stats.ss_good_articles and site_stats.ss_total_pages are not synchronized with the corresponding counts obtained by query:
site_stats.ss_total_pages != count of all rows in page;
site_stats.ss_good_articles != count of pages in ns 0 that are non-redirects and not dead ends.
Checked on random wikis. In all tested cases the site_stats values were higher than the corresponding query results.
Addendum: I tested some other wikis and got stats lower than the query results, so the discrepancy has no predictable direction.
Checked on which random wikis in what manner? Exact SQL used would be helpful to check we've got inconsistent data, rather than invalid assumptions about what the statistics represent. You've taken issues like possible replication lag into consideration, where applicable?
Tested on the toolserver a couple of minutes ago. Replag on s2 and s3 was within 0-4 sec while these queries were running.

query #1> SELECT ss_total_pages, ss_good_articles FROM site_stats;
query #2> SELECT COUNT(*) AS totalpages FROM page;
query #3> SELECT COUNT(DISTINCT page_id) AS goodarticles FROM page LEFT JOIN pagelinks ON page_id = pl_from WHERE pl_from IS NOT NULL AND page_namespace = 0 AND page_is_redirect = 0;

Query #3 is based on the rule "good article = page in ns 0 AND not redirect AND not dead end" (the query for dead ends is taken from http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/SpecialDeadendpages.php?view=markup and slightly modified; it gives the same result as query #2 - exact query from svn).

cswiki
+----------------+------------------+
| ss_total_pages | ss_good_articles |
+----------------+------------------+
|         187044 |            73934 |
+----------------+------------------+
+------------+
| totalpages |
+------------+
|     186931 |
+------------+
+--------------+
| goodarticles |
+--------------+
|        74123 |
+--------------+

cswikisource
+----------------+------------------+
| ss_total_pages | ss_good_articles |
+----------------+------------------+
|           4049 |             2523 |
+----------------+------------------+
+------------+
| totalpages |
+------------+
|       4049 |
+------------+
+--------------+
| goodarticles |
+--------------+
|         2464 |
+--------------+

skwiki
+----------------+------------------+
| ss_total_pages | ss_good_articles |
+----------------+------------------+
|         143111 |            73727 |
+----------------+------------------+
+------------+
| totalpages |
+------------+
|     143779 |
+------------+
+--------------+
| goodarticles |
+--------------+
|        73658 |
+--------------+
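For anyone who wants to repeat the comparison without toolserver access, here is a minimal sketch that runs the same three queries against an in-memory SQLite database. The schema is cut down to the columns the queries touch, the rows are made-up toy data, and the site_stats row is deliberately set to stale values; none of it reflects a real wiki.

```python
import sqlite3

# Toy schema: only the columns used by queries #1-#3 (an assumption, not the
# full MediaWiki schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                   page_namespace INTEGER,
                   page_is_redirect INTEGER);
CREATE TABLE pagelinks (pl_from INTEGER);
CREATE TABLE site_stats (ss_total_pages INTEGER, ss_good_articles INTEGER);
""")

# Hypothetical rows for illustration only.
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 0, 0),  # ns 0 article with outgoing links
    (2, 0, 0),  # ns 0 article without outgoing links (dead end)
    (3, 0, 1),  # redirect
    (4, 1, 0),  # talk page with an outgoing link
])
conn.executemany("INSERT INTO pagelinks VALUES (?)", [(1,), (1,), (3,), (4,)])
conn.execute("INSERT INTO site_stats VALUES (5, 2)")  # deliberately stale

# query #1: the stored counters
stored = conn.execute(
    "SELECT ss_total_pages, ss_good_articles FROM site_stats").fetchone()

# query #2: recount of all pages
total_pages = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]

# query #3: ns 0, non-redirect, at least one pagelinks row
good_articles = conn.execute("""
    SELECT COUNT(DISTINCT page_id)
    FROM page LEFT JOIN pagelinks ON page_id = pl_from
    WHERE pl_from IS NOT NULL
      AND page_namespace = 0
      AND page_is_redirect = 0
""").fetchone()[0]

print("stored:", stored)
print("recounted:", (total_pages, good_articles))
```

The difference between the two printed tuples is the kind of drift reported above. Note that the LEFT JOIN combined with pl_from IS NOT NULL behaves like an inner join.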
A couple of suggestions I got on #mediawiki today:
Tim suggested invalid titles.
Duesentrieb said that, the last time he looked at the source, the stats only checked for the presence of the "[[" string.
I can't remember which wiki it was, but a couple of times I also got site_stats.ss_good_articles > COUNT(*) WHERE page_namespace = 0 AND page_is_redirect = 0.
Live counts check for namespace, non-redirect, and '[['.

Re-initialized count checks only for namespace, non-redirect, and (I *think*) non-empty. It's not efficient to check for '[[' in text in a bulk query since text has to be loaded and decompressed separately.

Counts are re-initialized automatically more frequently now due to the checks for rolled-over or otherwise broken counts.

Counting pagelinks entries wouldn't necessarily give the same count as '[[' checks (interwikis, images, categories, or just plain invalid links).
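To make the difference between the two criteria concrete, here is a rough sketch. The function names are hypothetical and are not taken from the MediaWiki source; the pagelinks row counts are assigned by hand, on the assumption that a category link is stored in a separate link table and therefore leaves no pagelinks row.

```python
def counts_by_text(namespace: int, is_redirect: bool, text: str) -> bool:
    """Live-count rule as described above: ns 0, non-redirect, '[[' in text."""
    return namespace == 0 and not is_redirect and "[[" in text


def counts_by_pagelinks(namespace: int, is_redirect: bool, pagelinks_rows: int) -> bool:
    """Link-table rule as in query #3: ns 0, non-redirect, has pagelinks rows."""
    return namespace == 0 and not is_redirect and pagelinks_rows > 0


# (wikitext, hand-assigned number of pagelinks rows); illustrative values only.
examples = {
    "ordinary link": ("See [[Prague]].", 1),
    "category only": ("[[Category:Cities]]", 0),
    "no markup": ("Plain text.", 0),
}

results = {
    name: (counts_by_text(0, False, text), counts_by_pagelinks(0, False, rows))
    for name, (text, rows) in examples.items()
}

for name, (by_text, by_links) in results.items():
    print(f"{name}: text rule={by_text}, pagelinks rule={by_links}")
```

The "category only" page is counted by the text rule but not by the pagelinks rule, which is one way the two totals can diverge even when nothing is out of date.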
the same for ruwiki
Bug 15746 related.
(In reply to comment #5)
> Live counts check for namespace, non-redirect, and '[['.
>
> Re-initialized count checks only for namespace, non-redirect, and (I *think*)
> non-empty. It's not efficient to check for '[[' in text in a bulk query since
> text has to be loaded and decompressed separately.
>
> Counts are re-initialized automatically more frequently now due to the checks
> for rolled-over or otherwise broken counts.
>
> Counting pagelinks entries wouldn't necessarily give the same count as '[['
> checks (interwikis, images, categories, or just plain invalid links).

So is this bug still an issue?
*** Bug 15746 has been marked as a duplicate of this bug. ***
Given that many Wiktionaries put a <!-- [[ --> into the page source of pages that contain only templates (which thus have links but no [[) to make them count, yes, it is most definitely an issue. (The English Wiktionary doesn't do this; instead it insists on manual links as template parameters rather than letting the template do the linking, resulting in templates that must check, for every argument, whether it is a valid page name and then link it optionally.)

If the live count were changed to use outgoing links, matters would be much improved, though removing the link/[[ restriction completely would be another acceptable solution. Other proposals I have seen are for (optionally) counting {{ instead of [[, but I think this is unnecessarily complicated.
The issue was the count falling out of date, not what it should include.
(In reply to comment #12)
> The issue was the count falling out of date, not what it should include.

That was the issue of bug 15746 and similar bugs, but not of this one. The dropout in counting the other day is just half of the problem. This bug is about the general question of how to treat the counter.
(In reply to comment #13)
> (In reply to comment #12)
> > The issue was the count falling out of date, not what it should include.
>
> That was the issue of bug 15746 and similar bugs, but not of this one. The
> dropout in counting the other day is just a half of the problem. This bug is
> about general question - how to treat the counter.

Then the bug should actually say that in the summary or initial comment :)
(In reply to comment #11)
> If the live count were changed to be outgoing links, then matters would be much
> improved - though removing the link/[[ restriction completely would be another
> acceptable solution.

If the count were to use the link tables rather than some criterion based on the article text, refreshing this count would be a lot easier.
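As a rough illustration of why a link-table criterion is easier to refresh: the whole recount becomes a single statement that never touches page text. This is only a sketch on a toy SQLite schema with made-up rows; refresh_good_articles is a hypothetical helper, not an existing MediaWiki function or maintenance script.

```python
import sqlite3


def refresh_good_articles(conn: sqlite3.Connection) -> int:
    """Recompute ss_good_articles from page and pagelinks; return the new value."""
    conn.execute("""
        UPDATE site_stats
        SET ss_good_articles = (
            SELECT COUNT(DISTINCT page_id)
            FROM page JOIN pagelinks ON page_id = pl_from
            WHERE page_namespace = 0 AND page_is_redirect = 0
        )
    """)
    return conn.execute("SELECT ss_good_articles FROM site_stats").fetchone()[0]


# Toy data: two linked ns 0 articles, one dead end, and a stale counter of 7.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                   page_namespace INTEGER,
                   page_is_redirect INTEGER);
CREATE TABLE pagelinks (pl_from INTEGER);
CREATE TABLE site_stats (ss_good_articles INTEGER);
INSERT INTO page VALUES (1, 0, 0), (2, 0, 0), (3, 0, 0);
INSERT INTO pagelinks VALUES (1), (2), (2);
INSERT INTO site_stats VALUES (7);
""")

refreshed = refresh_good_articles(conn)
print("ss_good_articles after refresh:", refreshed)
```

A text-based criterion such as '[[' would instead need every revision text loaded and decompressed, as noted in comment #5.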
(In reply to comment #2)
> Checked on which random wikis in what manner? Exact SQL used would be helpful
> to check we've got inconsistent data, rather than invalid assumptions about
> what the statistics represent.

In fact, I don't see any problem. Closing as INVALID.

(In reply to comment #13)
> (In reply to comment #12)
> > The issue was the count falling out of date, not what it should include.
>
> That was the issue of bug 15746 and similar bugs, but not of this one. The
> dropout in counting the other day is just a half of the problem. This bug is
> about general question - how to treat the counter.

Then this doesn't seem to be the best place. A Meta discussion would probably be better. See also bug 24754, bug 26033.