Last modified: 2011-04-25 19:31:16 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T12834, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 10834 - site_stats.ss_good_articles and site_stats.ss_total_pages not synchronized with the real count
Status: RESOLVED INVALID
Product: MediaWiki
Classification: Unclassified
Component: Database
Version: unspecified
Hardware: All All
Importance: Lowest trivial with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 15746
Depends on:
Blocks: 16660
Reported: 2007-08-07 14:25 UTC by Danny B.
Modified: 2011-04-25 19:31 UTC (History)

Description Danny B. 2007-08-07 14:25:23 UTC
site_stats.ss_good_articles and site_stats.ss_total_pages are not synchronized with the corresponding count done by query.

site_stats.ss_total_pages != count all from page;
site_stats.ss_good_articles != count ns 0 & nonredir & not dead end from page

Checked on random wikis. In all tested cases site_stats showed higher values than the corresponding query.
Comment 1 Danny B. 2007-08-07 15:19:04 UTC
Addendum: I tested some other wikis and got stats lower than the query results, so the discrepancy has no predictable direction.
Comment 2 Rob Church 2007-08-08 01:54:31 UTC
Checked on which random wikis in what manner? Exact SQL used would be helpful to check we've got inconsistent data, rather than invalid assumptions about what the statistics represent. You've taken issues like possible replication lag into consideration, where applicable?
Comment 3 Danny B. 2007-08-08 02:27:26 UTC
Tested on the toolserver a couple of minutes ago. Replication lag on s2 and s3 was within 0-4 seconds while these queries were running.

query #1> SELECT ss_total_pages, ss_good_articles FROM site_stats;
query #2> SELECT COUNT(*) AS totalpages FROM page;
query #3> SELECT COUNT(DISTINCT page_id) AS goodarticles FROM page LEFT JOIN pagelinks ON page_id = pl_from WHERE pl_from IS NOT NULL AND page_namespace = 0 AND page_is_redirect = 0;

Query #3 is based on the rule "good article = page in ns 0 AND not redirect AND not dead end". The dead-end query was taken from http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/SpecialDeadendpages.php?view=markup and slightly modified (it gives the same result as query #2 - exact query from svn).

cswiki
+----------------+------------------+
| ss_total_pages | ss_good_articles |
+----------------+------------------+
|         187044 |            73934 |
+----------------+------------------+
+------------+
| totalpages |
+------------+
|     186931 |
+------------+
+--------------+
| goodarticles |
+--------------+
|        74123 |
+--------------+

cswikisource
+----------------+------------------+
| ss_total_pages | ss_good_articles |
+----------------+------------------+
|           4049 |             2523 |
+----------------+------------------+
+------------+
| totalpages |
+------------+
|       4049 |
+------------+
+--------------+
| goodarticles |
+--------------+
|         2464 |
+--------------+

skwiki
+----------------+------------------+
| ss_total_pages | ss_good_articles |
+----------------+------------------+
|         143111 |            73727 |
+----------------+------------------+
+------------+
| totalpages |
+------------+
|     143779 |
+------------+
+--------------+
| goodarticles |
+--------------+
|        73658 |
+--------------+
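
The gaps between the stored and the counted values in the three result sets above can be tabulated with a short script. This is only a sketch: the numbers are copied from the tables above, and the names `RESULTS` and `drift` are hypothetical, introduced here for illustration.

```python
# Values copied from the three result sets above.
# Each entry: (ss_total_pages, COUNT(*) from page, ss_good_articles, query #3 count)
RESULTS = {
    "cswiki": (187044, 186931, 73934, 74123),
    "cswikisource": (4049, 4049, 2523, 2464),
    "skwiki": (143111, 143779, 73727, 73658),
}


def drift(results):
    """Return {wiki: (total_pages_diff, good_articles_diff)}, stored minus counted."""
    return {
        wiki: (ss_total - total, ss_good - good)
        for wiki, (ss_total, total, ss_good, good) in results.items()
    }


if __name__ == "__main__":
    for wiki, (d_total, d_good) in drift(RESULTS).items():
        print(f"{wiki:<14} total_pages {d_total:+6d}   good_articles {d_good:+6d}")
```

A positive difference means site_stats is higher than the query result. With these numbers the sign varies both between wikis and between the two columns.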
Comment 4 Danny B. 2007-08-08 02:35:19 UTC
A couple of suggestions I got on #mediawiki today:

Tim suggested invalid titles as a possible cause.
Duesentrieb said that, the last time he looked at the source, the stats only checked for the presence of the "[[" string.

I can't remember which wiki it was on, but a couple of times I also got site_stats.ss_good_articles > COUNT(*) WHERE page_namespace = 0 AND page_is_redirect = 0.
Comment 5 Brion Vibber 2007-08-08 17:50:06 UTC
Live counts check for namespace, non-redirect, and '[['.

Re-initialized count checks only for namespace, non-redirect, and (I *think*) non-empty. It's not efficient to check for '[[' in text in a bulk query since text has to be loaded and decompressed separately.

Counts are re-initialized automatically more frequently now due to the checks for rolled-over or otherwise broken counts.

Counting pagelinks entries wouldn't necessarily give the same count as '[[' checks (interwikis, images, categories, or just plain invalid links).
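
To make the difference between these criteria concrete, here is a minimal sketch in Python. It is a simplified model, not MediaWiki code: the `Page` class, the rule functions, and the sample pages are all hypothetical, and the "non-empty" rule for re-initialization is modelled only as described above (where it is itself marked as uncertain).

```python
from dataclasses import dataclass


@dataclass
class Page:
    namespace: int
    is_redirect: bool
    text: str
    pagelinks: int  # hypothetical: number of pagelinks rows with pl_from = this page


def in_content_ns(p):
    """Shared precondition: namespace 0 and not a redirect."""
    return p.namespace == 0 and not p.is_redirect


def live_rule(p):
    """Live count as described: the page text contains '[['."""
    return in_content_ns(p) and "[[" in p.text


def reinit_rule(p):
    """Re-initialized count as described: the page is non-empty."""
    return in_content_ns(p) and len(p.text) > 0


def pagelinks_rule(p):
    """Query #3 style: at least one pagelinks entry."""
    return in_content_ns(p) and p.pagelinks > 0


PAGES = [
    Page(0, False, "See [[Foo]].", 1),      # ordinary article
    Page(0, False, "{{infobox|Foo}}", 1),   # links come only from a template
    Page(0, False, "[[Category:Bar]]", 0),  # '[[' present, but no page link
    Page(0, False, "[[en:Foo]]", 0),        # interwiki link only
    Page(0, False, "", 0),                  # empty page
    Page(0, True, "#REDIRECT [[Foo]]", 1),  # redirect
    Page(1, False, "[[Foo]] talk", 1),      # outside namespace 0
]


def count(rule, pages=PAGES):
    """Count the pages that satisfy the given rule."""
    return sum(1 for p in pages if rule(p))
```

On this sample set the three rules give three different totals (`count(live_rule)` is 3, `count(reinit_rule)` is 4, `count(pagelinks_rule)` is 2), so the stored value and a pagelinks-based query can disagree even when nothing is out of date.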
Comment 6 Mashiah Davidson 2007-10-31 18:43:14 UTC
The same applies to ruwiki.
Comment 7 Anastasya Lvova 2008-08-29 11:50:12 UTC
Achtung! Mashiah Davidson is homophob!
Comment 8 Danny B. 2008-10-08 21:14:44 UTC
Bug 15746 related.
Comment 9 Aaron Schulz 2009-01-02 19:28:27 UTC
(In reply to comment #5)
> Live counts check for namespace, non-redirect, and '[['.
> 
> Re-initialized count checks only for namespace, non-redirect, and (I *think*)
> non-empty. It's not efficient to check for '[[' in text in a bulk query since
> text has to be loaded and decompressed separately.
> 
> Counts are re-initialized automatically more frequently now due to the checks
> for rolled-over or otherwise broken counts.
> 
> Counting pagelinks entries wouldn't necessarily give the same count as '[['
> checks (interwikis, images, categories, or just plain invalid links).
> 

So is this bug still an issue?
Comment 10 Aaron Schulz 2009-01-02 19:28:46 UTC
*** Bug 15746 has been marked as a duplicate of this bug. ***
Comment 11 Conrad Irwin 2009-01-02 19:37:47 UTC
Given that many Wiktionaries are putting a <!-- [[ --> into their page source for pages that only contain templates (which thus have links but no [[) to make them count, yes, it is most definitely an issue. (The English Wiktionary doesn't do this; instead it insists on manual links as template parameters rather than letting the template do the linking, resulting in templates that must check for all arguments whether they are valid page names and then link optionally.)

If the live count were changed to be outgoing links, then matters would be much improved - though removing the link/[[ restriction completely would be another acceptable solution.

Other proposals I have seen are for (optionally) counting {{ instead of [[, but I think this is unnecessarily complicated.
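
The workaround can be illustrated with a plain substring test. This is a sketch only: `has_link_marker` and the sample wikitext (including the `entry` template) are hypothetical and merely mimic the '[[' check described in this thread.

```python
def has_link_marker(wikitext, marker="[["):
    """Mimic the live check: is the marker string present anywhere in the text?"""
    return marker in wikitext


# A page that consists only of a template call (hypothetical template name).
template_only = "{{entry|lang=cs|word=pes}}"

# The same page with the HTML-comment workaround appended.
with_workaround = template_only + "<!-- [[ -->"

if __name__ == "__main__":
    print(has_link_marker(template_only))        # no '[[' in the source
    print(has_link_marker(with_workaround))      # the comment supplies '[['
    print(has_link_marker(template_only, "{{"))  # the '{{' proposal
```

The comment adds no visible content or link; it only makes the substring test succeed, which is why counting outgoing links instead would make it unnecessary.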

Comment 12 Aaron Schulz 2009-01-02 19:39:35 UTC
The issue was the count falling out of date, not what it should include.
Comment 13 Danny B. 2009-01-02 23:58:58 UTC
(In reply to comment #12)
> The issue was the count falling out of date, not what it should include.
> 

That was the issue of bug 15746 and similar bugs, but not of this one. The dropout in counting the other day is just half of the problem. This bug is about the general question of how to treat the counter.
Comment 14 Aaron Schulz 2009-01-03 04:56:28 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > The issue was the count falling out of date, not what it should include.
> > 
> 
> That was the issue of bug 15746 and similar bugs, but not of this one. The
> dropout in counting the other day is just a half of the problem. This bug is
> about general question - how to treat the counter.
> 

Then the bug should actually say that in the summary or initial comment :)
Comment 15 Roan Kattouw 2009-01-07 14:01:24 UTC
(In reply to comment #11)
> If the live count were changed to be outgoing links, then matters would be much
> improved - though removing the link/[[ restriction completely would be another
> acceptable solution.
>
If the count were to use the link tables rather than some criterion based on the article text, refreshing this count would be a lot easier.
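
A rough sketch of what such a refresh could look like, using Python's sqlite3 module and an in-memory database. The schema is heavily simplified and assumed for illustration (only the column names that appear in the queries in comment 3 are taken from this report); `recount_good_articles` and `demo` are hypothetical names, not MediaWiki functions.

```python
import sqlite3


def recount_good_articles(conn):
    """Recompute ss_good_articles from the link table in one bulk query."""
    (count,) = conn.execute(
        """
        SELECT COUNT(DISTINCT page_id)
        FROM page JOIN pagelinks ON page_id = pl_from
        WHERE page_namespace = 0 AND page_is_redirect = 0
        """
    ).fetchone()
    conn.execute("UPDATE site_stats SET ss_good_articles = ?", (count,))
    return count


def demo():
    """Build a tiny assumed schema, refresh the counter, return (fresh, stored)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_is_redirect INTEGER
        );
        CREATE TABLE pagelinks (pl_from INTEGER, pl_title TEXT);
        CREATE TABLE site_stats (ss_total_pages INTEGER, ss_good_articles INTEGER);
        -- page 1: article with links, page 2: article without links,
        -- page 3: redirect, page 4: page outside namespace 0
        INSERT INTO page VALUES (1, 0, 0), (2, 0, 0), (3, 0, 1), (4, 1, 0);
        INSERT INTO pagelinks VALUES (1, 'Foo'), (1, 'Bar'), (3, 'Foo'), (4, 'Foo');
        -- deliberately stale stored value
        INSERT INTO site_stats VALUES (4, 99);
        """
    )
    fresh = recount_good_articles(conn)
    stored = conn.execute("SELECT ss_good_articles FROM site_stats").fetchone()[0]
    return fresh, stored
```

The refresh never touches page text, so nothing has to be loaded or decompressed; only page 1 in the sample data qualifies, and the stale stored value is overwritten with that count.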
Comment 16 Nemo 2011-04-25 19:31:16 UTC
(In reply to comment #2)
> Checked on which random wikis in what manner? Exact SQL used would be helpful
> to check we've got inconsistent data, rather than invalid assumptions about
> what the statistics represent.

In fact, I don't see any problem. Closing as INVALID.

(In reply to comment #13)
> (In reply to comment #12)
> > The issue was the count falling out of date, not what it should include.
> > 
> 
> That was the issue of bug 15746 and similar bugs, but not of this one. The
> dropout in counting the other day is just a half of the problem. This bug is
> about general question - how to treat the counter.

Then this doesn't seem to be the best place. A discussion on Meta would probably be better. See also bug 24754, bug 26033.
