Last modified: 2012-11-06 11:40:25 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T21919, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 19919 - Article count in Special:Statistics incorrect
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Hardware: All All
Importance: Low major with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 16660
Reported: 2009-07-25 03:56 UTC by Purodha Blissenbach
Modified: 2012-11-06 11:40 UTC
CC List: 3 users



Description Purodha Blissenbach 2009-07-25 03:56:42 UTC
On kshwiki, the article count appears to be wrong:
 10635 non-redirect pages in namespace 0 of the `page` table
 10596 pages according to the query used by the maintenance script [1]
  9972 shown in Special:Statistics


Here are the queries I ran on the Toolserver database, with their results:

mysql> use kshwiki_p ;
mysql> SELECT count( `page_id` ) FROM `page` \
       WHERE ( `page_namespace` ) = 0 \
         AND ( `page_is_redirect` = 0 ) ; 
| count( `page_id` ) |
|              10635 | 
1 row in set (10.90 sec)

mysql> SELECT COUNT(DISTINCT page_namespace, page_title) \
           AS pagecount FROM `page` , `pagelinks` \
       WHERE `pl_from` = `page_id` AND `page_namespace` = 0 \
         AND `page_is_redirect` = 0 AND `page_len` > 0 ;
| pagecount |
|     10596 | 
1 row in set (2.93 sec)

mysql> SELECT `ss_good_articles` FROM `site_stats` ;
| ss_good_articles |
|             9972 | 
1 row in set (0.00 sec)
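The first two figures differ because the second query only counts main-namespace non-redirects that have at least one row in `pagelinks` (and a non-zero length). A toy reproduction with Python's built-in `sqlite3` makes that visible; the stripped-down schema and the page data are made up for illustration and are not the real kshwiki tables:

```python
import sqlite3

# Toy tables with only the columns used by the queries above; the real
# MediaWiki `page` and `pagelinks` tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                   page_title TEXT, page_is_redirect INTEGER, page_len INTEGER);
CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT);
INSERT INTO page VALUES
  (1, 0, 'Kölle', 0, 120),  -- article with two outgoing links
  (2, 0, 'Stub', 0, 40),    -- article without any outgoing link
  (3, 0, 'Alias', 1, 20),   -- redirect
  (4, 1, 'Kölle', 0, 60);   -- talk page (namespace 1)
INSERT INTO pagelinks VALUES (1, 0, 'Rhing'), (1, 0, 'Dom'), (4, 0, 'Kölle');
""")


def count_non_redirects(db):
    """First query: every non-redirect page in namespace 0."""
    return db.execute(
        "SELECT COUNT(page_id) FROM page "
        "WHERE page_namespace = 0 AND page_is_redirect = 0"
    ).fetchone()[0]


def count_linking_pages(db):
    """Second query: only those that also have at least one outgoing link.

    SQLite lacks COUNT(DISTINCT a, b), so the DISTINCT moves into a subquery.
    """
    return db.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT page_namespace, page_title "
        "FROM page JOIN pagelinks ON pl_from = page_id "
        "WHERE page_namespace = 0 AND page_is_redirect = 0 AND page_len > 0)"
    ).fetchone()[0]


print(count_non_redirects(conn), count_linking_pages(conn))  # 2 1
```

In the toy data only the unlinked stub falls out of the second count; on kshwiki the corresponding gap between the first two figures is 10635 - 10596 = 39 pages.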

IMHO, the difference is likely caused, at least in part, by a
software update. The old parser accepted some comments inside
redirects, which the new parser does not. The old parser thus
treated such pages as redirects, so they were never included in
the good-article count. When we now detect and correct one of
them, the new parser sees a non-redirect becoming a redirect and
decrements the good-article count, although the page was never
counted. This may well have happened ~620 times; at least that
appears to be a very reasonable figure (10596 - 9972 = 624).
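The mechanism can be illustrated with a toy model. Everything in it is an assumption made for illustration, not MediaWiki's code: the two `is_redirect_*` functions only caricature the old and new parsers, taking the "comments" to be text after a `|` in the redirect link (as the bot's regular expression further down suggests), and `SiteStats` is a hypothetical stand-in for an incrementally updated good-article counter:

```python
import re
from typing import Callable, Optional

# Toy model, NOT MediaWiki's code: a page is a "good article" if it is not
# a redirect and contains at least one [[link]].


def is_redirect_old(text: str) -> bool:
    # Hypothetical old parser: also accepts "#REDIRECT [[Target|comment]]".
    return re.match(r"#redirect\s*\[\[[^]]+\]\]", text, re.IGNORECASE) is not None


def is_redirect_new(text: str) -> bool:
    # Hypothetical new parser: a "|" inside the link means "not a redirect".
    return re.match(r"#redirect\s*\[\[[^]|]+\]\]", text, re.IGNORECASE) is not None


def is_good(text: str, is_redirect: Callable[[str], bool]) -> bool:
    return not is_redirect(text) and "[[" in text


class SiteStats:
    def __init__(self) -> None:
        self.good_articles = 0

    def save(self, old_text: Optional[str], new_text: str,
             is_redirect: Callable[[str], bool]) -> None:
        # Incremental update: the delta is judged by the parser in use at
        # save time, not by the one that judged the previous revision.
        before = old_text is not None and is_good(old_text, is_redirect)
        after = is_good(new_text, is_redirect)
        self.good_articles += int(after) - int(before)


broken = "#REDIRECT [[Target|comment]]"
fixed = "#REDIRECT [[Target]]\n\n|comment]]"

stats = SiteStats()
stats.save(None, broken, is_redirect_old)   # created before the upgrade: +0
stats.save(broken, fixed, is_redirect_new)  # bot fix after the upgrade: -1
recount = int(is_good(fixed, is_redirect_new))
print(stats.good_articles, recount)  # -1 0
```

Each such page leaves the counter one below a fresh recount, which is the kind of drift described above.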

Another issue, which I observed several months ago but failed
to report: to duplicate two pages including their edit history
for a page split, I either exported them and re-imported them
under new page titles, or exported them, renamed the originals,
and re-imported them under the original page titles. This did
not increase the page count.
Comment 1 Purodha Blissenbach 2009-07-31 04:23:37 UTC
Now the "active users" count on the ksh Wikipedia has become -1,
while the "good articles" count was 106xx, at least for a short time.

The diagnosis above of the parser difference and its effects is
indeed correct. With the problem diagnosed and a newly written
"redirect" page generator for pywikipediabot available, we are
currently running a bot that fixes all those problem redirects.
I believe it made the "good article" counter fall below zero
around the transition from July 31st to August 1st (UTC), give or
take an hour.

This seems to have caused the "good articles" counter to be
re-evaluated, and the "active users" count to be set to -1 at the
same time.

(It is neither useful nor necessary to correct the statistics manually
while the bot is still running; I'll file a separate bug when it is done.)
Comment 2 Purodha Blissenbach 2009-08-09 11:35:15 UTC
See also bug 20017
See also bug 10834
Comment 3 Purodha Blissenbach 2009-08-09 12:26:28 UTC
The practical impact on kshwiki is now addressed in bug 20143,
now that the bot has fixed the problematic pages.

This may mean that this bug can be closed, but a possibly better
solution would have been to add the fix to the site update script
used when switching to the new parser.

Here is the bot's command line:

 python pywikipedia/ -v -pt:6 -log -regex -nocase -always \
    -redirectonly:! -query:500 -summary:"Fix redirects for new parser" \
    '^(#redirect[^]|]+)\|' '\1]]\n\n|'

Note: This command line does not remove comments from the first argument
      of the redirect. We did not have any, but they may cause trouble,
      too. The same holds for comments between "redirect" and the opening
      "[[" of the redirect target.
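What the substitution does can be checked with Python's `re` module, assuming the bot applies the pattern like `re.sub` and that `-nocase` corresponds to `re.IGNORECASE`. The sample page texts are made up; the last two show the limitation from the note, since comments before the `|` are carried over unchanged:

```python
import re

# Pattern and replacement as given on the bot command line; IGNORECASE
# stands in for -nocase (an assumption about how the bot applies them).
PATTERN = re.compile(r"^(#redirect[^]|]+)\|", re.IGNORECASE)
REPLACEMENT = r"\1]]\n\n|"


def fix_redirect(text: str) -> str:
    """Close the link before the first '|' and move the rest below it."""
    return PATTERN.sub(REPLACEMENT, text)


samples = [
    "#REDIRECT [[Kölle|Stadt]]",             # the problem case
    "#REDIRECT [[Kölle]]",                   # already fine: ']' comes before any '|'
    "#REDIRECT <!-- x --> [[Kölle|Stadt]]",  # comment before "[["
    "#REDIRECT [[Kölle<!-- x -->|Stadt]]",   # comment in the first argument
]
for text in samples:
    print(repr(fix_redirect(text)))
```

Because `[^]|]+` stops at the first `]` or `|`, pages whose redirect link is already closed do not match and are left alone.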

Note also: On wikis that have localized versions of the magic word
      "redirect", this needs to be adapted to each of them as well as
      to the generic one.
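One way to cover several spellings is to build the alternation from a list of magic words, keeping the rest of the bot's pattern and replacement. The helper names are hypothetical, and the alias list is only an example (`#WEITERLEITUNG` stands in for a localized spelling); each wiki's actual aliases would have to be looked up:

```python
import re
from typing import Iterable


def build_redirect_pattern(magic_words: Iterable[str]) -> re.Pattern:
    """Hypothetical helper: the bot's pattern with one alternative per
    spelling of the redirect magic word (each given including its '#')."""
    alternatives = "|".join(re.escape(word) for word in magic_words)
    return re.compile(r"^((?:" + alternatives + r")[^]|]+)\|", re.IGNORECASE)


def fix_localized_redirect(text: str, pattern: re.Pattern) -> str:
    # Same replacement as on the bot command line.
    return pattern.sub(r"\1]]\n\n|", text)


# Example alias list (an assumption): the generic spelling plus one
# localized spelling.
pattern = build_redirect_pattern(["#REDIRECT", "#WEITERLEITUNG"])
print(repr(fix_localized_redirect("#WEITERLEITUNG [[Kölle|Stadt]]", pattern)))
```

`re.escape` keeps any special characters in an alias from being read as regex syntax.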
Comment 4 Antoine "hashar" Musso (WMF) 2011-06-03 13:43:29 UTC
r88250 changes Special:Undelete to "Only increment the page count if the page has been created; also simplified a bit the code".

Might help fix the issue :)
Comment 5 Nemo 2012-11-06 11:40:25 UTC
It now says 2,626 articles vs. 2,681 found by a Toolserver query; I'm calling this fixed by the several recent fixes to the counting method.
Please open new, more specific bugs if other issues arise or persist.
