Last modified: 2013-01-13 19:03:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T18775, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 16775 - Number of "active users" is a rather confusing statistic: add explanation and don't rely on $wgRCMaxAge
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: unspecified
Hardware: All All
Importance: Low minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2008-12-23 19:03 UTC by MZMcBride
Modified: 2013-01-13 19:03 UTC (History)
CC List: 7 users

Description MZMcBride 2008-12-23 19:03:29 UTC
I ran a query on the Toolserver to determine the number of active users:

SELECT rc_user_text FROM recentchanges;

I then ran the output through a de-dupe script and then filtered out IPv4 addresses.

The result is a 264,911-line file. http://en.wikipedia.org/wiki/Special:Statistics says that there are 153,015 active users.

Perhaps there's a flaw in my methodology? But more likely it seems we have yet another stats reporter that's broken. :-/
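
For reference, the de-dupe and IPv4-filter step described above could look roughly like the following. This is only a sketch, not the script that was actually used: the function names and the sample rows are made up for illustration. Note that this kind of filter still counts bots and users whose only entry is an account creation.

```python
import ipaddress


def is_ipv4(name: str) -> bool:
    """Return True if the name parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(name)
        return True
    except ValueError:
        return False


def distinct_registered_names(rc_user_texts):
    """De-duplicate names and drop those that look like IPv4 addresses."""
    return {name for name in rc_user_texts if not is_ipv4(name)}


# Hypothetical sample of rc_user_text values (not real data).
sample = ["Alice", "192.0.2.17", "Bob", "Alice", "198.51.100.4", "Bob"]
names = distinct_registered_names(sample)
print(len(names))
```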
Comment 1 OverlordQ 2008-12-23 19:15:14 UTC
How about filtering out new accounts?
From SiteStats:

# Get non-bot users that did some recent action other than making accounts.
# If account creation is included, the number gets inflated ~20+ fold on enwiki.
Comment 2 Roan Kattouw 2008-12-24 13:20:47 UTC
(In reply to comment #0)
> I ran a query on the Toolserver to determine the number of active users:
> 
> SELECT rc_user_text FROM recentchanges;
> 
> I then ran the output through a de-dupe script and then filtered out IPv4
> addresses.

A more useful query would probably be

SELECT DISTINCT rc_user_text FROM recentchanges WHERE rc_user != 0 AND rc_type != 3;

This filters out duplicates, IPs and log entries (can't think of a better mechanism to exclude account creations this quickly).
Comment 3 MZMcBride 2008-12-27 06:13:30 UTC
Received by pastebin from OverlordQ:

SELECT COUNT(DISTINCT rc_user_text) FROM recentchanges WHERE rc_user != 0 AND rc_bot = 0 AND (rc_log_type != 'newusers' OR rc_log_type IS NULL);
+------------------------------+
| COUNT(DISTINCT rc_user_text) |
+------------------------------+
|                       153024 |
+------------------------------+
1 row in set (1 min 57.97 sec)

mysql> SELECT COUNT(DISTINCT rc_user_text) FROM recentchanges WHERE rc_user != 0;
+------------------------------+
| COUNT(DISTINCT rc_user_text) |
+------------------------------+
|                       264739 |
+------------------------------+
1 row in set (14.11 sec)

So, the number isn't wrong, per se. Just rather confusing... I've adjusted the bug summary accordingly. 
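
To make the gap between the two counts concrete, here is a small sketch that runs both queries against a toy in-memory SQLite table. The table is reduced to the four columns the queries touch and the rows are invented; it is not the real recentchanges schema or real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE recentchanges (
           rc_user INTEGER,
           rc_user_text TEXT,
           rc_bot INTEGER,
           rc_log_type TEXT)"""
)

# Invented rows, one per kind of entry the filters care about.
rows = [
    (1, "Alice", 0, None),           # ordinary edit
    (1, "Alice", 0, None),           # second edit by the same user
    (2, "NewUser", 0, "newusers"),   # account creation and nothing else
    (3, "SomeBot", 1, None),         # bot edit
    (0, "192.0.2.17", 0, None),      # anonymous edit (rc_user = 0)
    (4, "Carol", 0, "move"),         # log entry that is not an account creation
]
conn.executemany("INSERT INTO recentchanges VALUES (?, ?, ?, ?)", rows)

# Every registered user with any row at all.
BROAD = "SELECT COUNT(DISTINCT rc_user_text) FROM recentchanges WHERE rc_user != 0"
# Same, minus bots and minus account-creation log entries.
NARROW = BROAD + (
    " AND rc_bot = 0"
    " AND (rc_log_type != 'newusers' OR rc_log_type IS NULL)"
)

broad_count = conn.execute(BROAD).fetchone()[0]
narrow_count = conn.execute(NARROW).fetchone()[0]
print(broad_count, narrow_count)
```

In this toy data the broad query counts Alice, NewUser, SomeBot and Carol, while the narrow one keeps only Alice and Carol. The explicit IS NULL check matters because comparing NULL with != never evaluates to true in SQL, so ordinary edits would otherwise be dropped.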

Possible options to add clarification: link to a MW.org page describing how the statistic is calculated (and what each piece means, how often it's refreshed, etc.). I believe the Job queue includes a link by default now in MediaWiki core.
Comment 4 Siebrand Mazeland 2009-04-25 12:00:10 UTC
It should also be possible to set the 'active user period' separately from $wgRCMaxAge. I do not want two years of users marked as active, but I do want two years of RC available.
Comment 5 Niklas Laxström 2009-10-07 12:03:41 UTC
Besides, the number is pretty useless if it cannot be compared between two wikis. It shouldn't be hard to cap it to the current default length of recent changes, for example.
Comment 6 Umherirrender 2013-01-13 19:03:05 UTC
Since r69495 there is $wgActiveUserDays
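
As an illustration of what decoupling the active-user window from the recent changes retention period means, the sketch below counts distinct registered users inside a configurable number of days, independent of how far back the table goes. It is not MediaWiki's implementation: the function name, the active_user_days parameter (named after $wgActiveUserDays), the reduced table and the sample rows are all assumptions, and timestamps are assumed to be stored as 14-digit YYYYMMDDHHMMSS strings.

```python
import sqlite3
from datetime import datetime, timedelta

TS_FORMAT = "%Y%m%d%H%M%S"  # assumed 14-digit timestamp format


def count_active_users(conn, now, active_user_days):
    """Count distinct registered users with a row newer than the cutoff."""
    cutoff = (now - timedelta(days=active_user_days)).strftime(TS_FORMAT)
    row = conn.execute(
        "SELECT COUNT(DISTINCT rc_user_text) FROM recentchanges "
        "WHERE rc_user != 0 AND rc_timestamp >= ?",
        (cutoff,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recentchanges "
    "(rc_user INTEGER, rc_user_text TEXT, rc_timestamp TEXT)"
)
# Invented rows spanning roughly two years of retained recent changes.
conn.executemany(
    "INSERT INTO recentchanges VALUES (?, ?, ?)",
    [
        (1, "Alice", "20130110120000"),  # a few days before "now"
        (2, "Bob", "20120601120000"),    # several months before
        (3, "Carol", "20110301120000"),  # almost two years before
    ],
)

now = datetime(2013, 1, 13, 19, 0, 0)
recent = count_active_users(conn, now, 30)     # 30-day activity window
retained = count_active_users(conn, now, 730)  # window as long as the retention
print(recent, retained)
```

With a fixed window such as 30 days, the figure no longer depends on how long a wiki keeps its recent changes, which also makes it comparable between wikis.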
