Last modified: 2014-11-04 18:15:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T21311, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 19311 - User edit counts (user.user_editcount field) is often wrong
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Hardware: All All
Importance: Low normal with 7 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: analytics
Depends on:
Blocks: 16660
Reported: 2009-06-20 17:00 UTC by MZMcBride
Modified: 2014-11-04 18:15 UTC (History)


Description MZMcBride 2009-06-20 17:00:09 UTC
Example user: [[User:Joao]]. On the Toolserver's copy of enwiki_p:

mysql> SELECT user.user_editcount FROM user WHERE user_name="Joao"\G
*************************** 1. row ***************************
user_editcount: 266
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM revision WHERE rev_user_text = "Joao" GROUP BY rev_user_text\G
*************************** 1. row ***************************
COUNT(*): 265
1 row in set (0.03 sec)

mysql> SELECT COUNT(*) FROM archive WHERE ar_user_text = "Joao" GROUP BY ar_user_text\G
*************************** 1. row ***************************
COUNT(*): 35
1 row in set (0.01 sec)

This isn't an anomaly. Many users, esp. users with higher edit counts, have inaccurate values stored. The stored value (266 here) matches neither the number of live contributions (265) nor the number of live and deleted contributions combined (300).

Part of the problem seems to stem from the fact that the initEditCount.php maintenance script doesn't account for deleted contributions.

We're currently advertising an edit count (in Special:Preferences and elsewhere) that isn't accurate.
Comment 1 Christian Thiele 2009-08-22 00:45:36 UTC
I think there are at least two problems that cause the difference between the "normal" edit counters on the Toolserver and the user_editcount field. The first is deleted edits. As the reporter of this bug writes, initEditCount.php doesn't account for deleted contributions. This is the correct behavior, as none of the edit counters count these. But after user_editcount is initialized, only incEditCount() in User.php seems to be called, which increments user_editcount. When a page is deleted, user_editcount is not decremented. So user_editcount is the number of all edits a user has made (deleted and not deleted) minus the edits that had already been deleted when initEditCount.php was run.
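The drift described above can be reproduced with a toy model. This is only a sketch under stated assumptions: it uses Python's sqlite3 with a heavily simplified schema instead of the real MediaWiki tables, and the helper names (save_revision, edit, delete_page, init_edit_count) are hypothetical stand-ins, not MediaWiki functions.

```python
import sqlite3


def simulate_drift():
    """Toy model: the counter is initialised from live revisions, incremented
    on every later edit, and never decremented when a page is deleted."""
    db = sqlite3.connect(":memory:")
    # Simplified, assumed schema; AUTOINCREMENT keeps rev_id values unique
    # even after rows have been moved out of the revision table.
    db.executescript("""
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY AUTOINCREMENT,
                               rev_user INTEGER, rev_page INTEGER);
        CREATE TABLE archive  (ar_rev_id INTEGER, ar_user INTEGER, ar_page INTEGER);
        CREATE TABLE user     (user_id INTEGER PRIMARY KEY, user_editcount INTEGER);
        INSERT INTO user VALUES (1, NULL);
    """)

    def save_revision(page):
        # An edit made before the counter was initialised.
        db.execute("INSERT INTO revision (rev_user, rev_page) VALUES (1, ?)", (page,))

    def edit(page):
        # An edit made after initialisation: the counter is incremented.
        save_revision(page)
        db.execute("UPDATE user SET user_editcount = user_editcount + 1 WHERE user_id = 1")

    def delete_page(page):
        # Revisions move to archive; the stored counter is left untouched.
        db.execute("INSERT INTO archive (ar_rev_id, ar_user, ar_page) "
                   "SELECT rev_id, rev_user, rev_page FROM revision WHERE rev_page = ?",
                   (page,))
        db.execute("DELETE FROM revision WHERE rev_page = ?", (page,))

    def init_edit_count():
        # Counts live revisions only, ignoring anything already archived.
        db.execute("UPDATE user SET user_editcount = "
                   "(SELECT COUNT(*) FROM revision WHERE rev_user = 1) WHERE user_id = 1")

    for _ in range(5):
        save_revision(10)
    for _ in range(3):
        save_revision(20)
    delete_page(20)        # 3 edits deleted before initialisation
    init_edit_count()      # counter starts at 5
    for _ in range(4):
        edit(30)
    edit(10)               # counter is now 10
    delete_page(30)        # 4 more edits deleted, counter stays at 10

    stored = db.execute("SELECT user_editcount FROM user WHERE user_id = 1").fetchone()[0]
    live = db.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
    deleted = db.execute("SELECT COUNT(*) FROM archive").fetchone()[0]
    return {"stored": stored, "live": live, "deleted": deleted}


print(simulate_drift())
```

In this run the stored value ends up as all edits (13) minus the edits deleted before initialisation (3), so it matches neither the live count nor the total.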

The second thing is an older bug that left deleted revisions in the revision table when they should be in the archive table. Because of this, all edit counters check whether the rev_page id exists in the page table (this is from de.wikipedia):

SELECT count(*) FROM revision WHERE rev_user=10276;
-> 39702
SELECT count(*) FROM revision, page WHERE rev_user=10276 AND rev_page=page_id;
-> 39688

The 14 edits are from 2005/2006.
SELECT * FROM revision WHERE rev_user=10276 AND rev_page NOT IN(SELECT page_id FROM page);

I don't know whether this bug still exists, but it doesn't seem so, because the last such revision for me is from March 2006. These were newly created redirects (mostly from moving a page) that were deleted later, but the move message wasn't moved to the archive table. Since I think the bug was fixed, a maintenance script might be a good idea: one that moves all revisions whose rev_page id is not in the page table to the archive table.
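A minimal sketch of such a cleanup, assuming a much-reduced schema and run here against an in-memory SQLite database rather than MySQL. The function name move_orphaned_revisions and the column subset are illustrative assumptions; a real script would have to copy every archive column and would need batching on a large wiki.

```python
import sqlite3


def move_orphaned_revisions(db):
    """Move revisions whose rev_page has no row in page into archive.

    Returns the number of revisions moved."""
    orphan_filter = "rev_page NOT IN (SELECT page_id FROM page)"
    with db:  # copy and delete in one transaction
        db.execute(
            "INSERT INTO archive (ar_rev_id, ar_user, ar_page_id) "
            "SELECT rev_id, rev_user, rev_page FROM revision WHERE " + orphan_filter
        )
        cur = db.execute("DELETE FROM revision WHERE " + orphan_filter)
    return cur.rowcount


def demo_orphans():
    db = sqlite3.connect(":memory:")
    # Assumed, simplified tables: two existing pages, two orphaned revisions.
    db.executescript("""
        CREATE TABLE page     (page_id INTEGER PRIMARY KEY);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER, rev_page INTEGER);
        CREATE TABLE archive  (ar_rev_id INTEGER, ar_user INTEGER, ar_page_id INTEGER);
        INSERT INTO page VALUES (1), (2);
        INSERT INTO revision VALUES (100, 10276, 1), (101, 10276, 2),
                                    (102, 10276, 99), (103, 10276, 98);
    """)
    moved = move_orphaned_revisions(db)
    live = db.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
    archived = db.execute("SELECT COUNT(*) FROM archive").fetchone()[0]
    return moved, live, archived


print(demo_orphans())
```

After the move, a plain count on revision and the count joined against page would agree, so edit counters would no longer need the extra join.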
Comment 2 X! 2010-04-05 01:36:01 UTC
It may be quite possible to...

a) create a maintenance script that replaces every user_editcount field with the result of SELECT COUNT(*) AS count FROM revision WHERE rev_user_text = 'Example';
b) set the function in the User class which gets the edit count to just do that SQL query. 

However, for users with a large number of edits, this is very slow. This may be out of our reach. Might this be possible?
Comment 3 MZMcBride 2010-04-05 01:42:06 UTC
(In reply to comment #2)
> a) create a maintenance script that replaces every user_editcount field with
> the result of SELECT COUNT(*) AS count FROM revision WHERE rev_user_text =
> 'Example';
This is essentially what initEditCount.php does.

> b) set the function in the User class which gets the edit count to just do that
> SQL query. 
Way too expensive. Even with the index on rev_user_text, you're talking about millions of rows with some of these users. The value must be stored so that it can be easily retrieved for things like creating the 'edit' links or not (autoconfirm checks this field). There might be other creative ways of updating it, though, like every time a user logs in.
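The trade-off above (cheap reads of a stored value, with an occasional expensive recount) might look roughly like the following. This is a sketch only: it uses sqlite3 and a simplified schema, the function names are hypothetical, and the idea of calling the recount from a login hook is the suggestion from this comment, not existing MediaWiki behavior. Counting only live revisions is also an assumption.

```python
import sqlite3


def get_edit_count(db, user_id):
    """Cheap read of the stored value, as permission checks would need."""
    row = db.execute(
        "SELECT user_editcount FROM user WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def refresh_edit_count(db, user_id):
    """Expensive recount from the revision table; meant to run rarely,
    for example from a hypothetical login hook."""
    with db:
        db.execute(
            "UPDATE user SET user_editcount = "
            "(SELECT COUNT(*) FROM revision WHERE rev_user = ?) "
            "WHERE user_id = ?",
            (user_id, user_id),
        )
    return get_edit_count(db, user_id)


def demo_refresh():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE user     (user_id INTEGER PRIMARY KEY, user_editcount INTEGER);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER);
        INSERT INTO user VALUES (1, 266);
    """)
    # 265 live revisions, mirroring the numbers in the bug description.
    db.executemany("INSERT INTO revision (rev_user) VALUES (?)", [(1,)] * 265)
    before = get_edit_count(db, 1)      # stale stored value
    after = refresh_edit_count(db, 1)   # value after the recount
    return before, after


print(demo_refresh())
```

Between refreshes the stored value can still drift, so this only bounds how stale the number gets rather than making it exact.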
Comment 4 Liangent 2013-01-11 17:27:08 UTC

[[mw:Manual:User table]]:


    Count of edits and edit-like actions.
    *NOT* intended to be an accurate copy of COUNT(*) WHERE rev_user=user_id. May contain NULL for old accounts if batch-update scripts haven't been run, as well as listing deleted edits and other myriad ways it could be out of sync. Execute the script initEditCount.php to update this table column.
    Meant primarily for heuristic checks to give an impression of whether the account has been used much.
Comment 5 Bawolff (Brian Wolff) 2013-01-11 18:01:30 UTC
(In reply to comment #4)

I don't think this is invalid. Just because it's not perfect now doesn't mean we can't do better.

But first of all, perhaps we should add "approximately" to the edit counter on prefs.
