Last modified: 2011-12-19 18:36:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T29774, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 27774 - Username to user_id match is inconsistent in revisions of dump.
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal minor
Target Milestone: ---
Assigned To: Ariel T. Glenn
Keywords: analytics
Depends on:
Blocks: 27772
Reported: 2011-02-28 00:11 UTC by Diederik van Liere
Modified: 2011-12-19 18:36 UTC
CC List: 3 users

Description Diederik van Liere 2011-02-28 00:11:45 UTC
The mapping between username and user_id is inconsistent across revisions in the dump.
This could be a characteristic of how and when the username field gets updated in the revision table. If so, it would be nice to have a clear explanation of what to expect (e.g. deleted users, username changes, etc.).
We see a range of inconsistencies: many usernames matched with the same ID, many IDs matched with the same username, non-IP usernames with no ID, and completely missing user information.
Our approach is to associate each user_id with its most recent username and propagate that username to all revisions carrying that user_id.

Proposed solution:

1) Run a SQL query to synchronize usernames with user IDs.
2) Run a SQL query to replace cases where a hostname is used as the username.
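
To illustrate the approach described above (map each user_id to its most recent username, then propagate that name), here is a minimal Python sketch. It is not the SQL proposed in this report. The record layout is an assumption: each revision is taken to be a dict with `rev_user`, `rev_user_text`, and a `timestamp` key holding an ISO 8601 string, and the function name `canonicalize_usernames` is hypothetical.

```python
def canonicalize_usernames(revisions):
    """Return copies of the revisions in which every non-zero user ID
    carries the username from its most recent revision.

    Assumes each revision is a dict with 'rev_user' (int),
    'rev_user_text' (str) and 'timestamp' (ISO 8601 string, so plain
    string comparison orders the values chronologically).
    """
    latest = {}  # user_id -> (timestamp, username)
    for rev in revisions:
        uid = rev["rev_user"]
        if uid == 0:
            continue  # ID 0 gives no key to group on; leave such rows alone
        ts = rev["timestamp"]
        if uid not in latest or ts > latest[uid][0]:
            latest[uid] = (ts, rev["rev_user_text"])

    result = []
    for rev in revisions:
        fixed = dict(rev)  # copy so the input records stay untouched
        if rev["rev_user"] in latest:
            fixed["rev_user_text"] = latest[rev["rev_user"]][1]
        result.append(fixed)
    return result
```

Revisions with a user ID of 0 are deliberately left unchanged here, since there is no ID to tie their names to.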
Comment 1 Brion Vibber 2011-03-05 00:32:05 UTC
The dump will just list whatever's in the table; each revision record carries both a rev_user and a rev_user_text field. Depending on how imports, user renames, and database cleanup proceed over time, it's normal to have some inconsistencies in there:

* with user_id of 0, you will sometimes see non-IP user names that elsewhere appear with a legit user id
* sometimes you will find a legit user_id paired with an old username

You should consider the username listed in a revision record to be merely advisory when a non-zero user ID is present. Current user information ought in theory to be available in a more canonical form somewhere else...
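
One way to act on the cases listed above when reading the dump is to classify each revision's contributor before trusting the name. The sketch below is only an illustration: the function name `classify_contributor` and the category labels are hypothetical, and it assumes `rev_user` is an integer (0 when no account is attached) and `rev_user_text` is a string.

```python
import ipaddress


def classify_contributor(rev_user, rev_user_text):
    """Classify a revision's contributor from its rev_user / rev_user_text pair."""
    if rev_user:
        # Non-zero ID: the ID identifies the account; the name is advisory
        # and may be an old username.
        return "registered"
    if not rev_user_text:
        return "missing"  # no user information at all
    try:
        ipaddress.ip_address(rev_user_text)  # accepts IPv4 and IPv6 literals
        return "anonymous_ip"
    except ValueError:
        # ID 0 with a non-IP name: the name may belong to an account that
        # appears elsewhere with a real user ID.
        return "unattributed_name"
```

For example, `classify_contributor(0, "SomeUser")` falls into the `unattributed_name` bucket, which corresponds to the first case in the list above.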
Comment 2 db [inactive,noenotif] 2011-12-19 18:36:06 UTC
see bug 31863
