Last modified: 2011-12-19 18:36:06 UTC
Usernames and user IDs are matched inconsistently across revisions in the dump. This may come from how and when the username field is updated in the revision table. If so, it would be helpful to have a clear explanation of what to expect (e.g. deleted users, username changes). We see a range of inconsistencies: many usernames paired with the same ID, many IDs paired with the same username, non-IP usernames with no ID, and revisions with no user information at all. Our approach is to associate each user_id with its most recent username and propagate that username to every revision carrying that user_id.

Proposed solution:
1) Run an SQL query to synchronize usernames with user IDs.
2) Run an SQL query to replace the username in cases where it is a hostname.
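The propagation approach described above can be sketched in Python. This is a minimal illustration and not the script actually used. It assumes that revisions have already been extracted into hypothetical `Revision` tuples whose fields mirror the revision table columns. It also assumes that every rev_timestamp uses one fixed-width format, such as MediaWiki's 14-digit YYYYMMDDHHMMSS, so that string comparison orders them chronologically. Ties are broken by rev_id.

```python
from typing import Dict, List, NamedTuple, Tuple


class Revision(NamedTuple):
    # Hypothetical container; field names mirror the revision table columns.
    rev_id: int
    rev_timestamp: str  # assumed fixed-width, e.g. "20111219183606", so it sorts lexically
    rev_user: int
    rev_user_text: str


def latest_username_by_id(revisions: List[Revision]) -> Dict[int, str]:
    """Map each non-zero user ID to the username on its most recent revision."""
    latest: Dict[int, Tuple[Tuple[str, int], str]] = {}
    for rev in revisions:
        if rev.rev_user == 0:
            continue  # no user ID to key on (anonymous or unattributed edit)
        key = (rev.rev_timestamp, rev.rev_id)  # rev_id breaks timestamp ties
        if rev.rev_user not in latest or key > latest[rev.rev_user][0]:
            latest[rev.rev_user] = (key, rev.rev_user_text)
    return {uid: name for uid, (_, name) in latest.items()}


def propagate_usernames(revisions: List[Revision]) -> List[Revision]:
    """Rewrite rev_user_text so every revision by an ID shows that ID's latest name."""
    latest = latest_username_by_id(revisions)
    return [
        rev._replace(rev_user_text=latest[rev.rev_user]) if rev.rev_user != 0 else rev
        for rev in revisions
    ]
```

Revisions with a user ID of 0 are left untouched, because there is no ID to propagate a name through.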
The dump will just list whatever's in the table; each revision record carries both a rev_user and a rev_user_text field. Depending on how imports, user renames, and database cleanup have proceeded over time, it's normal to have some inconsistencies in there:
* with a user_id of 0, you will sometimes see non-IP usernames that elsewhere appear with a legit user ID
* sometimes you will find a legit user_id paired with an old username

You should consider the username listed in a revision record to be merely advisory when a non-zero user ID is present. Current user information ought in theory to be available in a more canonical form somewhere else...
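The rule above can be sketched as a small classifier: trust rev_user, and treat rev_user_text as advisory. This is a hypothetical sketch. `canonical` is an assumed mapping of user_id to current username, for example loaded from the user_id and user_name columns of MediaWiki's user table. IP detection uses Python's standard `ipaddress` module rather than MediaWiki's own IP validation.

```python
import ipaddress
from typing import Dict


def is_ip(name: str) -> bool:
    """True if the name parses as an IPv4 or IPv6 address (anonymous edit)."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def classify(rev_user: int, rev_user_text: str, canonical: Dict[int, str]) -> str:
    """Label a revision's user fields according to the cases described above."""
    if rev_user == 0:
        # Zero ID: IP names are normal anonymous edits; anything else is the
        # "non-IP username with user_id 0" case.
        return "anonymous-ip" if is_ip(rev_user_text) else "zero-id-named"
    current = canonical.get(rev_user)
    if current is None:
        return "id-not-in-user-table"
    # Non-zero ID: the stored name may simply be an old one.
    return "consistent" if current == rev_user_text else "stale-username"


def display_name(rev_user: int, rev_user_text: str, canonical: Dict[int, str]) -> str:
    """Prefer the canonical name for non-zero IDs; fall back to the advisory text."""
    if rev_user != 0 and rev_user in canonical:
        return canonical[rev_user]
    return rev_user_text
```

Keeping classification separate from display makes it easy to count each kind of inconsistency before deciding how aggressively to rewrite names.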
see bug 31863