Last modified: 2014-09-23 19:47:53 UTC
Right now, RenameUser relies on lobbing jobs into the job queue. However, the job queue is not designed to handle tasks in a reliable, ordered manner. RenameUser is complicated because of the way we typically pull the user name from the database: in many places we read the denormalized user_text field from one of several tables (revision, archive, etc.). If we instead went to the source (the user_name field in the user table), renaming would be a much cheaper and more robust operation. Examples of this cleanup are r100286 and r100300. Aaron has done some of this work, but would like help.
This may benefit from a tweak to internal APIs. Revision::getUserText() / Revision::getRawUserText() currently pull from the rev_user_text field (unless it got overridden by a magic coalescy thingy in the row). This means that anything running its own queries may miss the canonical name from the user table, as it'll be stuck with rev_user_text.

If joined columns from 'user' are available when initializing the Revision object from a row, then we should use those directly; if not, we could do an on-demand lookup via the rev_user_id if it's non-zero (local user reference), or keep the rev_user_text if it's zero (usually an IP, sometimes a named non-local import marker).

With that in place, the worst-case scenario should be that some batch queries are missing the join and end up doing some more row-by-row lookups (they'll probably already be doing lots of those for user/talk page existence checks, so don't worry!)... but they'll show the correct results.

Might also think about a Revision::getUserObj() or something that would hand back a fully-ready User object, rather than having to cart around (id, text) pairs all the time.
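The fallback order described above can be sketched roughly as follows. This is an illustration in Python, not MediaWiki code: `RevisionRow`, `resolve_user_text` and the `lookup_name` callback are hypothetical names, and falling back to rev_user_text when the lookup finds no user row is an assumption the comment does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RevisionRow:
    """Hypothetical stand-in for a revision row, optionally joined with 'user'."""
    rev_user_id: int                 # 0 means no local account (IP or import marker)
    rev_user_text: str               # denormalized copy; may be stale after a rename
    user_name: Optional[str] = None  # present only if the query joined the user table


def resolve_user_text(
    row: RevisionRow,
    lookup_name: Callable[[int], Optional[str]],
) -> str:
    """Return the name to display for a revision's author."""
    # 1. Prefer the joined column when the query supplied it.
    if row.user_name is not None:
        return row.user_name
    # 2. Local user without a join: do an on-demand lookup by id.
    if row.rev_user_id != 0:
        name = lookup_name(row.rev_user_id)
        if name is not None:
            return name
        # Assumption: if the user row is missing, keep the stored text.
    # 3. Non-local author (id 0): the stored text is all there is.
    return row.rev_user_text
```

Callers that forget the join still get the right name, just at the cost of one extra lookup per row.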
(In reply to comment #1)
> If joined columns from 'user' are available when initializing the Revision
> object from a row, then we should use that directly; but if not, we could do an
> on-demand lookup via the rev_user_id if it's non-zero (local user reference),
> or keep the rev_user_text if it's zero (usually IP, sometimes named non-local
> import markers).

Note that the "magic coalescy thingy" was replaced with just checking user_name already ;)
(In reply to comment #1)
> With that in place, the worst case scenario should be that some batch queries
> might be missing the join and end up doing some more row-by-row lookups
> (they'll probably already be doing lots of those for user/talk page existence
> checks, so don't worry!)... but they'll show the correct results.

Basically done in r100475.
Adding Yuvi to this bug since he said he'd take a look at this.
Still lots of places that need JOINs or, preferably, batch lookups.
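For reference, a batch lookup here means collecting the user ids from a page of rows and resolving them in one query instead of one query per row. A minimal sketch, assuming an SQLite connection and a simplified two-column user table; `batch_user_names` is a hypothetical helper, not an existing MediaWiki function:

```python
import sqlite3
from typing import Dict, Iterable


def batch_user_names(conn: sqlite3.Connection, user_ids: Iterable[int]) -> Dict[int, str]:
    """Map user_id -> user_name for all given ids using a single query."""
    # Drop id 0 (no local account) and duplicates before querying.
    ids = sorted({uid for uid in user_ids if uid})
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f'SELECT user_id, user_name FROM "user" WHERE user_id IN ({placeholders})',
        ids,
    ).fetchall()
    return dict(rows)
```

Ids with no matching user row are simply absent from the result, so the caller can fall back to the stored user_text for those.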
Do we have a list of these anywhere? We need to do renames in the very near future, and this would make it much easier.
The places MediaWiki core currently actively looks at a user_text column that isn't from the user table are listed here:
* https://github.com/wikimedia/mediawiki-extensions-Renameuser/blob/REL1_22/RenameuserSQL.php#L67-L90
* https://github.com/wikimedia/mediawiki-extensions-Renameuser/blob/REL1_22/renameUserCleanup.php#L149-L155
* https://github.com/wikimedia/mediawiki-extensions-Renameuser/blob/REL1_22/RenameUserJob.php#L55-L92

As of writing:
* revision.rev_user_text
* archive.ar_user_text
* logging.log_user_text
* image.img_user_text
* oldimage.oi_user_text
* filearchive.fa_user_text
* recentchanges.rc_user_text
Extensions (especially the ones WMF uses) need auditing for this too...