Last modified: 2014-09-23 19:47:53 UTC
Right now, RenameUser relies on lobbing jobs into the job queue. However, the job queue is not designed to handle tasks in a reliable, ordered manner.
RenameUser is complicated because of the way we typically pull the user name from the database. In several places we read the denormalized user_text field from tables such as revision and archive. If we instead went to the source (the user_name field in the user table), renaming would be a much cheaper and more robust operation.
Examples of this cleanup are r100286 and r100300. Aaron has done some of this work, but would like help.
This may benefit from a tweak to internal APIs.
Revision::getUserText() / Revision::getRawUserText() currently pull from the rev_user_text field (unless it was overridden by a magic coalescy thingy in the row). This means that anything running its own queries may be missing the original names, as it'll be stuck with rev_user_text.
If joined columns from 'user' are available when initializing the Revision object from a row, then we should use that directly; but if not, we could do an on-demand lookup via the rev_user_id if it's non-zero (local user reference), or keep the rev_user_text if it's zero (usually IP, sometimes named non-local import markers).
With that in place, the worst case scenario should be that some batch queries might be missing the join and end up doing some more row-by-row lookups (they'll probably already be doing lots of those for user/talk page existence checks, so don't worry!)... but they'll show the correct results.
Might also think about a Revision::getUserObj() or something that would hand back a fully-ready User object, rather than having to cart around (id, text) pairs all the time.
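To make the fallback order proposed above concrete, here is a minimal sketch in Python rather than MediaWiki's PHP. It is not the actual Revision code: the function name, the plain-dict row, and the `lookup_name_by_id` callback are all hypothetical, and falling back to the stored text when the on-demand lookup finds nothing is an added assumption.

```python
from typing import Callable, Mapping, Optional


def resolve_user_text(
    row: Mapping[str, object],
    lookup_name_by_id: Callable[[int], Optional[str]],
) -> str:
    """Pick the user name to display for one revision row (hypothetical helper)."""
    # 1. If the query joined against 'user', the joined user_name wins.
    joined = row.get("user_name")
    if joined:
        return str(joined)

    user_id = int(row.get("rev_user_id") or 0)

    # 2. Non-zero id means a local user: look up the current name on demand.
    if user_id != 0:
        name = lookup_name_by_id(user_id)
        if name is not None:
            return name

    # 3. Zero id (IP or import marker), or the lookup found nothing
    #    (assumption): keep the denormalized rev_user_text.
    return str(row["rev_user_text"])
```

A caller that forgot the join still gets the right name, at the cost of one extra lookup per row.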
(In reply to comment #1)
> If joined columns from 'user' are available when initializing the Revision
> object from a row, then we should use that directly; but if not, we could do an
> on-demand lookup via the rev_user_id if it's non-zero (local user reference),
> or keep the rev_user_text if it's zero (usually IP, sometimes named non-local
> import markers).
Note that the "magic coalescy thingy" was replaced with just checking user_name already ;)
(In reply to comment #1)
> With that in place, the worst case scenario should be that some batch queries
> might be missing the join and end up doing some more row-by-row lookups
> (they'll probably already be doing lots of those for user/talk page existence
> checks, so don't worry!)... but they'll show the correct results.
Basically done in r100475.
Adding Yuvi to this bug since he said he'd take a look at this.
Still lots of places that need JOINs or, preferably, batch lookups.
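For illustration, a batch lookup could look roughly like the sketch below: collect the distinct non-zero user ids from a result set, fetch their names in one query, then fill in each row. This is Python, not MediaWiki code; `batch_resolve` and the `fetch_names_by_ids` callback (standing in for a single SELECT against the user table) are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Mapping


def batch_resolve(
    rows: Iterable[Mapping[str, object]],
    fetch_names_by_ids: Callable[[List[int]], Dict[int, str]],
) -> List[str]:
    """Resolve names for many rows with at most one lookup call (hypothetical)."""
    rows = list(rows)

    # Collect each local user id once; id 0 (IP / import marker) is skipped.
    ids = {int(r["rev_user_id"]) for r in rows if int(r["rev_user_id"]) != 0}

    # One round trip for the whole batch instead of one per row.
    names = fetch_names_by_ids(sorted(ids)) if ids else {}

    resolved = []
    for r in rows:
        user_id = int(r["rev_user_id"])
        if user_id != 0 and user_id in names:
            resolved.append(names[user_id])
        else:
            # Fall back to the denormalized text stored on the row.
            resolved.append(str(r["rev_user_text"]))
    return resolved
```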
Do we have a list of these anywhere? We need to do renames in the very near future, and this would make it much easier.
The places MediaWiki core currently actively looks at a user_text column that isn't from the user table are listed here:
As of writing:
* revision.rev_user_text
* archive.ar_user_text
* logging.log_user_text
* image.img_user_text
* oldimage.oi_user_text
* filearchive.fa_user_text
* recentchanges.rc_user_text
Extensions (especially those used by WMF) need auditing for this too.
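As a rough starting point for such an audit, the sketch below scans source text for the column names listed above and reports the line numbers where they appear. It is a plain text search in Python, not an existing MediaWiki tool; the function name is hypothetical, and any hits would still need manual review.

```python
import re
from typing import List, Tuple

# Denormalized user-name columns taken from the list above.
DENORMALIZED_COLUMNS = (
    "rev_user_text",
    "ar_user_text",
    "log_user_text",
    "img_user_text",
    "oi_user_text",
    "fa_user_text",
    "rc_user_text",
)

# Word boundaries avoid matching longer identifiers that merely contain a column name.
_PATTERN = re.compile(r"\b(" + "|".join(DENORMALIZED_COLUMNS) + r")\b")


def find_user_text_references(source: str) -> List[Tuple[int, str]]:
    """Return (line number, column name) for every mention in the given source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

Running this over each file of an extension gives a list of places to check for a missing join or batch lookup.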