Last modified: 2013-03-23 00:19:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the bug reports and their history, links might be broken. See T47970, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 45970 - Make updateCollation.php process categorylinks on a category-by-category basis
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Categories
Version: 1.21.x
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: collations

Reported: 2013-03-10 22:54 UTC by Bartosz Dziewoński
Modified: 2013-03-23 00:19 UTC (History)
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Bartosz Dziewoński 2013-03-10 22:54:13 UTC
I'm not sure what the order is right now, but categories were certainly messed up on pl.wiki during its migration to the uca-pl collation (bug 42413), so it's not category by category.

Apparently nobody noticed but me (or cared enough to report it), but the problem existed. The migration took ~25 hours.

Getting this done will be necessary if we ever want to get, say, Commons or English Wikipedia migrated with reasonably little disruption; doing this many categorylinks might take weeks, and somebody is bound to notice and get mad.
Comment 1 Liangent 2013-03-11 03:04:26 UTC
Somebody already noticed it. :) See Change-Id: Ibcf789314d02a68157063e11dcdf81abfb9d61fb

Copying my comment from that change: I find that there's no index on cl_to,cl_from (there is one on cl_to,cl_type,cl_sortkey,cl_from, but cl_sortkey is not stable while this script is running), and I don't think it's worth adding a new index only for this reason.
Comment 2 Bawolff (Brian Wolff) 2013-03-11 18:20:28 UTC
Even though that second index is unstable, we could still use it when we know which cl_collation to ignore, at the cost of a couple of rows being looked at twice. (Specifically, that could work in the non --force case currently. I'm not sure whether such a scheme could work with the new multi-collation stuff, though.)
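
To illustrate the idea of ignoring rows by cl_collation, here is a minimal sketch in Python against an in-memory SQLite table. It is not the updateCollation.php code. The reduced table layout, the new_sortkey() stand-in, the sample rows and the batch size are all assumptions made for the example. Because each batch selects only rows that are not yet on the target collation, converted rows drop out of the next query, so resuming never depends on the changing cl_sortkey.

```python
import sqlite3

TARGET = "uca-pl"  # collation being migrated to (name taken from the report)


def new_sortkey(text):
    # Hypothetical stand-in for the real collation's sort key generation.
    return text.upper()


def migrate_by_category(db, batch_size=2):
    """Convert rows one category at a time, skipping already converted rows."""
    processed = []
    categories = [row[0] for row in db.execute(
        "SELECT DISTINCT cl_to FROM categorylinks "
        "WHERE cl_collation != ? ORDER BY cl_to", (TARGET,))]
    for cat in categories:
        while True:
            # Converted rows no longer match, so no offset is needed.
            batch = db.execute(
                "SELECT cl_from, cl_sortkey FROM categorylinks "
                "WHERE cl_to = ? AND cl_collation != ? "
                "ORDER BY cl_from LIMIT ?",
                (cat, TARGET, batch_size)).fetchall()
            if not batch:
                break
            for cl_from, sortkey in batch:
                db.execute(
                    "UPDATE categorylinks SET cl_sortkey = ?, cl_collation = ? "
                    "WHERE cl_from = ? AND cl_to = ?",
                    (new_sortkey(sortkey), TARGET, cl_from, cat))
                processed.append((cat, cl_from))
    return processed


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE categorylinks "
           "(cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT, cl_collation TEXT)")
db.executemany("INSERT INTO categorylinks VALUES (?, ?, ?, ?)", [
    (1, "Birds", "owl", "uppercase"),
    (2, "Trees", "oak", "uppercase"),
    (3, "Birds", "emu", "uppercase"),
    (4, "Trees", "elm", "uca-pl"),  # already converted, must be skipped
    (5, "Birds", "kiwi", "uppercase"),
])
processed = migrate_by_category(db)
```

Rows added with the old collation while the loop is running would not be picked up by this sketch, since the category list is read once at the start. A second pass would be needed for those.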
Comment 3 Sam Reed (reedy) 2013-03-11 18:47:17 UTC
Will this actually work as expected? Having $wgCategoryCollation set to the old value would mean new additions get added with the old flag. Having it set to the new one will mean new members of the category may or may not match the collation of the current members.

Is this really any better/different/whatever to how we are proceeding now?
Comment 4 Bawolff (Brian Wolff) 2013-03-11 19:57:01 UTC
It would mean that only newly created category links would potentially be ordered incorrectly. Right now those are a minority of pages ordered incorrectly. Currently we go oldest pages first, which makes categories almost entirely broken for the duration of the script running.
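
The difference between the two orderings can be shown with a toy model in Python. The data set (six pages, each in three categories) and the metric are assumptions made for this illustration, not measurements from any wiki. After each converted row, the function counts how many categories contain both converted and unconverted rows, and sums that count over the whole run.

```python
def mixed_category_steps(rows, order_key):
    """Sum, over every step, the number of half-converted categories.

    rows is a list of (cl_from, cl_to) pairs, processed in order_key order.
    """
    ordered = sorted(rows, key=order_key)
    total = {}
    for _, cat in ordered:
        total[cat] = total.get(cat, 0) + 1
    done = {cat: 0 for cat in total}
    mixed_steps = 0
    for _, cat in ordered:
        done[cat] += 1
        # A category is "mixed" while only some of its rows are converted.
        mixed_steps += sum(1 for c in total if 0 < done[c] < total[c])
    return mixed_steps


# Hypothetical data: pages 1..6, each a member of categories A, B and C.
rows = [(page, cat) for page in range(1, 7) for cat in ("A", "B", "C")]

# Page order (oldest pages first) versus category-by-category order.
by_page = mixed_category_steps(rows, order_key=lambda r: r[0])
by_category = mixed_category_steps(rows, order_key=lambda r: (r[1], r[0]))
```

In this model, page order leaves every category half-converted for almost the whole run, while category order has at most one half-converted category at any moment, so by_category comes out smaller than by_page.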
Comment 5 Bartosz Dziewoński 2013-03-23 00:19:38 UTC
Done by Tim in I19bc8d67.
