Last modified: 2014-10-11 11:58:24 UTC
Chinese collation is complex, not least because different Chinese-speaking regions have different customary collations. The KangXi order favoured by Unicode standards is rarely used in any region, except in dictionaries. Liangent has prepared a core branch which allows multiple category collations to coexist on a single wiki, with a selectable sort order on category pages. I helped develop the architecture. One of the collations considered essential for Chinese wikis is a Latin sort of the pinyin transliteration. We will need to upgrade to ICU 4.8 to support this collation. This bug tracks the tasks needed to merge the chinese-collation branch and to deploy a suitable multi-collation configuration on the Chinese Wikipedia.
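To illustrate what "a Latin sort of the pinyin transliteration" means in practice, here is a minimal Python sketch. It is not the branch's code and does not use ICU: the `PINYIN` table is a hypothetical, hand-written mapping covering only the four demo titles, and the sketch ignores tones and characters with more than one reading. It only shows why a pinyin-based order differs from plain code point order.

```python
# Hypothetical, hand-written pinyin table covering only the demo titles.
# This is an assumption for illustration; real collation data would come from ICU.
PINYIN = {
    "北": "bei", "京": "jing",
    "上": "shang", "海": "hai",
    "广": "guang", "州": "zhou",
    "天": "tian", "津": "jin",
}


def pinyin_sort_key(title):
    """Build a Latin sort key by replacing each known character with toneless pinyin."""
    # Characters missing from the table are kept unchanged.
    return " ".join(PINYIN.get(ch, ch) for ch in title)


titles = ["上海", "北京", "天津", "广州"]

# Default string sorting compares Unicode code points.
codepoint_order = sorted(titles)

# Sorting on the transliteration gives an alphabetical (Latin) order instead.
pinyin_order = sorted(titles, key=pinyin_sort_key)

print("code point order:", codepoint_order)
print("pinyin order:    ", pinyin_order)
```

The two orders disagree for these titles, which is the reason a wiki might want more than one collation available on the same category page.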
I94056ca2
Tagging with "design" keyword, since there's a small dropdown that might use a little love.
According to Jared Zimmerman this morning, Pau did a design review of this recently and everything seemed ok (or, was going to be ok, or similar). Is that right, Liangent?
(In reply to comment #3)
> According to Jared Zimmerman this morning, Pau did a design review of this recently and everything seemed ok (or, was going to be ok, or similar). Is that right, Liangent?
Right
(In reply to comment #4)
> (In reply to comment #3)
> > According to Jared Zimmerman this morning, Pau did a design review of this recently and everything seemed ok (or, was going to be ok, or similar). Is that right, Liangent?
> Right
Great, thanks for confirming, Liangent. I'd like this bug to have the needed next steps in it; could you tell me what you think they are from your end, Liangent (and anyone else who sees this bug mail)? Would love to know what needs to be prioritized.
We have libicu48 version 4.8.1.1-3, so presumably nothing else needs doing to that extent.
(In reply to comment #5)
> Would love to know what needs to be prioritized.
The merge commit needs the various merge conflicts fixing, and also rebasing again onto core, as it's at least 10 weeks old, if not 18 weeks or so
(In reply to comment #6)
> We have libicu48 version 4.8.1.1-3, so presumably nothing else needs doing to that extent.
>
> (In reply to comment #5)
> > Would love to know what needs to be prioritized.
>
> The merge commit needs the various merge conflicts fixing, and also rebasing again onto core, as it's at least 10 weeks old, if not 18 weeks or so
Alright, then I'm assuming Pau is no longer actively working on this, so assigning to Liangent but that's only because they own the merge. Would love someone else on this CC: list to take a look at that merge.
I took a quick look through the code (I was using it to make a prototype of a feature idea: http://tools.wmflabs.org/bawolff/whichisbetter ). It works well. A couple of things I noticed though (note I did not read the code in depth):
* In Title::moveTo, the code seems to assume the cl_sortkey_prefix is the same for all collations. I do not think this is the case.
* When running update.php, the script runs updateCollation.php before doing the schema changes from your code, instead of after. (Really I think it should run it after all extension schema changes, in case someone abuses the Collation framework to make a collation that depends on a schema change.) Arguably this issue was here before your code.
* From a language perspective, I think using the phrase "Sorting method" instead of "collation" for the message 'category-collation' would be better and less jargony. Some of the collation names ( 'Identity' ) are a bit jargony as well, but I guess that can't really be helped. We can't exactly use the word 'alphabetical', since they're all alphabetical.
* On category pages, <label for="mw-collation-select">Sorting method:</label> should have an id or class attribute so people could style it easily. Additionally I think it might look better with the CSS vertical-align: bottom.
I'd submit gerrit patches for some of these, but I'm kind of unclear how to do that, or whether I should, given that I don't really understand how long-term feature branches in gerrit are supposed to work. Should I just submit new patches to the chinese-collation branch?
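The Title::moveTo point above can be made concrete with a small Python sketch. This is not the branch's schema or code: `CategoryLink`, the collation names and the prefix values are all hypothetical, chosen only to show how a "one prefix per category" assumption could be checked against rows that store a prefix per collation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryLink:
    """Hypothetical row: one link per (category, collation) pair."""
    category: str
    collation: str       # e.g. "pinyin" or "stroke" (made-up names)
    sortkey_prefix: str  # assumed to be stored separately for each collation


def prefixes_by_collation(links):
    """Keep one prefix per (category, collation) instead of one per category."""
    return {(link.category, link.collation): link.sortkey_prefix for link in links}


def shared_prefix_assumption_holds(links):
    """Return True only if every category uses a single prefix across all collations."""
    seen = {}
    for link in links:
        seen.setdefault(link.category, set()).add(link.sortkey_prefix)
    return all(len(prefixes) == 1 for prefixes in seen.values())


# Made-up data: the same page is filed under different prefixes per collation.
links = [
    CategoryLink("Cities", "pinyin", "beijing"),
    CategoryLink("Cities", "stroke", "北京"),
]

print(shared_prefix_assumption_holds(links))  # False for this made-up data
print(prefixes_by_collation(links))
```

If the assumption can fail, code that moves a page would need to carry the prefix over per collation, as `prefixes_by_collation` does, rather than reading a single value and reusing it everywhere.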
> *On category pages, <label for="mw-collation-select">Sorting method:</label> should have an id or class attribute so people could style it easily.
Actually, I guess it's pretty easy to style via #mw-collation-selector label
For completeness's sake: Liangent will be meeting up with the WMF Language team at Wikimania this year to go over what needs to be done/etc for this to go out. Please feel free to continue working on this before then, but there is no set deploy target date until after Wikimania.
Pasting feedback that was given by Pau Giner on 2013-05-24 after a request from Tim Starling.
"I made a review of the UI and provided some design ideas to solve potential issues. I'm not familiar with Chinese nor Chinese collation methods, so feel free to correct me if I made any wrong assumption in my analysis:
* The use of a technical linguistic term such as "collation", although correct, may be confusing to regular users. "Sorting" seems a more common term that would allow unifying sorting-related options (more on this later).
* The control breaks the heading layout in the current position. The line of the heading appears broken. To avoid this, I would move the selector below the heading line since the action affects the elements below the header.
* Current ordering is communicated by the list itself, so we may consider making the selector more compact (e.g., using an icon with a clarification tooltip).
* Not sure if this was considered, but if there is a collation method that is most commonly used it should be the default. It may be also interesting to remember which is the collation method the user selected last and use it as the default for the user.
I know that the specific purpose of the extension is to support Chinese collation, but my concern is that when combining many different extensions the resulting UI gets inconsistently crowded, making it hard to access the great functionality provided by each individual extension. To avoid this, I would propose to create a unified entry point for sorting-like functionality that can be used consistently at different parts of the UI. I made a quick mockup to illustrate the idea: http://i.imgur.com/1uZD8nF.png"
>* The control breaks the heading layout in the current position. The line of the heading appears broken. To avoid this, I would move the selector below the heading line since the action affects the elements below the header.
Just as a note, it affects the elements below the next 3 headings, not just the heading it is beside.
>Not sure if this was considered, but if there is a collation method that is most commonly used it should be the default. It may be also interesting to remember which is the collation method the user selected last and use it as the default for the user.
Given that Liangent added a user preference for preferred sorting, it seems like a good idea to have changing the sorting method on a category page also update that preference. The only possible worry I would have is that, when {{DEFAULTCOLLATION:...}} is specified, the interaction between remembering the user's last choice and the collation being overridden on a per-page basis might be unclear to the user. But I think that's a minor concern.
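To make the interaction described above easier to reason about, here is a minimal Python sketch of one possible precedence order. It is an assumption for discussion, not what the branch implements: the function name, its parameters and the collation names are hypothetical, and the order (explicit selection, then per-page {{DEFAULTCOLLATION:...}}, then remembered user preference, then site default) is only one option.

```python
def resolve_collation(site_default, user_preference=None, page_default=None, requested=None):
    """Pick the collation for one category page view.

    Assumed precedence (hypothetical, not taken from the branch):
      1. requested       - the user picked a method in the dropdown for this view
      2. page_default    - the page sets {{DEFAULTCOLLATION:...}}
      3. user_preference - the remembered or preferred sorting method
      4. site_default    - the wiki-wide fallback
    """
    for candidate in (requested, page_default, user_preference):
        if candidate is not None:
            return candidate
    return site_default


# Made-up collation names, for illustration only.
print(resolve_collation("stroke"))
print(resolve_collation("stroke", user_preference="pinyin"))
print(resolve_collation("stroke", user_preference="pinyin", page_default="zhuyin"))
print(resolve_collation("stroke", user_preference="pinyin", page_default="zhuyin", requested="stroke"))
```

Under this ordering, a user who prefers one method would still see a different one on pages that set their own default, which is exactly the case that might be unclear to the user.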
What's the progress on this?
Reedy has just refurbished https://gerrit.wikimedia.org/r/#/c/87288/ which cherry-picks the first commit from the branch onto master, and I think he's working on the following patches. Tim, any chance of technical/performance review from you? :)