Last modified: 2014-11-17 10:36:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T45799, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 43799 - Allow for using language-specific collations for category sorting
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Categories
Version: 1.21.x
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: 1.21.0 release
Assigned To: Bartosz Dziewoński
Keywords: i18n
Depends on:
Blocks: 30673
 
Reported: 2013-01-09 22:36 UTC by Bartosz Dziewoński
Modified: 2014-11-17 10:36 UTC
CC List: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
1 (8 bytes, text/plain), 2014-03-29 09:29 UTC, 250055655

Description Bartosz Dziewoński 2013-01-09 22:36:34 UTC
We should either bundle or generate on-demand first-letters-XX.ser files.

ICUCollation itself, as well as PHP's collation, accepts language strings like 'sv' or 'pl' just fine. However, the lack of a corresponding first-letters file causes a cryptic exception.

If the 'root' file is copied, sorting works correctly for the given language, but the headings are incorrect: they use the defaults and do not take letters with diacritics like Ø or Ą into account.

The files could probably rely on Unicode tailoring data to determine which additional letters should get their own subheadings in categories. http://developer.mimer.com/charts/tailorings.htm looks like a good starting point.
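To illustrate the symptom described above (correct order, wrong headings), here is a minimal sketch of picking a heading as "the last first-letter that sorts at or before the title". It assumes a toy rank table (`POLISH_ORDER`) in place of real ICU sort keys, and the letter lists and function names (`ROOT_LETTERS`, `POLISH_LETTERS`, `first_letter`) are hypothetical; this is not MediaWiki's actual code.

```python
from bisect import bisect_right

# Toy stand-in for ICU sort keys (assumption): each letter's rank in the
# Polish alphabet. Characters not listed sort after every listed letter.
POLISH_ORDER = "AĄBCĆDEĘFGHIJKLŁMNŃOÓPQRSŚTUVWXYZŹŻ"
RANK = {ch: i for i, ch in enumerate(POLISH_ORDER)}


def sort_key(text: str) -> tuple[int, ...]:
    return tuple(RANK.get(ch, len(POLISH_ORDER)) for ch in text.upper())


# Hypothetical heading lists: a simplified "root" list (A-Z only) and a
# Polish list that also treats letters with diacritics as separate headings.
ROOT_LETTERS = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
POLISH_LETTERS = ROOT_LETTERS + list("ĄĆĘŁŃÓŚŹŻ")


def first_letter(title: str, letters: list[str]) -> str:
    """Return the last heading letter that sorts at or before the title."""
    ordered = sorted(letters, key=sort_key)
    keys = [sort_key(letter) for letter in ordered]
    i = bisect_right(keys, sort_key(title))  # binary search over sorted keys
    return ordered[i - 1] if i > 0 else title[:1]


titles = sorted(["Zebra", "Łódź", "Lato"], key=sort_key)
print(titles)                                             # ['Lato', 'Łódź', 'Zebra']
print([first_letter(t, ROOT_LETTERS) for t in titles])    # ['L', 'L', 'Z']
print([first_letter(t, POLISH_LETTERS) for t in titles])  # ['L', 'Ł', 'Z']
```

The titles are ordered by the Polish key in both cases; only the heading list differs, which is why "Łódź" lands under "L" when just the root letters are available.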
Comment 1 Bawolff (Brian Wolff) 2013-01-14 00:11:23 UTC
I've been reading up on the collation stuff (specifically UTS #10 and UTS #35). It seems like the best course of action is, instead of generating huge first-letter files for every locale (that approach would probably need about 200 such files), to use the root first letters as a base. Then, for a specific locale, take the index exemplar characters for that locale (from CLDR). If the thing we are sorting falls between the first and last index letter, we use the index letter as the first-letter heading; otherwise we use the info from first-letters-root.ser.

This would probably be best accomplished by merging the index letters with the root first letters during the sorting step in IcuCollation that happens just before the result gets cached.
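A minimal sketch of the merge proposed in this comment, under stated assumptions: a toy rank table stands in for ICU sort keys, `ROOT_LETTERS` is a tiny hypothetical stand-in for first-letters-root.ser, and `SV_INDEX` is an illustrative Swedish index list rather than data read from CLDR. Root letters that fall inside the index range are replaced by the index letters; root letters outside it are kept. The merged list is what would be sorted once and cached.

```python
from bisect import bisect_right

# Toy collation order standing in for ICU sort keys (assumption): digits,
# the Swedish alphabet, then Greek capitals Alpha..Delta (written as escapes).
ORDER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ\u0391\u0392\u0393\u0394"
RANK = {ch: i for i, ch in enumerate(ORDER)}


def sort_key(text):
    """Rank each character; characters outside ORDER sort last."""
    return tuple(RANK.get(ch, len(ORDER)) for ch in text.upper())


# Hypothetical stand-ins for the root first letters and Swedish index letters.
ROOT_LETTERS = list("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\u0391\u0392\u0393\u0394")
SV_INDEX = list("ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ")


def merge_first_letters(root, index):
    """Use index letters inside their own range and root letters outside it."""
    index = sorted(index, key=sort_key)
    lo, hi = sort_key(index[0]), sort_key(index[-1])
    outside = [l for l in root if not (lo <= sort_key(l) <= hi)]
    return sorted(outside + index, key=sort_key)


def first_letter(title, letters):
    """Heading lookup; letters must already be sorted by sort_key."""
    keys = [sort_key(l) for l in letters]
    i = bisect_right(keys, sort_key(title))
    return letters[i - 1] if i > 0 else title[:1]


merged = merge_first_letters(ROOT_LETTERS, SV_INDEX)
print(first_letter("Örebro", ROOT_LETTERS))  # Z  (root letters only)
print(first_letter("Örebro", merged))        # Ö  (index letter inside A..Ö)
print(first_letter("1984", merged))          # 1  (root letter outside A..Ö)
```

As comment 2 points out, this range-based rule will not suit every script (Chinese, for example), so a real implementation would need a per-language whitelist and testing.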
Comment 2 Liangent 2013-01-14 15:39:45 UTC
Then whitelist those languages first. Chinese doesn't work in this way AFAIK.
Comment 3 Bawolff (Brian Wolff) 2013-01-15 22:32:46 UTC
(In reply to comment #2)
> Then whitelist those languages first. Chinese doesn't work in this way AFAIK.

Yes of course. We would definitely need testing here to see where this approach works and where it doesn't.
Comment 4 Bartosz Dziewoński 2013-02-09 21:39:42 UTC
Removing the "Bundle or generate on-demand first-letters-XX.ser files" part from the summary, as on second thought this does not seem like the best way to do this.

We should probably just store collation tailorings as "adjustments" to the -root file.
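One way to read "adjustments to the -root file" is a small per-language table of differences applied to the shared root list, instead of a full first-letters file per language. The sketch below assumes a hypothetical table format (plain entries are added, entries prefixed with "-" are removed), a simplified A-Z root list, and a toy Polish rank table in place of ICU; none of these names come from MediaWiki.

```python
# Toy Polish order standing in for ICU sort keys (assumption).
PL_ORDER = "AĄBCĆDEĘFGHIJKLŁMNŃOÓPQRSŚTUVWXYZŹŻ"
PL_RANK = {ch: i for i, ch in enumerate(PL_ORDER)}


def pl_key(text):
    """Rank each character; characters outside PL_ORDER sort last."""
    return tuple(PL_RANK.get(ch, len(PL_ORDER)) for ch in text.upper())


# Simplified stand-in for the root first-letter list.
ROOT_LETTERS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]

# Hypothetical per-language adjustments: plain entries are added,
# entries starting with "-" are removed from the root list.
TAILORINGS = {
    "pl": ["Ą", "Ć", "Ę", "Ł", "Ń", "Ó", "Ś", "Ź", "Ż"],
}


def apply_tailoring(root, adjustments, key):
    """Return root letters adjusted for one language, sorted by that language's key."""
    letters = list(root)
    for adj in adjustments:
        if adj.startswith("-") and len(adj) > 1:
            letters = [l for l in letters if l != adj[1:]]  # removal
        elif adj not in letters:
            letters.append(adj)  # addition
    return sorted(letters, key=key)


pl_letters = apply_tailoring(ROOT_LETTERS, TAILORINGS["pl"], pl_key)
print(pl_letters[:5])                                          # ['A', 'Ą', 'B', 'C', 'Ć']
print(apply_tailoring(["A", "B", "C"], ["-B", "Ç"], key=str))  # ['A', 'C', 'Ç']
```

Only the differences are stored per language; the adjusted, sorted list could then be cached per language just like the root list.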
Comment 5 Bartosz Dziewoński 2013-02-18 22:49:51 UTC
I838484b9 does that.
Comment 6 Bartosz Dziewoński 2013-02-26 20:11:39 UTC
Marking this as fixed.

While there is still a lot that could be done, this patch provides basic and pretty solid support for 67 languages using the Latin, Cyrillic and Greek alphabets.

Similar bug about Chinese collations: bug 44667.
Comment 7 Nemo 2013-02-28 10:20:39 UTC
(In reply to comment #6)
> While there is still a lot that could be done, this patch provides basic and
> pretty solid support for 67 languages using latin, cyrillic and greek
> alphabets.

It would be very nice if you could add to [[mw:MediaWiki 1.21]] a new section explaining what concretely changes now, and what else needs to be done in future releases or by local installations.
The release notes and commit message don't say anything clear, and http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/67769 mentions the need for 1) additional deployments, 2) configurations, 3) changes that look like MessagesXx.php variables and 4) maintenance/updateCollation.php... all mixed up, so I'm rather confused.
This looks like a big improvement, so we need to involve many more people for the follow-ups.
Comment 8 Bartosz Dziewoński 2013-02-28 16:10:39 UTC
I extended the [[mw:Manual:$wgCategoryCollation]] docs [1] and added a section stub linking to the aforementioned docs [2].

[1] https://www.mediawiki.org/w/index.php?title=Manual:$wgCategoryCollation&diff=653294&oldid=650251
[2] https://www.mediawiki.org/wiki/MediaWiki_1.21#Extended_collation_support
