Last modified: 2014-11-17 10:36:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T45799, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 43799 - Allow for using language-specific collations for category sorting


Summary:	Allow for using language-specific collations for category sorting

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Categories (Other open bugs)
Version:	1.21.x
Hardware:	All All

Importance:	Normal normal with 1 vote (vote)
Target Milestone:	1.21.0 release
Assigned To:	Bartosz Dziewoński

URL:
Whiteboard:
Keywords:	i18n

Depends on:
Blocks:	30673

Reported:	2013-01-09 22:36 UTC by Bartosz Dziewoński
Modified:	2014-11-17 10:36 UTC (History)
CC List:	9 users (show)

See Also:	44667 45522
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
1 (8 bytes, text/plain) 2014-03-29 09:29 UTC, 250055655	Details

Description Bartosz Dziewoński 2013-01-09 22:36:34 UTC

We should either bundle or generate on-demand first-letters-XX.ser files.

IcuCollation itself, as well as PHP's Collator, accepts language codes like 'sv' or 'pl' just fine. However, the lack of a corresponding first-letters file causes a cryptic exception.

If the 'root' file is copied, sorting works correctly for the given language; only the headings are incorrect (they are the default ones and do not take letters with diacritics such as Ø or Ą into account).

The files can probably depend on Unicode tailoring data to add the extra letters needed to create subheadings in categories. http://developer.mimer.com/charts/tailorings.htm looks like a good starting point.

Comment 1 Bawolff (Brian Wolff) 2013-01-14 00:11:23 UTC

I've been reading up on the collation stuff (specifically UTS #10 and UTS #35). It seems like the best course of action is, instead of generating huge first-letters files for every locale (that approach would probably need about 200 such files), to use the root first letters as a base. Then, for a specific locale, take the index exemplar characters for that locale (from CLDR). If the thing we are sorting falls between the first and last index letter, we use the index letter as the first-letter header; otherwise we use the info from first-letters-root.ser.

This would probably be best accomplished by merging the index letters with the root first letters during the sorting step in IcuCollation that happens just before things get cached.
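
A minimal sketch of the merge described in this comment, written in Python rather than MediaWiki's PHP. Everything in it is an assumption made for illustration: `sort_key` is a toy stand-in for real ICU sort keys, `ALPHABET` is a hand-written Polish-like ordering, and `merge_first_letters` and `heading_for` are hypothetical names, not MediaWiki functions.

```python
from bisect import bisect_right

# Toy tailored alphabet (assumption); a real collator would supply ICU sort keys.
ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"


def sort_key(text):
    """Toy sort key: position in ALPHABET, unknown characters sort after it."""
    return [
        ALPHABET.index(ch) if ch in ALPHABET else len(ALPHABET) + ord(ch)
        for ch in text.lower()
    ]


def merge_first_letters(root_letters, index_letters):
    """Replace root letters inside the locale's index range with the index letters."""
    index_sorted = sorted(index_letters, key=sort_key)
    lo, hi = sort_key(index_sorted[0]), sort_key(index_sorted[-1])
    # Root letters outside [first index letter, last index letter] are kept as-is.
    kept_root = [c for c in root_letters if not (lo <= sort_key(c) <= hi)]
    return sorted(set(kept_root) | set(index_sorted), key=sort_key)


def heading_for(title, letters):
    """Return the last heading letter that sorts at or before the title.

    `letters` must already be sorted by sort_key. Titles that sort before
    every letter fall back to the first heading.
    """
    keys = [sort_key(c) for c in letters]
    pos = bisect_right(keys, sort_key(title)) - 1
    return letters[max(pos, 0)]


# Illustrative data only: root letters plus two Greek letters, and a Polish index list.
ROOT = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ") + ["Α", "Ω"]
POLISH_INDEX = list("AĄBCĆDEĘFGHIJKLŁMNŃOÓPRSŚTUWYZŹŻ")
```

With the root letters alone, a title such as "Łódź" ends up under "L"; after merging it gets its own "Ł" heading, while Greek titles still use the root letters because they fall outside the index range.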

Comment 2 Liangent 2013-01-14 15:39:45 UTC

Then whitelist those languages first. Chinese doesn't work in this way AFAIK.

Comment 3 Bawolff (Brian Wolff) 2013-01-15 22:32:46 UTC

(In reply to comment #2)
> Then whitelist those languages first. Chinese doesn't work in this way AFAIK.

Yes of course. We would definitely need testing here to see where this approach works and where it doesn't.

Comment 4 Bartosz Dziewoński 2013-02-09 21:39:42 UTC

Removing the "Bundle or generate on-demand first-letters-XX.ser files" part from the summary, as on second thought this does not seem like the best way to do it.

We should probably just store collation tailorings as "adjustments" to the -root file.
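
One way to picture storing tailorings as "adjustments" to the root list, as a rough Python sketch. The `ADJUSTMENTS` table, its "add"/"remove" layout, and `letters_for` are all hypothetical and are not taken from the actual patch; the letter lists are illustrative.

```python
# Hypothetical per-language adjustments applied on top of the shared root letters.
ADJUSTMENTS = {
    "pl": {"add": ["Ą", "Ć", "Ę", "Ł", "Ń", "Ó", "Ś", "Ź", "Ż"], "remove": []},
    "sv": {"add": ["Å", "Ä", "Ö"], "remove": []},
}


def letters_for(lang, root_letters):
    """Return the root letters with the language's adjustments applied.

    The result is not sorted here; it would still have to be ordered by the
    language's collator before being used for headings.
    """
    adj = ADJUSTMENTS.get(lang, {"add": [], "remove": []})
    removed = set(adj["remove"])
    letters = [c for c in root_letters if c not in removed]
    letters += [c for c in adj["add"] if c not in letters]
    return letters
```

A language without an entry simply gets the root letters back, so only the differences from the root need to be stored per language.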

Comment 5 Bartosz Dziewoński 2013-02-18 22:49:51 UTC

I838484b9 does that.

Comment 6 Bartosz Dziewoński 2013-02-26 20:11:39 UTC

Marking this as fixed.

While there is still a lot that could be done, this patch provides basic and pretty solid support for 67 languages using the Latin, Cyrillic and Greek alphabets.

Similar bug about Chinese collations: bug 44667.

Comment 7 Nemo 2013-02-28 10:20:39 UTC

(In reply to comment #6)
> While there is still a lot that could be done, this patch provides basic and
> pretty solid support for 67 languages using the Latin, Cyrillic and Greek
> alphabets.

It would be very nice if you could add a new section to [[mw:MediaWiki 1.21]] explaining what concretely changes now, and what else needs to be done in future releases or by local installations.
The release notes and commit message don't say anything clear, and http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/67769 mentions the need for 1) additional deployments, 2) configuration, 3) changes that look like MessagesXx.php variables and 4) maintenance/updateCollation.php... all mixed up, so I'm rather confused.
This looks like a big improvement, so we need to involve many more people for the follow-ups.

Comment 8 Bartosz Dziewoński 2013-02-28 16:10:39 UTC

I extended the [[mw:Manual:$wgCategoryCollation]] docs [1] and added a section stub linking to the aforementioned docs [2].

[1] https://www.mediawiki.org/w/index.php?title=Manual:$wgCategoryCollation&diff=653294&oldid=650251
[2] https://www.mediawiki.org/wiki/MediaWiki_1.21#Extended_collation_support
