Last modified: 2013-12-19 20:55:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T57630, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 55630 - When using UCA collations, Persian digits (۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹) sorted under Western Arabic digits' (0 1 2 3 4 5 6 7 8 9) headings


Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Categories (Other open bugs)
Version:	1.22.0
Hardware:	All All

Importance:	Normal normal with 3 votes
Target Milestone:	1.23.0 release
Assigned To:	Bartosz Dziewoński

URL:
Whiteboard:
Keywords:	code-update-regression

Depends on:
Blocks:

Reported:	2013-10-11 16:47 UTC by Bartosz Dziewoński
Modified:	2013-12-19 20:55 UTC
CC List:	9 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Bartosz Dziewoński 2013-10-11 16:47:28 UTC

Quoting bug 55565 comment #11:
https://fa.wikipedia.org/wiki/%D8%B1%D8%AF%D9%87:%D8%B5%D9%88%D8%B1%D8%AA_%D9%81%D9%84%DA%A9%DB%8C_%D8%A8%D8%B1%D9%87
It seems that every digit at the start of a page title is being converted to a
Western Arabic digit. We shouldn't see '1' '2' '3' (Arabic digits); we should see '۱' '۲'
'۳' (Persian digits) instead. This is reproducible on all categories, and also on
ckbwiki
https://ckb.wikipedia.org/w/index.php?title=%D9%BE%DB%86%D9%84:%DA%95%DB%86%DA%98%DB%95%DA%A9%D8%A7%D9%86%DB%8C_%D8%B3%D8%A7%DA%B5&action=edit&redlink=1
which uses Arabic-Indic digits.

----

This might also affect other numeral systems; I didn't test.

Comment 1 [no longer active user] 2013-10-11 16:50:06 UTC

It works correctly on Bengali (https://bn.wikipedia.org/wiki/%E0%A6%AC%E0%A6%BF%E0%A6%B7%E0%A6%AF%E0%A6%BC%E0%A6%B6%E0%A7%8D%E0%A6%B0%E0%A7%87%E0%A6%A3%E0%A7%80:%E0%A6%AC%E0%A6%9B%E0%A6%B0) but fails on Arabic-Indic (Eastern Arabic) and Persian.

Comment 2 Bawolff (Brian Wolff) 2013-10-11 16:51:48 UTC

If they have the same primary weight, we could just remove the Latin digits from the first letters and add the Farsi ones on a per-language basis.

Comment 3 Bartosz Dziewoński 2013-10-11 16:58:26 UTC

Yeah, that'll probably work, but I'm wondering why it started happening now, after a supposedly minor package upgrade.

Comment 4 Bartosz Dziewoński 2013-10-12 22:01:33 UTC

allkeys.txt entries for '1' and '۱':

0031  ; [.159A.0020.0002.0031] # DIGIT ONE
06F1  ; [.159A.0020.0002.06F1][.0000.0166.0002.06F1] # EXTENDED ARABIC-INDIC DIGIT ONE

Same primary weight.
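
The comparison above can be reproduced with a minimal Python sketch (not MediaWiki code). It parses only the two allkeys.txt entries quoted in this comment and compares their non-zero primary weights; the helper name `primary_weights` and the regular expression are illustrative assumptions.

```python
import re

# The two allkeys.txt entries quoted above.
ALLKEYS_SAMPLE = """\
0031  ; [.159A.0020.0002.0031] # DIGIT ONE
06F1  ; [.159A.0020.0002.06F1][.0000.0166.0002.06F1] # EXTENDED ARABIC-INDIC DIGIT ONE
"""

# One collation element: [.primary.secondary.tertiary(.fourth)];
# a leading '*' instead of '.' marks a variable element.
ELEMENT_RE = re.compile(
    r"\[[.*]([0-9A-F]+)\.[0-9A-F]+\.[0-9A-F]+(?:\.[0-9A-F]+)?\]"
)

def primary_weights(line):
    """Return (code point string, tuple of non-zero primary weights)."""
    entry = line.split("#", 1)[0]            # drop the trailing comment
    code_points, elements = entry.split(";", 1)
    primaries = tuple(
        int(m.group(1), 16)
        for m in ELEMENT_RE.finditer(elements)
        if int(m.group(1), 16) != 0          # zero primaries are ignorable
    )
    return code_points.strip(), primaries

weights = dict(
    primary_weights(line) for line in ALLKEYS_SAMPLE.splitlines() if line.strip()
)
same_primary = weights["0031"] == weights["06F1"]
print(weights, same_primary)
```

Because both entries reduce to the same primary weight, a collation that groups pages by primary weight puts '1' and '۱' under one heading, and the heading character is whichever one is listed as the first letter.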


Trying to list each digit for each language IMO makes little sense (grepping the allkeys.txt file for "DIGIT ONE" yields 60 results).

I think we could use Language#formatNum() for each digit instead and replace Latin ones with localized ones in IcuCollation#getFirstLetterData (per Brian's suggestion), after applying $tailoringFirstLetters.
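
A rough Python sketch of that idea, under stated assumptions: the digit tables below are hard-coded stand-ins for what Language#formatNum() would provide, and `localise_first_letters` is a hypothetical helper, not the actual IcuCollation code.

```python
def digit_table(zero_code_point):
    """Map '0'..'9' to the ten digits starting at the given code point."""
    return {str(d): chr(zero_code_point + d) for d in range(10)}

# Assumption: hard-coded tables for illustration only.
DIGIT_TABLES = {
    "fa": digit_table(0x06F0),   # Extended Arabic-Indic (Persian) digits
    "ckb": digit_table(0x0660),  # Arabic-Indic digits
}

def localise_first_letters(first_letters, lang):
    """Replace Latin digits in a first-letter list with localised digits."""
    table = DIGIT_TABLES.get(lang, {})
    return [table.get(letter, letter) for letter in first_letters]

headings = localise_first_letters(["0", "1", "2", "A", "B"], "fa")
print(headings)
```

Languages without a table keep their Latin digit headings, so the replacement is a no-op wherever no localised digits are defined.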

Comment 5 Calak 2013-10-12 22:16:31 UTC

Be aware that we use different Unicode digits on ckb.wiki:
٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩

Comment 6 Gerrit Notification Bot 2013-10-12 22:29:26 UTC

Change 89488 had a related patch set uploaded by Bartosz Dziewoński:
IcuCollation: Sort digits under localised digits' headings

https://gerrit.wikimedia.org/r/89488

Comment 7 Bartosz Dziewoński 2013-10-12 22:31:45 UTC

Bah, and of course ckb.wp has to use the 'uca-fa' collation, because otherwise it would be too easy to fix.

My patch above doesn't handle this case, because I don't see how we could do it without creating a faux collation for ckb and if()-ing it (which would be ugly), or using wiki language instead of collation language (which would be unexpected). Input welcome.

Comment 8 Bawolff (Brian Wolff) 2013-10-13 00:06:55 UTC

I guess we could apply the digit transformation when rendering a numeric section header on the category page, instead of in the collation. Not sure if that's really a good idea though.

Comment 9 Tim Starling 2013-10-28 05:50:24 UTC

There could be a "ckb" collation (i.e. not uca-ckb), class name CollationCkb which is a subclass of IcuCollation. You could have IcuCollation::getDigitTransformTable() which is overridden by the subclass. CollationCkb::__construct() would call parent::__construct('fa').

Doing it that way means that when ICU adds support for ckb, migration from ckb to uca-ckb can be done without breaking the wiki.
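
The subclass arrangement described above might look roughly like this, sketched in Python rather than PHP. The class and method names mirror the ones proposed in this comment, but the bodies and the `DIGIT_TABLES` data are illustrative assumptions, not the code that was eventually merged.

```python
def digit_table(zero_code_point):
    """Map '0'..'9' to the ten digits starting at the given code point."""
    return {str(d): chr(zero_code_point + d) for d in range(10)}

# Assumption: stand-in data; MediaWiki would take this from its language files.
DIGIT_TABLES = {
    "fa": digit_table(0x06F0),   # Persian digits
    "ckb": digit_table(0x0660),  # Arabic-Indic digits
}

class IcuCollation:
    def __init__(self, locale):
        self.locale = locale  # ICU locale that drives the sort order

    def get_digit_transform_table(self):
        # Default: digits follow the collation's own language.
        return DIGIT_TABLES.get(self.locale, {})

    def heading_for(self, first_letter):
        """Heading shown for a first letter; digits are localised."""
        return self.get_digit_transform_table().get(first_letter, first_letter)

class CollationCkb(IcuCollation):
    def __init__(self):
        super().__init__("fa")  # sort with the 'fa' rules

    def get_digit_transform_table(self):
        return DIGIT_TABLES["ckb"]  # but show Sorani Kurdish digits

fa_heading = IcuCollation("fa").heading_for("1")
ckb_heading = CollationCkb().heading_for("1")
print(fa_heading, ckb_heading)
```

The sort order and the heading digits are decoupled: only the overridden method differs between the two classes, so a later switch to a real 'uca-ckb' collation would only change which class the wiki is configured to use.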

Or if the problem is likely to be repeated with other languages, there could be some regex-based alias feature in Collation::factory(), e.g. "alias-ckb/fa", where the collation name would specify both the ICU locale and the MW locale.
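
For illustration only, such an alias name could be parsed along these lines in Python. The name format and the reading of "alias-ckb/fa" as MediaWiki language first, ICU locale second, are assumptions; no such feature exists in Collation::factory().

```python
import re

# Assumed format: "alias-<MW language>/<ICU locale>", e.g. "alias-ckb/fa".
ALIAS_RE = re.compile(
    r"^alias-(?P<mw_lang>[a-z][a-z-]*)/(?P<icu_locale>[a-z][a-z-]*)$"
)

def parse_collation_alias(name):
    """Return (mw_lang, icu_locale) for an alias name, or None otherwise."""
    match = ALIAS_RE.match(name)
    if match is None:
        return None
    return match.group("mw_lang"), match.group("icu_locale")

parsed = parse_collation_alias("alias-ckb/fa")
print(parsed)
```

Names that do not match the pattern, such as 'uca-fa', return None and would fall through to the existing collation lookup.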

Comment 10 Gerrit Notification Bot 2013-11-17 15:06:10 UTC

Change 95867 had a related patch set uploaded by Bartosz Dziewoński:
IcuCollation: Add CollationCkb subclass for Sorani Kurdish

https://gerrit.wikimedia.org/r/95867

Comment 11 Bartosz Dziewoński 2013-11-17 15:07:54 UTC

(In reply to comment #9)
> There could be a "ckb" collation (i.e. not uca-ckb), class name CollationCkb
> which is a subclass of IcuCollation. You could have
> IcuCollation::getDigitTransformTable() which is overridden by the subclass.
> CollationCkb::__construct() would call parent::__construct('fa').

I implemented this in the patch above (which depends on the previous patch, https://gerrit.wikimedia.org/r/89488).


> Or if the problem is likely to be repeated with other languages, there could
> be
> some regex-based alias feature in Collation::factory(), e.g. "alias-ckb/fa",
> where the collation name would specify both the ICU locale and the MW locale.

I did not implement this; hopefully it will never be needed, because it sounds bad. :) But if we ever need it, it won't be hard to migrate.

Comment 12 reza1615 2013-11-22 08:17:38 UTC

It is still alive!
https://fa.wikipedia.org/wiki/%D8%B1%D8%AF%D9%87:%D8%B5%D9%81%D8%AD%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AD%D8%B0%D9%81_%D8%B2%D9%85%D8%A7%D9%86%E2%80%8C%D8%AF%D8%A7%D8%B1

Comment 13 Bartosz Dziewoński 2013-11-22 16:22:50 UTC

We're still working on it :) Both of my patches are waiting to be re-reviewed.

Comment 14 Gerrit Notification Bot 2013-12-12 04:45:15 UTC

Change 89488 merged by jenkins-bot:
IcuCollation: Sort digits under localised digits' headings

https://gerrit.wikimedia.org/r/89488

Comment 15 Gerrit Notification Bot 2013-12-12 04:49:49 UTC

Change 95867 merged by jenkins-bot:
IcuCollation: Add CollationCkb subclass for Sorani Kurdish

https://gerrit.wikimedia.org/r/95867

Comment 16 Gerrit Notification Bot 2013-12-12 15:55:21 UTC

Change 101005 had a related patch set uploaded by Bartosz Dziewoński:
(bug 55630) $wgCategoryCollation = 'xx-uca-ckb' for ckbwiki

https://gerrit.wikimedia.org/r/101005

Comment 17 Bartosz Dziewoński 2013-12-12 16:01:45 UTC

Status update: Tim merged the two patches. Thanks!

* This means that category headings on fa.wikipedia and other wikis
  using languages with localised digits will start behaving correctly
  as soon as they are deployed, which will happen on 19 December
  (according to [[mw:MediaWiki_1.23/Roadmap]]).
* ckb.wikipedia is troublesome because it's currently using a
  collation meant for 'fa'; my configuration patch above fixes that as
  well. (If it were not deployed, 'fa' digits would be used instead of
  'ckb' digits.)

I'll leave this open for a while longer until everything is sorted out.

Comment 18 Gerrit Notification Bot 2013-12-19 19:03:41 UTC

Change 101005 merged by jenkins-bot:
(bug 55630) $wgCategoryCollation = 'xx-uca-ckb' for ckbwiki

https://gerrit.wikimedia.org/r/101005

Comment 19 Bartosz Dziewoński 2013-12-19 20:46:39 UTC

Looking at links from comment 0, everything seems to be in order now. Thanks for the help and reports, everyone!

Comment 20 Calak 2013-12-19 20:55:13 UTC

Thank you very much, Bartosz Dziewoński.
