Last modified: 2013-12-19 20:55:13 UTC
Quoting bug 55565 comment #11: https://fa.wikipedia.org/wiki/%D8%B1%D8%AF%D9%87:%D8%B5%D9%88%D8%B1%D8%AA_%D9%81%D9%84%DA%A9%DB%8C_%D8%A8%D8%B1%D9%87 It seems that all digits at the start of page titles are being converted to Arabic digits. We shouldn't see '1' '2' '3' (Western Arabic digits); we should see '۱' '۲' '۳' (Persian digits) instead. Reproducible on all categories, and also on ckbwiki https://ckb.wikipedia.org/w/index.php?title=%D9%BE%DB%86%D9%84:%DA%95%DB%86%DA%98%DB%95%DA%A9%D8%A7%D9%86%DB%8C_%D8%B3%D8%A7%DA%B5&action=edit&redlink=1 which uses Arabic-Indic digits. ---- Might also affect other numeral systems; I didn't test.
Okay on Bengali https://bn.wikipedia.org/wiki/%E0%A6%AC%E0%A6%BF%E0%A6%B7%E0%A6%AF%E0%A6%BC%E0%A6%B6%E0%A7%8D%E0%A6%B0%E0%A7%87%E0%A6%A3%E0%A7%80:%E0%A6%AC%E0%A6%9B%E0%A6%B0 but fails on Arabic-Indic (Eastern Arabic) and Persian.
If they have the same primary weight, we could just remove the Latin digits from the first letters and add the Farsi ones on a per-language basis.
Yeah, that'll probably work, but I'm wondering why it started happening now, after a supposedly minor package upgrade.
allkeys.txt entries for '1' and '۱':
0031 ; [.159A.0020.0002.0031] # DIGIT ONE
06F1 ; [.159A.0020.0002.06F1][.0000.0166.0002.06F1] # EXTENDED ARABIC-INDIC DIGIT ONE
Same primary weight. Trying to list each digit for each language IMO makes little sense (grepping the allkeys.txt file for "DIGIT ONE" yields 60 results). I think we could use Language#formatNum() for each digit instead and replace Latin ones with localized ones in IcuCollation#getFirstLetterData (per Brian's suggestion), after applying $tailoringFirstLetters.
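For illustration only, here is a minimal Python sketch of the two points in the comment above: the quoted allkeys.txt entries share a primary weight, and Latin digit headings could be swapped for localized ones after the first-letter list is built. This is not the MediaWiki code (which is PHP); `primary_weight`, `localize_first_letters` and `DIGITS_BY_LANGUAGE` are hypothetical names, and the digit table is an assumed stand-in for what Language#formatNum() would return.

```python
import re

# The two allkeys.txt entries quoted in the comment above.
ALLKEYS_LINES = [
    "0031 ; [.159A.0020.0002.0031] # DIGIT ONE",
    "06F1 ; [.159A.0020.0002.06F1][.0000.0166.0002.06F1] # EXTENDED ARABIC-INDIC DIGIT ONE",
]


def primary_weight(line):
    """Return the primary weight (first field) of the first collation element."""
    match = re.search(r"\[[.*]([0-9A-F]{4})\.", line)
    if match is None:
        raise ValueError(f"no collation element in: {line!r}")
    return match.group(1)


LATIN_DIGITS = "0123456789"


def digit_run(first_code_point):
    """Build a ten-digit string from the code point of the set's zero."""
    return "".join(chr(first_code_point + i) for i in range(10))


# Assumption: a stand-in for per-language Language#formatNum() output.
DIGITS_BY_LANGUAGE = {
    "fa": digit_run(0x06F0),  # EXTENDED ARABIC-INDIC DIGIT ZERO..NINE
}


def localize_first_letters(first_letters, language):
    """Replace Latin digit headings with the language's own digits."""
    digits = DIGITS_BY_LANGUAGE.get(language)
    if digits is None:
        # No localized digits known for this language: leave headings alone.
        return list(first_letters)
    table = str.maketrans(LATIN_DIGITS, digits)
    return [letter.translate(table) for letter in first_letters]
```

Because both digits have the same primary weight, a page sorts into the same bucket either way; only the heading label changes.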
Be aware, we use different Unicode digits on ckb.wiki: ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩
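To make the difference concrete, a small Python sketch (not part of any patch) can print the code points of the two digit sets. It assumes the ckb.wiki digits listed above are the Arabic-Indic block starting at U+0660, and that the Persian digits are the Extended Arabic-Indic block starting at U+06F0, as in the allkeys.txt entry quoted earlier. `digit_set` is a hypothetical helper.

```python
import unicodedata


def digit_set(first_code_point):
    """Return the ten digits starting at the given code point for zero."""
    return [chr(first_code_point + i) for i in range(10)]


ARABIC_INDIC = digit_set(0x0660)           # assumed: the digits used on ckb.wiki
EXTENDED_ARABIC_INDIC = digit_set(0x06F0)  # assumed: the Persian digits

# Both characters have the numeric value 1 but are distinct code points.
for ch in (ARABIC_INDIC[1], EXTENDED_ARABIC_INDIC[1]):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} = {unicodedata.digit(ch)}")
```

So a digit table built for 'fa' would produce headings that look wrong on a ckb wiki, even though the numeric values are identical.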
Change 89488 had a related patch set uploaded by Bartosz Dziewoński: IcuCollation: Sort digits under localised digits' headings https://gerrit.wikimedia.org/r/89488
Bah, and of course ckb.wp has to use the 'uca-fa' collation, because otherwise it would be too easy to fix. My patch above doesn't handle this case, because I don't see how we could do it without creating a faux collation for ckb and if()-ing it (which would be ugly), or using wiki language instead of collation language (which would be unexpected). Input welcome.
I guess we could apply the digit transformation when rendering a numeric section header on the category page, instead of in the collation. Not sure if that's really a good idea though.
There could be a "ckb" collation (i.e. not uca-ckb), class name CollationCkb which is a subclass of IcuCollation. You could have IcuCollation::getDigitTransformTable() which is overridden by the subclass. CollationCkb::__construct() would call parent::__construct('fa'). Doing it that way means that when ICU adds support for ckb, migration from ckb to uca-ckb can be done without breaking the wiki. Or if the problem is likely to be repeated with other languages, there could be some regex-based alias feature in Collation::factory(), e.g. "alias-ckb/fa", where the collation name would specify both the ICU locale and the MW locale.
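The subclass idea in the comment above can be sketched roughly as follows. This is a Python stand-in for the PHP classes, not the actual implementation: the class names mirror the ones proposed, but `get_digit_transform_table`, `localize_heading` and the digit tables are assumptions made for illustration.

```python
LATIN_DIGITS = "0123456789"


def make_table(first_code_point):
    """Map each Latin digit to the digit set starting at first_code_point."""
    return {d: chr(first_code_point + i) for i, d in enumerate(LATIN_DIGITS)}


# Assumption: digit sets keyed by locale; only 'fa' is filled in here.
DIGITS_BY_LOCALE = {"fa": make_table(0x06F0)}


class IcuCollation:
    """Stand-in for the PHP class of the same name."""

    def __init__(self, locale):
        self.locale = locale  # ICU locale used for sorting

    def get_digit_transform_table(self):
        # Hypothetical default: digits of the collation's own locale.
        return DIGITS_BY_LOCALE.get(self.locale, {})

    def localize_heading(self, heading):
        table = str.maketrans(self.get_digit_transform_table())
        return heading.translate(table)


class CollationCkb(IcuCollation):
    """Sorts with the 'fa' ICU locale but shows Arabic-Indic digits."""

    def __init__(self):
        super().__init__("fa")

    def get_digit_transform_table(self):
        # Override: assumed ckb digits (U+0660..U+0669) instead of the 'fa' ones.
        return make_table(0x0660)
```

With this shape, the sorting locale and the heading digits are decoupled, so switching a wiki to a native ICU collation later would only change which class is selected.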
Change 95867 had a related patch set uploaded by Bartosz Dziewoński: IcuCollation: Add CollationCkb subclass for Sorani Kurdish https://gerrit.wikimedia.org/r/95867
(In reply to comment #9)
> There could be a "ckb" collation (i.e. not uca-ckb), class name CollationCkb
> which is a subclass of IcuCollation. You could have
> IcuCollation::getDigitTransformTable() which is overridden by the subclass.
> CollationCkb::__construct() would call parent::__construct('fa').
I implemented this in the patch above (which depends on the previous patch, https://gerrit.wikimedia.org/r/89488).
> Or if the problem is likely to be repeated with other languages, there could be
> some regex-based alias feature in Collation::factory(), e.g. "alias-ckb/fa",
> where the collation name would specify both the ICU locale and the MW locale.
I did not implement this, hopefully it will never be needed, because it sounds bad. :) But if we ever need it, it won't be hard to migrate.
It is still alive! https://fa.wikipedia.org/wiki/%D8%B1%D8%AF%D9%87:%D8%B5%D9%81%D8%AD%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AD%D8%B0%D9%81_%D8%B2%D9%85%D8%A7%D9%86%E2%80%8C%D8%AF%D8%A7%D8%B1
We're still working on it :) Both of my patches are waiting to be re-reviewed.
Change 89488 merged by jenkins-bot: IcuCollation: Sort digits under localised digits' headings https://gerrit.wikimedia.org/r/89488
Change 95867 merged by jenkins-bot: IcuCollation: Add CollationCkb subclass for Sorani Kurdish https://gerrit.wikimedia.org/r/95867
Change 101005 had a related patch set uploaded by Bartosz Dziewoński: (bug 55630) $wgCategoryCollation = 'xx-uca-ckb' for ckbwiki https://gerrit.wikimedia.org/r/101005
Status update: Tim merged the two patches. Thanks!
* This means that category headings on fa.wikipedia and other wikis using languages with localised digits will start behaving correctly as soon as they are deployed, which will happen on 19 December (according to [[mw:MediaWiki_1.23/Roadmap]]).
* ckb.wikipedia is troublesome because it's currently using a collation meant for 'fa'; my configuration patch above fixes that as well. (If it were not deployed, 'fa' digits would be used instead of 'ckb' digits.)
I'll leave this open for a while longer until everything is sorted out.
Change 101005 merged by jenkins-bot: (bug 55630) $wgCategoryCollation = 'xx-uca-ckb' for ckbwiki https://gerrit.wikimedia.org/r/101005
Looking at links from comment 0, everything seems to be in order now. Thanks for the help and reports, everyone!
Thank you very much Bartosz Dziewoński.