Last modified: 2013-04-22 16:16:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T43040, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41040 - Proper collation support in categories for Ukrainian wikis
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Categories
Version: unspecified
Hardware: All All
Importance: High enhancement
Target Milestone: ---
Assigned To: Bartosz Dziewoński
Keywords: i18n
Depends on:
Blocks: 30673 45444 45776

Reported: 2012-10-15 17:09 UTC by Dmytro Dziuma
Modified: 2013-04-22 16:16 UTC
CC List: 6 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Dmytro Dziuma 2012-10-15 17:09:01 UTC
This problem is reproducible on the Ukrainian Wikipedia.
Some Ukrainian page names do not appear in the correct order when listed in any category. The problem concerns the letters "Єє", "Іі", "Ґґ" and probably some others. Right now, pages whose names start with "Є" and "І" appear before the rest of the alphabet, and those that start with "Ґґ" appear after the rest of the alphabet. The correct order should be the following:
А а
Б б
В в
Г г
Ґ ґ
Д д
Е е
Є є
Ж ж
З з
И и
І і
Ї ї
Й й
К к
Л л
М м
Н н
О о
П п
Р р
С с
Т т
У у
Ф ф
Х х
Ц ц
Ч ч
Ш ш
Щ щ
Ь ь
Ю ю
Я я
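The symptom described above can be reproduced outside MediaWiki with a short Python sketch. It contrasts plain code point order (roughly how the wiki sorted titles at the time) with ordering by position in the alphabet listed above. The helper `alphabet_key` and the sample titles are hypothetical, chosen only for illustration; the sketch folds case by uppercasing and is not MediaWiki's implementation.

```python
# The 33-letter Ukrainian alphabet, in the order given in the description.
UK_ALPHABET = "АБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"
RANK = {letter: i for i, letter in enumerate(UK_ALPHABET)}


def alphabet_key(title: str) -> list:
    """Hypothetical sort key: each letter's position in the Ukrainian alphabet.

    Case is folded by uppercasing. Characters outside the alphabet sort
    after every letter, ordered among themselves by code point."""
    return [RANK.get(ch, len(UK_ALPHABET) + ord(ch)) for ch in title.upper()]


titles = ["Ґанок", "Дуб", "Єнот", "Іспит", "Абетка", "Яблуко", "Київ"]

# Code point order (roughly the old behaviour): Є (U+0404) and І (U+0406)
# precede А (U+0410), while Ґ (U+0490) follows Я (U+042F).
print(sorted(titles, key=str.upper))
# ['Єнот', 'Іспит', 'Абетка', 'Дуб', 'Київ', 'Яблуко', 'Ґанок']

# Alphabet order: Ґ lands between Г and Д, Є after Е, І after И.
print(sorted(titles, key=alphabet_key))
# ['Абетка', 'Ґанок', 'Дуб', 'Єнот', 'Іспит', 'Київ', 'Яблуко']
```

The key returns a list so that titles are compared letter by letter, with a shorter title sorting before a longer one that it prefixes.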
Comment 1 Bawolff (Brian Wolff) 2012-10-16 16:13:39 UTC
We currently have the uk wiki sorting based on code point order.

There is some support in MediaWiki for sorting based on language order. It's still a little experimental and is currently only active on the Portuguese Wikipedia (I'm also unsure whether the uca-default collation would be sufficient for Ukrainian or whether a custom collation would be needed).

[There is some information about this feature at http://www.mediawiki.org/wiki/Manual:$wgCategoryCollation ]
Comment 2 Dmytro Dziuma 2012-10-16 16:53:11 UTC
Could you test whether uca-default works for Ukrainian?
Comment 3 Bawolff (Brian Wolff) 2013-01-08 17:27:34 UTC
Hmm, seems wrong, but in different ways.


Have a look at the page https://pt.wikipedia.org/wiki/Categoria:Alfabeto_cir%C3%ADlico which should have most of those letters and is sorted using the uca-default collation.
Comment 4 Dmytro Dziuma 2013-01-09 09:27:16 UTC
Actually it works perfectly. Additionally I created pages with all letters of Ukrainian alphabet in my user space and put it into some temporary category: https://pt.wikipedia.org/wiki/Categoria:Temp_category

When would it be possible to change the collation of Ukrainian Wikipedia to uca-default? Should we wait for the next mediawiki deployment cycle or is it possible to do sooner?
Comment 5 Bawolff (Brian Wolff) 2013-01-09 18:28:28 UTC
(In reply to comment #4)
> Actually it works perfectly. Additionally I created pages with all letters of
> Ukrainian alphabet in my user space and put it into some temporary category:
> https://pt.wikipedia.org/wiki/Categoria:Temp_category

My apologies. I didn't realize that there were {{DEFAULTSORT}}s on the pages in the category I was looking at, which messed up the order.
----

Looking at some collation charts, it seems like the main (only?) difference from the default UCA collation for Ukrainian is the treatment of Ґ/ґ. From what I'm reading (you would know better than I, though), Ґ should be considered a different letter from Г; to be technical, they should have different primary weights. In uca-default they don't, which means Ґ doesn't get its own header in the category and might sort more like a case difference than a letter difference. I've added some examples with Ґ/ґ to your test category. Please make sure the result is what you expect, or at least acceptable.

(I'm not sure how serious that is).
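The point about primary weights and category headers can be illustrated with a toy model. In the Python sketch below, a title's section header is the first letter that shares the primary weight of the title's first letter; if Ґ shares Г's primary weight, as described above for uca-default, Ґ titles are filed under the Г header. The weight tables and the `header` helper are hypothetical simplifications, not real UCA/CLDR data or MediaWiki's actual header logic.

```python
# Toy primary weights for a few Cyrillic letters. These numbers are
# hypothetical; only their relative order matters.
UCA_LIKE_PRIMARY = {"В": 10, "Г": 20, "Ґ": 20, "Д": 30}     # Ґ shares Г's weight
UK_TAILORED_PRIMARY = {"В": 10, "Г": 20, "Ґ": 25, "Д": 30}  # Ґ has its own weight


def header(title: str, primary: dict) -> str:
    """Return the category section header for `title`.

    The header is the first letter in `primary` (in insertion order) whose
    primary weight equals that of the title's first letter."""
    weight = primary[title[0].upper()]
    return next(letter for letter, w in primary.items() if w == weight)


titles = ["Гора", "Ґанок", "Дуб"]
print([header(t, UCA_LIKE_PRIMARY) for t in titles])     # ['Г', 'Г', 'Д']
print([header(t, UK_TAILORED_PRIMARY) for t in titles])  # ['Г', 'Ґ', 'Д']
```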


> 
> When would it be possible to change the collation of Ukrainian Wikipedia to
> uca-default? Should we wait for the next mediawiki deployment cycle or is it
> possible to do sooner?

Such config changes are generally unrelated to MediaWiki deployment cycles, so they can be done at any time.
Comment 6 Dmytro Dziuma 2013-01-09 19:33:55 UTC
Ґ is a completely separate letter which goes after Г and before Д in the Ukrainian alphabet (see http://en.wikipedia.org/wiki/Ukrainian_alphabet#Alphabet)
Currently, not only is the sort order of "Ґґ" a problem, but "Єє", "Іі" and "Її" are also not sorted correctly on the Ukrainian Wikipedia.

As far as I can see from the Portuguese Wikipedia, uca-default suits perfectly, so using this collation on the Ukrainian Wikipedia should solve the problem there too.
Comment 7 Dmytro Dziuma 2013-01-09 19:37:54 UTC
Sorry, I didn't understand what you said at first. Now I see what you meant. Still, it is better to have at least the correct sort order, even without a separate category header.
Comment 8 Bawolff (Brian Wolff) 2013-01-09 21:06:48 UTC
(In reply to comment #7)
> Sorry, I didn't understand what you said at first. Now I see what you meant.
> Still, it is better to have at least the correct sort order, even without a
> separate category header.

Note that it's not just the category header that's off. The actual sorting of that letter will also be wrong in words with multiple letters. See the examples I added to your test category.
Comment 9 Bawolff (Brian Wolff) 2013-01-10 20:43:11 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Sorry, I didn't understand what you said at first. Now I see what you meant.
> > Still, it is better to have at least the correct sort order, even without a
> > separate category header.
> 
> Note that it's not just the category header that's off. The actual sorting of
> that letter will also be wrong in words with multiple letters. See the examples
> I added to your test category.

Sorry, earlier I was typing on my phone and couldn't type non-English letters. To be more explicit, suppose you have the following pages: Г, Ґ, ГА, ҐА, ГЦ, ҐЦ

The expected order is (I believe, correct me if I'm wrong): Г, ГА, ГЦ, Ґ, ҐА, ҐЦ

But uca-default orders them as: Г, Ґ, ГА, ҐА, ГЦ, ҐЦ
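The example above follows from how multi-level collation keys are compared: all primary weights of a title are compared before any secondary weight is consulted. The Python sketch below uses hypothetical two-level weights in which, mimicking the uca-default behaviour reported above, Ґ differs from Г only at the secondary level; giving Ґ its own primary weight instead yields the expected Ukrainian order. The weights and the `sort_key` helper are illustrative assumptions, not values or code from any real collation implementation.

```python
# Hypothetical two-level weights: letter -> (primary, secondary).
# Illustrative only; not taken from any real collation table.
UCA_LIKE_WEIGHTS = {"А": (1, 0), "Г": (20, 0), "Ґ": (20, 1), "Ц": (50, 0)}
UK_TAILORED_WEIGHTS = {"А": (1, 0), "Г": (20, 0), "Ґ": (25, 0), "Ц": (50, 0)}


def sort_key(title: str, weights: dict) -> tuple:
    """Multi-level key: the full list of primary weights is compared first,
    and secondary weights only break ties between equal primary lists."""
    pairs = [weights[ch] for ch in title.upper()]
    primary = [p for p, _ in pairs]
    secondary = [s for _, s in pairs]
    return (primary, secondary)


titles = ["Г", "Ґ", "ГА", "ҐА", "ГЦ", "ҐЦ"]

# Ґ differs from Г only at the secondary level (the uca-default behaviour
# reported in comment 9):
print(sorted(titles, key=lambda t: sort_key(t, UCA_LIKE_WEIGHTS)))
# ['Г', 'Ґ', 'ГА', 'ҐА', 'ГЦ', 'ҐЦ']

# Ґ has its own primary weight (the expected Ukrainian order):
print(sorted(titles, key=lambda t: sort_key(t, UK_TAILORED_WEIGHTS)))
# ['Г', 'ГА', 'ГЦ', 'Ґ', 'ҐА', 'ҐЦ']
```

With the tailored weights, ГЦ sorts before Ґ because Г versus Ґ is already decided at the primary level on the first letter, before the rest of the title matters. With the uca-like weights, the primary lists of Г and Ґ are identical, so the secondary difference only surfaces after every letter has been compared.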
Comment 10 Dmytro Dziuma 2013-01-24 09:11:49 UTC
You are right; sorry for my incompetent comments. I personally believe that while uca-default is not really a solution, it is still a better option than the current collation.

Are there any chances of solving this problem in general (I guess it is not a problem of the Ukrainian Wikipedia only) by adding more collations? This problem has been in MediaWiki for many years, and I think it is finally time to solve it.
Comment 11 Bartosz Dziewoński 2013-02-18 22:53:05 UTC
Change I838484b9 should fix it.
Comment 12 Bartosz Dziewoński 2013-02-26 20:23:53 UTC
Merged - marking as fixed. 

This ability is now available in the software. I created bug 45444 to discuss and implement deploying it on the Ukrainian Wikipedia.


