Last modified: 2014-09-17 16:55:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T56168, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 54168 - Category collation for Estonian projects
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://unicode.org/cldr/trac/ticket/6701
Keywords: shell, upstream
Depends on:
Blocks: collations
 
Reported: 2013-09-16 15:34 UTC by Pikne
Modified: 2014-09-17 16:55 UTC (History)

Attachments
Sorting order using the patch from comment 6 (14.62 KB, image/png)
2014-07-20 19:15 UTC, Bartosz Dziewoński

Description Pikne 2013-09-16 15:34:19 UTC

    
Comment 1 Pikne 2013-09-16 16:00:16 UTC
Pages in etwiki, etwikisource, etwikiquote and etwikibooks categories should be ordered as the letters are ordered in the Estonian alphabet. (Not sure about etwiktionary, where page names are in all languages; this probably needs further discussion.)

I assume this is done by setting $wgCategoryCollation to uca-et. *But* I don't know what exactly is behind this setting in the current version or where I can check this. Some UCA-related web pages suggest that UCA for Estonian sorts words beginning with the letter 'W' under 'V'. This is wrong. If this is the case for our uca-et setting too, then it should be changed so that 'W' is sorted as a separate letter after 'V' and before 'Õ'.
Comment 2 Bawolff (Brian Wolff) 2013-09-16 18:25:06 UTC
This is the chart for uca-et: http://collation-charts.org/icu442/icu442-et.html (things on the same line are roughly considered the same letter. This chart is for a different version of uca than we use. I'll check later when I have my laptop if things are different for later versions)

Assuming that this chart is still right for later versions of uca, v and w are considered to have a secondary difference. Which basically means they are considered the same unless there is a tie, and if there is a tie, v comes first.
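
To illustrate what a secondary difference means in practice, here is a minimal Python sketch. It is a toy model, not the ICU or UCA algorithm: the function name `toy_sort_key` and the two-level key are assumptions made for illustration, and only the v/w relationship described above is modelled.

```python
# Toy model of multi-level collation; NOT the real ICU/UCA algorithm or data.
# Assumption: only the v/w relationship from the chart is modelled here.


def toy_sort_key(word):
    lowered = word.lower()
    # Primary level: w is folded into v, so both count as the same letter.
    primary = lowered.replace("w", "v")
    # Secondary level: only matters on a primary tie; v (0) sorts before w (1).
    secondary = tuple(1 if ch == "w" else 0 for ch in lowered)
    return (primary, secondary)


words = ["Volga", "Wales", "Vatikan", "Windsor"]
print(sorted(words, key=toy_sort_key))

# A pair that ties at the primary level, so the secondary level decides.
print(sorted(["Waba", "Vaba"], key=toy_sort_key))
```

Under this toy key, words starting with W are interleaved with words starting with V, and w only loses to v when the two strings are otherwise identical.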
Comment 3 Pikne 2013-09-20 11:37:43 UTC
To be clear, it should be sorted as "V: Vatikan, Volga | W: Wales, Windsor" and not "V: Wales, Vatikan, Windsor, Volga". E.g. see the place name section of the dictionary of standard Estonian: http://www.eki.ee/dict/qs/kohanimed.html

If v and w are treated as in the chart referenced above, then I assume you can modify this here as the chart was modified for Finnish in bug 46330?
Comment 4 Bawolff (Brian Wolff) 2013-09-21 20:02:50 UTC
(In reply to comment #2)
> This is the chart for uca-et:
> http://collation-charts.org/icu442/icu442-et.html
> (things on the same line are roughly considered the same letter. This chart
> is for a different version of uca than we use. I'll check later when I have
> my laptop if things are different for later versions)

It appears this chart is still accurate (For reference for myself, since I can never find it, most recent version of icu library rules is at https://ssl.icu-project.org/repos/icu/icu/trunk/source/data/coll/et.txt )

--------

>If v and w are treated as in the chart referenced above, then I assume you can
>modify this here as the chart was modified for Finnish in bug 46330?

Finnish had an issue with the section headings. The chart itself wasn't modified.


We don't have the ability to do custom charts at the moment (The functionality is supported in ICU library, but PHP's intl library doesn't expose it to us).

We could maybe do something hacky like replace "W" with U+1D21 ('LATIN LETTER SMALL CAPITAL W' - which does not get sorted like "V" in uca-et collation) just for the sorting.
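
The substitution idea can be sketched in Python as follows. This is an illustrative model only, not MediaWiki code: `toy_uca_et_key` is a hypothetical stand-in for the real uca-et collator, and the position given to U+1D21 in `TOY_ORDER` (its own slot right after v) is an assumption. Only the replacement of W/w with U+1D21 before building the sort key comes from the suggestion above.

```python
# Sketch of the substitution workaround, using a toy stand-in for uca-et.
SMALL_CAPITAL_W = "\u1d21"  # LATIN LETTER SMALL CAPITAL W

# Assumption: a simplified Estonian letter order in which U+1D21 gets its own
# slot right after v. Real ICU collation data is far richer than this.
TOY_ORDER = "abcdefghijklmnopqrsšzžtuv" + SMALL_CAPITAL_W + "õäöüxy"
RANK = {ch: i for i, ch in enumerate(TOY_ORDER)}
RANK["w"] = RANK["v"]  # models the reported behaviour: w collates like v


def toy_uca_et_key(text):
    """Hypothetical collator stand-in: primary ranks only, unknown chars last."""
    return tuple(RANK.get(ch, len(TOY_ORDER)) for ch in text.lower())


def mangle_for_sorting(text):
    """Replace W/w with U+1D21 in the sort key input only, never in the title."""
    return text.replace("W", SMALL_CAPITAL_W).replace("w", SMALL_CAPITAL_W)


def workaround_key(text):
    """Sort key with the substitution applied before collation."""
    return toy_uca_et_key(mangle_for_sorting(text))


titles = ["Volga", "Wales", "Vatikan", "Windsor", "Õismäe"]
print(sorted(titles, key=toy_uca_et_key))   # W mixed in with V
print(sorted(titles, key=workaround_key))   # W as its own letter after V
```

In this model the unmodified key interleaves the W titles with the V titles, while the mangled key groups them after V and before Õ. The displayed titles are untouched; only the string fed to the collator changes.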
Comment 5 Bartosz Dziewoński 2014-07-20 19:11:10 UTC
(In reply to Bawolff (Brian Wolff) from comment #4)
> We could maybe do something hacky like replace "W" with U+1D21 ('LATIN
> LETTER SMALL CAPITAL W' - which does not get sorted like "V" in uca-et
> collation) just for the sorting.

I tried this and it seems to work reliably. I think we could do it and drop the workaround when upstream fixes their data.
Comment 6 Gerrit Notification Bot 2014-07-20 19:12:17 UTC
Change 147980 had a related patch set uploaded by Bartosz Dziewoński:
Collation: Workaround for incorrect collation of Estonian

https://gerrit.wikimedia.org/r/147980
Comment 7 Bartosz Dziewoński 2014-07-20 19:15:57 UTC
Created attachment 15986 [details]
Sorting order using the patch from comment 6
Comment 8 Tim Starling 2014-07-21 04:26:24 UTC
I see there is an upstream report for this: 
<http://unicode.org/cldr/trac/ticket/6701>

It was opened on the same day as this bug, with a very similar description, I assume by the same person.
Comment 9 Gerrit Notification Bot 2014-07-21 05:05:53 UTC
Change 147980 merged by jenkins-bot:
Collation: Workaround for incorrect collation of Estonian

https://gerrit.wikimedia.org/r/147980
Comment 10 Bartosz Dziewoński 2014-07-21 21:38:43 UTC
Hooray :D

The next step is to hold a quick discussion/vote on each wiki that would want this enabled, just to make sure nothing happens behind anyone's back. Pikne, can you do that?

I set up a little testing wiki on Labs: http://estonia.wmflabs.org/ (or rather had one set up for me by Yuvi :) ). Please verify that this indeed works correctly. Feel free to create categories and pages and link to them in the on-wiki discussions. (The wiki will probably disappear when it is no longer needed.)

I already created two categories:
* http://estonia.wmflabs.org/wiki/Kategooria:Test just enumerates the letters
  of the alphabet
* http://estonia.wmflabs.org/wiki/Kategooria:Eesti_maletajad is an import of
  [[et:Kategooria:Eesti maletajad]] (to see a real-world example)
Comment 11 Pikne 2014-07-22 15:08:00 UTC
(In reply to Bartosz Dziewoński from comment #10)
> The next step is to hold a quick discussion/vote on each wiki that would
> want this enabled, just to make sure nothing happens behind anyone's back.

I asked about this, and specifically about the v and w difference, on Estonian Wikipedia around the time I opened this bug: [[et:Vikipeedia:Üldine arutelu/Arhiiv 27#Tähestikuline järjestus kategoorias]]. There were no objections. As for Wikisource, Wikibooks and Wikiquote, the few people active there are also active on Wikipedia (and Wikipedia is where I would look for these few contributors), so I would say that we more or less have their consent as well. As for Wiktionary, I have now asked them whether they would prefer uca-default instead or whether it's worthwhile to change anything there now. I think we can consider it a separate bug if there is to be a change on Estonian Wiktionary.

> I set up a little testing wiki on Labs.

Test categories look fine.

(In reply to Tim Starling from comment #8)
> It was opened on the same day as this bug, with a very similar description,
> I assume by the same person.

Yes, I opened it in the hope that this brings us nearer to the solution.
Comment 12 Pikne 2014-08-08 09:20:49 UTC
(In reply to comment #11)
> I think we can consider it a separate bug if there will be a change
> on Estonian Wiktionary.

Then again, by now it seems that uca-et is fine enough for Wiktionary as well: [[:et:wikt:Vikisõnastik:Üldine arutelu#Järjestus kategooriates]].

I think we can move on with setting uca-et for all Estonian projects and recomputing the sort keys.
Comment 13 Gerrit Notification Bot 2014-08-14 23:08:05 UTC
Change 154213 had a related patch set uploaded by Bartosz Dziewoński:
Set $wgCategoryCollation to 'xx-uca-et' on all Estonian-language wikis

https://gerrit.wikimedia.org/r/154213
Comment 14 Bartosz Dziewoński 2014-08-14 23:10:08 UTC
I have uploaded the configuration patch, but collation config changes seem to be on hold for a while now and I don't know why. No idea how long this is going to take :/
Comment 15 Gerrit Notification Bot 2014-09-17 00:44:56 UTC
Change 154213 merged by jenkins-bot:
Set $wgCategoryCollation to 'xx-uca-et' on all Estonian-language wikis

https://gerrit.wikimedia.org/r/154213
Comment 16 Sam Reed (reedy) 2014-09-17 01:04:15 UTC
This is done now... Any further improvements needed, or can we close the bug?
Comment 17 Pikne 2014-09-17 10:55:51 UTC
Great. Thanks for doing the hacky part and the rest.

Though, as for the upstream part of this bug, today is about the day when CLDR v26 is expected to be released and the w and v difference should be fixed there too :)
