Last modified: 2013-07-23 18:11:31 UTC

Bug 46058 - sorting order in categories for sv.wikisource
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: wmf-deployment
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody
Keywords: shell
Depends on: 46036
Blocks: 30673 collations
Reported: 2013-03-13 06:47 UTC by Ronnie
Modified: 2013-07-23 18:11 UTC (History)
Description Ronnie 2013-03-13 06:47:27 UTC
It looks like you have solved bug 45446 for sv.wikipedia.

On sv.wikisource we would like the same feature, with one small change if possible. We mainly deal with older texts, so an older sorting order is more appropriate. The difference from Wikipedia is that the letter 'W' should be treated as another way of writing the letter 'V', with the same priority in the sorting order: 'Wallenberg' should be listed before 'Vennerström' under the heading 'V'.

Community talk: "sv.wikisource:Wikisource:Mötesplatsen#Sorteringsordning i kategorier"
Two users have agreed so far, which is all of the active sysops.

See the article on "W" on sv.wikipedia for details of why we want this solution. The letter W officially became part of the Swedish alphabet as late as 2006. On Wikisource, we deal mainly with texts from the 19th century.
Comment 1 Bartosz Dziewoński 2013-03-13 22:49:19 UTC
I'll have to look into it; not sure if it's possible, and if it is, I'll have to figure out how to configure it.
Comment 2 Bartosz Dziewoński 2013-03-13 22:51:26 UTC
Also, there is a patch underway to make setting different collations on a per-category basis possible, see bug 44667 (it's a part of that branch). You might want to wait until it's available, to be able to set V=W for Swedish-language categories, and "normal" sorting for the rest.
Comment 3 Bawolff (Brian Wolff) 2013-03-13 23:13:28 UTC
ICU seems to support making custom collations at run time (based on the docs). However, PHP's intl extension doesn't seem to expose this. So we are probably left with hacking it over the top (i.e. turning w into v before feeding the string to ICU), or somehow getting upstream to make a sv historical collation (they already have 2 sv collations, reformed and normal); however, I imagine that would be a difficult process. I suppose a third option is getting PHP upstream to expose custom collation making methods.

To clarify, when you say sort the same way, do you mean totally identical, or just primary identical? The sort algorithm has 3 levels. We check the primary level first (i.e. different letters: A vs B); if there is a tie on that level for all letters, we move on to check accents (roughly). If there is a tie again, we move on to checking case distinctions. In your case it sounds like you would want V and W to be the same on the primary level but different on the secondary level. Is that the case? (This might be a moot point, since the hack-over-the-top solution would only allow making them identical.)
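
The "hack over the top" option can be illustrated with a minimal Python sketch. It is not MediaWiki or ICU code: `fold_w_to_v` is a hypothetical helper, and `str.casefold` merely stands in for a real collator. It shows why folding makes V and W totally identical rather than just primary identical.

```python
def fold_w_to_v(sort_key: str) -> str:
    """Hypothetical pre-processing step: replace W/w with V/v before collating."""
    return sort_key.translate(str.maketrans({"W": "V", "w": "v"}))


names = ["Vennerström", "Wallenberg", "Vallenberg", "Andersson"]

# casefold() is only a stand-in for the real collator in this sketch.
ordered = sorted(names, key=lambda name: fold_w_to_v(name).casefold())

print(ordered)
```

After folding, "Wallenberg" and "Vallenberg" produce the same key, so their relative order is simply whatever order they had in the input. No lower level is left to break the tie.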
Comment 4 Bawolff (Brian Wolff) 2013-03-13 23:15:18 UTC
-shell. Needs new code written before shell can do anything.
Comment 5 Ronnie 2013-03-14 06:55:09 UTC
(In reply to comment #3)
> ICU seems to support making custom collations at run time (based on the
> docs). However, PHP's intl extension doesn't seem to expose this. So we are
> probably left with hacking it over the top (i.e. turning w into v before
> feeding the string to ICU), or somehow getting upstream to make a sv
> historical collation (they already have 2 sv collations, reformed and
> normal); however, I imagine that would be a difficult process. I suppose a
> third option is getting PHP upstream to expose custom collation making
> methods.
>
> To clarify, when you say sort the same way, do you mean totally identical,
> or just primary identical? The sort algorithm has 3 levels. We check the
> primary level first (i.e. different letters: A vs B); if there is a tie on
> that level for all letters, we move on to check accents (roughly). If there
> is a tie again, we move on to checking case distinctions. In your case it
> sounds like you would want V and W to be the same on the primary level but
> different on the secondary level. Is that the case? (This might be a moot
> point, since the hack-over-the-top solution would only allow making them
> identical.)

If there are two pages with the defaultsort "Vallenberg" and "Wallenberg", I think "V" should be sorted before "W" as if "W" was a diacritic of "V", but it is not critical. Earlier "W" was just regarded as another way of writing "V". The letter "W" was almost only used in names and foreign words. Then somebody invented 'World Wide Web', and you know the rest of the story better than me...
Comment 6 Ronnie 2013-03-14 07:26:34 UTC
(In reply to comment #2)
> Also, there is a patch underway to make setting different collations on a
> per-category basis possible, see bug 44667 (it's a part of that branch). You
> might want to wait until it's available, to be able to set V=W for
> Swedish-language categories, and "normal" sorting for the rest.

That could be a good idea, but it is not essential. I am thinking of, for example, V=W in content namespaces, but normal Swedish settings for other namespaces, such as User: and Project:.
Comment 7 Ronnie 2013-03-14 12:29:21 UTC
Can you meanwhile fix the ABC...ZÄÅÖ problem (to ÅÄÖ) like you did for sv.wikipedia?
Comment 8 Bartosz Dziewoński 2013-03-14 16:40:07 UTC
(In reply to comment #7)
> Can You meanwhile fix the ABC...ZÄÅÖ-problem (to ÅÄÖ) like you did for
> sv.wikipedia?

If you mean in exactly the same way as for sv.wikipedia (including the accented letters behavior change), then yes, but this is currently blocked by bug 46036.
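
For illustration, the ZÄÅÖ ordering is what sorting by Unicode code point gives, since Ä (U+00C4) comes before Å (U+00C5). The Python sketch below assumes that is the cause of the problem. `SWEDISH_ALPHABET`, `RANK` and `swedish_key` are hypothetical names, and the key ignores accents and other subtleties that a real collation handles.

```python
# Modern Swedish alphabet order: Å, Ä, Ö follow Z.
SWEDISH_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"
RANK = {letter: index for index, letter in enumerate(SWEDISH_ALPHABET)}


def swedish_key(title: str) -> list[int]:
    """Map each character to its alphabet position; unknown characters sort last."""
    return [RANK.get(ch, len(RANK)) for ch in title.upper()]


titles = ["Örebro", "Åland", "Älvdalen", "Zorn"]

by_code_point = sorted(titles, key=str.upper)   # Z, Ä, Å, Ö
by_alphabet = sorted(titles, key=swedish_key)   # Z, Å, Ä, Ö

print(by_code_point)
print(by_alphabet)
```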
Comment 9 Bawolff (Brian Wolff) 2013-03-23 22:45:00 UTC
(In reply to comment #3)
> ICU seems to support making custom collations at run time (based on the
> docs). However, PHP's intl extension doesn't seem to expose this. So we are
> probably left with hacking it over the top (i.e. turning w into v before
> feeding the string to ICU), or somehow getting upstream to make a sv
> historical collation (they already have 2 sv collations, reformed and
> normal); however, I imagine that would be a difficult process. I suppose a
> third option is getting PHP upstream to expose custom collation making
> methods.
>
> To clarify, when you say sort the same way, do you mean totally identical,
> or just primary identical? The sort algorithm has 3 levels. We check the
> primary level first (i.e. different letters: A vs B); if there is a tie on
> that level for all letters, we move on to check accents (roughly). If there
> is a tie again, we move on to checking case distinctions. In your case it
> sounds like you would want V and W to be the same on the primary level but
> different on the secondary level. Is that the case? (This might be a moot
> point, since the hack-over-the-top solution would only allow making them
> identical.)


I'm sorry, I made a mistake looking at the available collations. intl supports a "standard" collation (vs "reformed" which is what sv.wikipedia is using). The standard collation has rules:

                "&D<<đ<<<Đ<<ð<<<Ð"
                "&t<<<þ/h"
                "&T<<<Þ/H"
                "&v<<<V<<w<<<W"
                "&Y<<ü<<<Ü<<ű<<<Ű"
                "&[before 1]ǀ<å<<<Å<ä<<<Ä<<æ<<<Æ<<ę<<<Ę<ö<<<Ö<<ø<<<Ø<<ő<<<Ő<<œ<<<Œ<"
                "<ô<<<Ô"

This means that V is treated as secondary-different from W, which is what you want. (The collation can be triggered with the locale name sv@collation=standard. In theory, I thought sv-u-co-standard should also trigger it, but it doesn't seem to...)
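
As a rough model of what the rule `&v<<<V<<w<<<W` expresses, the Python sketch below builds a three-level sort key by hand. It is an illustration only, not ICU's implementation: `V_W_WEIGHTS` and `sort_key` are hypothetical names, and every character other than v/w is simply compared by its lowercase code point.

```python
# (secondary, tertiary) weights for the v/w family, following &v<<<V<<w<<<W:
# all four share one primary weight, w differs from v at the secondary level,
# and upper case differs from lower case at the tertiary level.
V_W_WEIGHTS = {"v": (0, 0), "V": (0, 1), "w": (1, 0), "W": (1, 1)}


def sort_key(text: str) -> tuple[list[str], list[int], list[int]]:
    """Build a (primary, secondary, tertiary) key; levels are compared in order."""
    primary, secondary, tertiary = [], [], []
    for ch in text:
        if ch in V_W_WEIGHTS:
            sec, ter = V_W_WEIGHTS[ch]
            primary.append("v")
        else:
            # Simplification: no accent handling for other letters.
            sec, ter = 0, (1 if ch.isupper() else 0)
            primary.append(ch.lower())
        secondary.append(sec)
        tertiary.append(ter)
    return primary, secondary, tertiary


names = ["Wallenberg", "Vennerström", "Vallenberg", "vallenberg"]
ordered = sorted(names, key=sort_key)
print(ordered)
```

In this model "Wallenberg" still sorts before "Vennerström" because the primary level sees both as starting with v, while "Vallenberg" sorts before "Wallenberg" because the tie is only broken at the secondary level.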
Comment 10 Bawolff (Brian Wolff) 2013-03-23 23:49:58 UTC
For the record, the change in MW (which would still need a wmf config change to enable it): Gerrit change #55498
Comment 11 Bartosz Dziewoński 2013-06-27 18:46:39 UTC
Gerrit change #55498 has been merged, so this should now be possible to do.
Comment 12 Gerrit Notification Bot 2013-07-23 16:15:45 UTC
Change 75351 had a related patch set uploaded by Reedy:
Category sorting order for sv.wikisource

https://gerrit.wikimedia.org/r/75351
Comment 13 Gerrit Notification Bot 2013-07-23 18:07:42 UTC
Change 75351 merged by jenkins-bot:
Category sorting order for sv.wikisource

https://gerrit.wikimedia.org/r/75351
Comment 14 Sam Reed (reedy) 2013-07-23 18:11:31 UTC
reedy@tin:/a/common/php-1.22wmf11$ mwscript maintenance/updateCollation.php --wiki=svwikisource --previous-collation=uppercase
Fixing collation for 82831 rows.
Selecting next 10000 rows... processing...10000 done.
Selecting next 10000 rows... processing...20000 done.
Selecting next 10000 rows... processing...30000 done.
Selecting next 10000 rows... processing...40000 done.
Selecting next 10000 rows... processing...50000 done.
Selecting next 10000 rows... processing...60000 done.
Selecting next 10000 rows... processing...70000 done.
Selecting next 10000 rows... processing...80000 done.
Selecting next 10000 rows... processing...82831 done.
82831 rows processed
