Last modified: 2014-11-18 18:07:32 UTC
In some languages, certain double characters (digraphs) are treated as a single letter. For example, in Hungarian, the word "cselló" starts with the double letter "cs". (For more examples, see [[Latin_alphabet#Collating_sequence_with_extensions]].) This means that on huwiki category pages like [[hu:Kategória:Vonós hangszerek]], "cselló" should not be grouped together with words starting with the letter "c", but should get its own "cs" section. This doesn't apply to foreign words (e.g. "CSS" should be put in the "c" section), and therefore cannot be decided automatically. An easy way to handle it would be to use a special character in the category sort keys: e.g. [[Category:XXX|cselló]] would have the same effect as now, but [[Category:XXX|cs,elló]] would create a section for words starting with "cs" on the category page, and put the page there. Another use for this would be a more flexible categorization of numbers; see [[Category:Stargate_SG-1_episodes]], where "A" is used for the 10th season. Using the above markup, a "10" section could be created by using [[Category:Stargate_SG-1_episodes|10,{{PAGENAME}}]].
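To make the proposal concrete, here is a minimal Python sketch of how a category page might derive the section heading from a sort key. It is not MediaWiki code: the function name `section_heading` and the rule "everything before the first comma is the heading" are assumptions made for illustration, and the sketch does not address sort keys that contain a comma for other reasons.

```python
def section_heading(sort_key: str, marker: str = ",") -> str:
    """Return the section heading a category page would file this key under.

    Hypothetical rule (an assumption, not existing behaviour): if the sort
    key contains the marker, the text before the first marker is the
    heading; otherwise the heading is the first character, as today.
    """
    prefix, sep, _rest = sort_key.partition(marker)
    if sep and prefix:
        # Explicit multi-character section, e.g. "cs,elló" or "10,Title".
        return prefix.capitalize()
    # No marker: fall back to the first character of the key.
    return sort_key[:1].capitalize()


if __name__ == "__main__":
    for key in ["cselló", "cs,elló", "CSS", "10,Some episode"]:
        print(repr(key), "->", repr(section_heading(key)))
```

With this rule, "cselló" and "CSS" both stay in the "C" section, while "cs,elló" gets a "Cs" section and "10,Some episode" gets a "10" section.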
*** This bug has been marked as a duplicate of 164 ***
Reopening, this has nothing to do with collation, and - as explained above - requires additional information beyond the category name to be handled correctly. It cannot be handled without introducing new markup.
It has to do with nothing but collation. It requires no additional information beyond a user-provided sort key, which would then be evaluated in a locale-specific manner. No new markup need be added. The kind of collation support added in bug 164 would allow things like "cs," being interpreted as its own letter, or some better convention. Many languages have similar conventions, many of which you kindly linked to at [[Latin_alphabet#Collating_sequence_with_extensions]], and that's what bug 164 is about. (For the time being, I may as well note that if you replace all "cs" with "c{s" in sort keys, similar to what you suggest as the new markup required, it will sort in the "c" section but after all pages starting with a normal "c", which is at least half correct.)
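The "c{s" workaround relies on "{" (U+007B) having a higher code point than every ASCII letter, so a key starting with "c{" lands after all other ASCII keys starting with "c" but before "d". A small Python sketch, assuming sort keys are compared purely by code point (the sample keys and the helper name `apply_workaround` are made up for illustration):

```python
def apply_workaround(sort_key: str) -> str:
    """Rewrite a leading "cs" digraph as "c{s" (hypothetical helper).

    Only meant for keys the editor knows start with the Hungarian letter
    "cs"; foreign words such as "css" must be left alone.
    """
    if sort_key.startswith("cs"):
        return "c{s" + sort_key[2:]
    return sort_key


# "cselló" is a Hungarian word and gets the workaround; "css" does not.
keys = ["cukor", apply_workaround("cselló"), "cement", "css", "dob"]

# Python compares strings by code point, which stands in for the
# binary ordering assumed here.
ordered = sorted(keys)
print(ordered)

# Note: non-ASCII letters such as "é" have code points above "{",
# so the guarantee only holds relative to ASCII letters.
```

The rewritten key still falls under the "c" heading, which is why this is only half correct: the ordering is right, but there is no separate "cs" section.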
Sorry. I must have misunderstood bug 164 then. *** This bug has been marked as a duplicate of 164 ***