Last modified: 2014-07-07 19:54:23 UTC

Bug 54689 - Category collation sort order should ignore spaces, hyphens, apostrophes (?)
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 30673
Reported: 2013-09-27 11:09 UTC by Bartosz Dziewoński
Modified: 2014-07-07 19:54 UTC (History)
11 users (show)

Description Bartosz Dziewoński 2013-09-27 11:09:47 UTC
Category collation sort order should ignore spaces, hyphens, apostrophes. 

PHP's Collator provides a way to disable comparison of all punctuation via the Collator::ALTERNATE_HANDLING attribute; however, it would be useful to keep, for example, commas meaningful so that biographies sort correctly ("Last, First" > "Las, Tzzzzz").
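
To make the difference concrete, here is a minimal Python sketch (not the PHP Collator, and not MediaWiki's actual collation code) that compares selective ignoring with blanket ignoring of punctuation. The names `IGNORE_SOME`, `IGNORE_ALL` and `sort_key` are hypothetical, and plain code point comparison stands in for a real collation.

```python
import string

# Assumption: the characters this report proposes to ignore
# (space, hyphen, ASCII apostrophe, typographic apostrophe).
IGNORE_SOME = frozenset(" -'\u2019")
# Assumption: a rough stand-in for "ignore all punctuation".
IGNORE_ALL = frozenset(string.punctuation + string.whitespace + "\u2019")

def sort_key(title, ignored=IGNORE_SOME):
    """Case-fold the title and drop every character listed in `ignored`."""
    return "".join(ch for ch in title.casefold() if ch not in ignored)

names = ["Last, First", "Las, Tzzzzz"]

# Commas stay significant: "las,tzzzzz" sorts before "last,first".
selective = sorted(names, key=sort_key)
# Commas ignored too: "lastfirst" sorts before "lastzzzzz".
blanket = sorted(names, key=lambda t: sort_key(t, IGNORE_ALL))

print(selective)
print(blanket)
```

With commas kept, "Las, Tzzzzz" sorts before "Last, First", which is the order the description asks for; dropping all punctuation reverses it.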

Suggested at […]édia:Le_Bistro/26_septembre_2013#Discussion
Comment 1 Philippe Verdy 2014-02-18 16:33:56 UTC
Spaces and hyphens are too much! Actually we need to preserve word separators, with the exception (tunable by language) of apostrophes, which should be considered either ignorable (when they are used as elision marks, most often at the end of words, which are then fused with the next word) or significant letters (when they denote a phoneme such as a glottal stop, most often in leading position after a word separator).

So in English, French and Italian, the apostrophe is ignorable for collation and plain-text searches ("its OK" and "it's OK" will match the same).
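
A language-tunable rule of this kind could be sketched as follows. This is only an illustration in Python: the policy table `APOSTROPHE_IGNORABLE`, the language codes used as its keys, and the helper `match_key` are all assumptions made for the example, not an existing MediaWiki setting.

```python
# Assumption: apostrophe-like characters to consider
# (ASCII apostrophe, right single quote, modifier letter apostrophe).
APOSTROPHES = frozenset("'\u2019\u02bc")

# Hypothetical per-language policy: True means the apostrophe is an
# elision mark and can be ignored; False means it is a significant letter.
APOSTROPHE_IGNORABLE = {"en": True, "fr": True, "it": True, "nap": False}

def match_key(text, lang):
    """Build a comparison key; word separators are always preserved."""
    key = text.casefold()
    # Unknown languages default to treating the apostrophe as significant.
    if APOSTROPHE_IGNORABLE.get(lang, False):
        key = "".join(ch for ch in key if ch not in APOSTROPHES)
    return key

print(match_key("it's OK", "en") == match_key("its OK", "en"))
print(match_key("it's OK", "nap") == match_key("its OK", "nap"))
```

Note that spaces are left untouched in both cases, so only the apostrophe handling varies with the language.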

But in Neapolitan (for example) there is a distinction between two types of apostrophes: final elision and initial glottal stop. Both may occur in sequence, and some Neapolitan articles use the ASCII double quote (") for that sequence, only to avoid the two quotes ('') being interpreted as italics in MediaWiki syntax; Neapolitan wikis would have done better to use distinct left and right apostrophes to distinguish them. You can detect these unexpected double quotes because they are surrounded by letters without any space on either side.
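
The detection rule in the last sentence can be sketched with a regular expression. This is a rough Python illustration: the function names are hypothetical, the sample strings are placeholders rather than real Neapolitan words, and the replacement order (right apostrophe, then left apostrophe) follows the description given in this comment.

```python
import re

# A double quote with a letter immediately before and after it.
# [^\W\d_] matches any Unicode letter (word character minus digits and "_").
EMBEDDED_QUOTE = re.compile(r'(?<=[^\W\d_])"(?=[^\W\d_])')

def find_embedded_quotes(text):
    """Return the offsets of double quotes squeezed between two letters."""
    return [m.start() for m in EMBEDDED_QUOTE.finditer(text)]

def replace_embedded_quotes(text):
    """Replace each such quote with a right then a left apostrophe."""
    return EMBEDDED_QUOTE.sub("\u2019\u2018", text)

print(find_embedded_quotes('xx"yy'))            # placeholder string
print(find_embedded_quotes('say "hello" now'))  # ordinary quotation marks
print(replace_embedded_quotes('xx"yy'))
```

Ordinary quotation marks are left alone because each of them has a space or the string boundary on one side.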

But MediaWiki cannot currently treat ('') between two letters (without any space on either side) as two apostrophes (right apostrophe for the final elision, then left apostrophe for the initial glottal stop): it always interprets them as the wiki syntax for italics. Single words that switch between roman and italics in the middle are extremely rare and, if needed, you could still insert a <nowiki/> before or after the wikicode markup ('') to restore its function as an italic style delimiter.

Partial workaround: articles can also use ('<nowiki/>') to separate the apostrophes, but this does not work in contexts where markup is undesired (such as page titles, or title attributes of elements), and users cannot use the ASCII double-quote kludge in these contexts either, because both single and double quotes can occur anywhere in plain text. So they should use the left and right apostrophes instead of the ASCII apostrophe-quote and double quote.
Comment 2 Phillip Patriakeas 2014-07-07 19:54:23 UTC
This may not be a concern for this particular bug report, but sortkeys beginning with or consisting only of a space or other punctuation mark should be handled separately. On the English Wikipedia, at least, a sortkey beginning with a space is frequently used to sort "key" articles (especially the category's eponymous article) to the top of the category; I've also seen asterisks used for this purpose, and it wouldn't surprise me if other language wikis have similar conventions.
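
One way to keep that convention while still ignoring punctuation elsewhere is to check the first character before stripping anything. The sketch below is a Python illustration only; `PIN_PREFIXES`, `IGNORED` and `category_sort_key` are hypothetical names, and the set of pinning characters is an assumption based on the two examples mentioned above.

```python
# Assumption: leading characters that editors use to pin entries to the top.
PIN_PREFIXES = frozenset({" ", "*"})
# Assumption: characters ignored in ordinary sortkeys.
IGNORED = frozenset(" -'\u2019")

def category_sort_key(sortkey):
    """Pinned sortkeys sort first and keep their punctuation intact."""
    folded = sortkey.casefold()
    if folded[:1] in PIN_PREFIXES:
        return (0, folded)
    # Ordinary sortkeys: drop the ignorable characters before comparing.
    return (1, "".join(ch for ch in folded if ch not in IGNORED))

entries = ["Zebra", " Main article", "*Overview", "Ant-eater"]
print(sorted(entries, key=category_sort_key))
```

The leading tuple element keeps pinned entries ahead of everything else regardless of which characters the ordinary rule ignores.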
