Last modified: 2014-03-10 13:20:03 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You can still run searches in Bugzilla or access your list of votes, but bug reports there will not be up to date.
Bug 10099 - Missing Unicode normalization
Product: MediaWiki extensions
Classification: Unclassified
Component: WikiLexicalData/OmegaWiki
Hardware: All All
Importance: Low enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2007-06-01 20:26 UTC by Denis Jacquerye
Modified: 2014-03-10 13:20 UTC
CC List: 3 users



Description Denis Jacquerye 2007-06-01 20:26:06 UTC
The "completion" feature, and probably some other input fields, lack Unicode normalization.

For example, while using the interface in French, when adding a Definition:
- select the blank Language field
- start typing "franc": "français" is offered as a valid language
- add a combining cedilla (U+0327) to get "franç": no language is offered anymore

"ç" (U+00E7) and "ç" (U+0063 U+0327) are canonically equivalent, so the expected result is the same for either.
Comment 1 Kipmaster 2009-11-29 21:08:27 UTC
Now, both "c" and "ç" bring up "français" in the combo box, so it appears to have been solved.
Comment 2 Denis Jacquerye 2009-11-29 22:53:02 UTC
No, this hasn't been fixed.

Try these three:
(ASCII) franc <U+0066 U+0072 U+0061 U+006E U+0063>
(NFC)   franç <U+0066 U+0072 U+0061 U+006E U+00E7>
(NFD)   franç <U+0066 U+0072 U+0061 U+006E U+0063 U+0327>

ASCII and NFC give the expected result: français, français canadien, français de Belgique, français de France, français de Suisse, francoprovençal.

NFD does not give any language in the list.
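The failure above can be reproduced outside the wiki with Python's standard `unicodedata` module. This sketch assumes, without confirmation from the thread, that the stored language name is precomposed (NFC) and that the completion code compares raw strings. Under that assumption, the NFD input can never be a prefix of the stored name, even though the two spellings are canonically equivalent.

```python
import unicodedata

nfc = "fran\u00e7"    # f r a n U+00E7 (precomposed c with cedilla)
nfd = "franc\u0327"   # f r a n c U+0327 (c followed by combining cedilla)

print(nfc == nfd)                                 # → False: different code points
print(len(nfc), len(nfd))                         # → 5 6
print(unicodedata.normalize("NFC", nfd) == nfc)   # → True: canonically equivalent
print(unicodedata.normalize("NFD", nfc) == nfd)   # → True

# Assumed behaviour: a raw prefix test against a precomposed stored name.
print("fran\u00e7ais".startswith(nfd))            # → False, so nothing is offered
```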

NFC and NFD should return the same result: for Unicode, NFC and NFD represent the same string (they are canonically equivalent).
What is new is that ASCII and NFC give the same result.
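One way to get the same list for all three inputs is to bring both the typed text and each language name to a single normalization form before the prefix test. The following is a minimal Python sketch, not the OmegaWiki code. `complete_language` and `LANGUAGES` are hypothetical names, and the list is the one from comment 2. NFD is chosen here because it splits "ç" into "c" + U+0327, so the ASCII prefix "franc" still matches as well.

```python
import unicodedata
from collections.abc import Iterable

# Hypothetical candidate list: the French-interface names quoted in comment 2.
LANGUAGES = (
    "français",
    "français canadien",
    "français de Belgique",
    "français de France",
    "français de Suisse",
    "francoprovençal",
)


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def complete_language(query: str, names: Iterable[str] = LANGUAGES) -> list[str]:
    """Hypothetical completion filter: prefix match after normalizing both sides to NFD.

    Precomposed (NFC) and decomposed (NFD) input produce the same key, and
    how the names themselves are encoded in this file does not matter either.
    """
    key = nfd(query)
    return [name for name in names if nfd(name).startswith(key)]


print(complete_language("fran\u00e7") == complete_language("franc\u0327"))  # → True
print(len(complete_language("franc")))                                     # → 6
```

NFD alone does not make "francais" (no cedilla, as tried in comment 6) match "français". That would need an extra accent-folding step, for example dropping characters for which `unicodedata.combining()` is non-zero.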
Comment 3 Denis Jacquerye 2009-11-29 23:00:42 UTC
Bugzilla normalizes input to NFC.
So in my example, NFD and NFC are both saved and displayed as NFC.
Use the code points if you actually want to enter NFD.
In HTML code: franc&#x0327;
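The HTML spellings used in this thread name specific code points, which makes them an unambiguous way to write test input. The snippet below only checks what each one decodes to, using Python's standard `html` and `unicodedata` modules (`unicodedata.is_normalized` needs Python 3.8+). It does not model how Bugzilla or the wiki actually process the text.

```python
import html
import unicodedata

# The numeric reference keeps the cedilla as its own code point (NFD form).
decoded = html.unescape("franc&#x0327;")
print([f"U+{ord(c):04X}" for c in decoded])
# → ['U+0066', 'U+0072', 'U+0061', 'U+006E', 'U+0063', 'U+0327']
print(unicodedata.is_normalized("NFD", decoded))   # → True
print(unicodedata.is_normalized("NFC", decoded))   # → False

# The named entity &ccedil; decodes to the precomposed U+00E7 instead.
print(html.unescape("fran&ccedil;ais") == "fran\u00e7ais")  # → True
```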
Comment 4 2012-08-01 04:44:29 UTC
For those who don't know, these acronyms are explained there:
The problem also appears with the ellipsis ("…") versus three dots ("...").
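Unlike the cedilla case, the ellipsis is not canonically equivalent to three full stops. U+2026 HORIZONTAL ELLIPSIS has only a compatibility decomposition to "...". As a result, NFC and NFD leave it unchanged, and only NFKC or NFKD fold it. The check below illustrates this with Python's `unicodedata`.

```python
import unicodedata

ELLIPSIS = "\u2026"   # U+2026 HORIZONTAL ELLIPSIS, a single code point
THREE_DOTS = "..."    # three U+002E FULL STOP characters

# Which normalization forms turn the ellipsis into three full stops?
folds = {
    form: unicodedata.normalize(form, ELLIPSIS) == THREE_DOTS
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(folds)  # → {'NFC': False, 'NFD': False, 'NFKC': True, 'NFKD': True}
```

The thread does not settle whether completion should use a compatibility form. That is a design choice, and it has side effects: NFKC also folds other characters, such as ligatures and superscript digits.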
Comment 5 Kipmaster 2012-08-09 12:59:27 UTC
Could you check again with "ç" now? It seems to almost work, except that the function that puts the searched string in bold does not work with NFD (if I understood correctly this time).
Comment 6 Denis Jacquerye 2012-08-09 16:05:25 UTC
Cool. It seems to have been fixed.
Now "francais", "fran&ccedil;ais" and "franc&#x0327;ais" give the same results.
Thanks, Kip!

But yes, when combining characters (NFD) are used, the match is not bolded.
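The remaining bolding problem can be sketched in the same way. `highlight_prefix` is a hypothetical helper, not the OmegaWiki function, and the `<b>` markup is an assumption. The helper works in three steps:

- It compares the query and the candidate in NFD.
- It extends the bold span over any combining marks that follow the match, so a base letter is never separated from its cedilla.
- It recomposes each part to NFC for display.

```python
import html
import unicodedata


def highlight_prefix(candidate: str, query: str) -> str:
    """Hypothetical helper: wrap the prefix of `candidate` matched by `query` in <b> tags."""
    cand = unicodedata.normalize("NFD", candidate)
    key = unicodedata.normalize("NFD", query)
    if not key or not cand.startswith(key):
        return html.escape(candidate)  # no match: nothing to bold
    n = len(key)
    # Swallow trailing combining marks so "c" is not split from U+0327.
    while n < len(cand) and unicodedata.combining(cand[n]):
        n += 1
    head = unicodedata.normalize("NFC", cand[:n])
    tail = unicodedata.normalize("NFC", cand[n:])
    return f"<b>{html.escape(head)}</b>{html.escape(tail)}"


for q in ("franc", "fran\u00e7", "franc\u0327"):
    print(highlight_prefix("fran\u00e7ais", q))  # each prints <b>franç</b>ais
```

With this choice, an ASCII query "franc" also bolds the accented "ç". Whether that is the desired behaviour is a judgment call.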
