Last modified: 2011-01-25 01:09:51 UTC

Bug 9413 - Normalization of Arabic presentation forms
Product: MediaWiki
Classification: Unclassified
Component: Page editing
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2007-03-25 16:43 UTC by Tim Starling
Modified: 2011-01-25 01:09 UTC



Description Tim Starling 2007-03-25 16:43:31 UTC
According to the Unicode FAQ:

  Q. Is it necessary to use the presentation forms that are defined in Unicode?

  A. No, it is not necessary to use those presentation forms. Those forms were
selected and identified in the early days of developing Unicode when
sophisticated rendering engines were not prevalent. A selected subset of the
presentation forms was included to provide users with a simple method to
generate them.

  Q. Can one use the presentation forms in a data file?

  A. It is strongly discouraged and not recommended because it does not
guarantee data integrity and interoperability. In the particular case of Arabic,
data files should include only the characters in the Arabic block, U+0600 to U+06FF.

Unidentified broken clients are inserting Arabic presentation forms into
articles. This causes problems because some browsers do not
display these characters. I suggest we convert presentation forms to their
canonical equivalent during NFC normalisation on page save. For those rare cases
where isolated characters in specified forms are required, HTML character
entities can be used.
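
As a rough illustration of the proposal above, the following Python sketch folds characters in the presentation-form ranges to their Arabic-block equivalents and then applies NFC. It is not the MediaWiki implementation; the function names are hypothetical, the ranges are the ones quoted in this report, and it relies only on the standard `unicodedata` module.

```python
import unicodedata

# Ranges as quoted in this report (inclusive code points).
# Assumption: these are illustrative; a real deployment may use other bounds.
PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE80, 0xFEFF))


def is_presentation_form(ch):
    """Return True if ch falls inside one of the presentation-form ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRESENTATION_RANGES)


def fold_presentation_forms(text):
    """Apply compatibility folding only to presentation forms, then NFC.

    Characters outside the ranges are left to plain NFC, so unrelated
    compatibility characters (for example Latin ligatures) are untouched.
    """
    folded = "".join(
        unicodedata.normalize("NFKC", ch) if is_presentation_form(ch) else ch
        for ch in text
    )
    return unicodedata.normalize("NFC", folded)
```

Restricting the compatibility folding to these ranges, rather than running NFKC over the whole text, keeps the change narrow: only the Arabic presentation forms are rewritten, and everything else gets the same NFC treatment as before.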
Comment 1 Brion Vibber 2007-03-26 13:53:41 UTC
We already do NFC normalization on page save. Are you asking for additional
conversions? If so, can you specify?
Comment 2 Tim Starling 2007-03-26 20:37:42 UTC
Yes, additional conversions. The Arabic presentation forms (FB50-FDFF and
FE80-FEFF) should be converted to their equivalents in the Arabic block,
0600-06FF. The relevant mapping is given in the Decomposition_Mapping field of
UnicodeData.txt. For example:


Because there is a formatting tag "<final>", this is a compatibility mapping
(part of NFKC), rather than a canonical mapping (part of NFC).
Comment 3 Tim Starling 2010-01-04 08:30:12 UTC
Fixed in r60599.
