Last modified: 2014-10-08 14:46:49 UTC
It has a comment that says TODO: strip out calls to splitClusters then delete this method.
Well, if we are going beyond BMP, we need to take surrogate pairs into account, too. JavaScript does not do that natively, AFAIK. Would code points be enough in that case?
(In reply to Sucheta Ghoshal from comment #1)
> Well, if we are going beyond BMP, we need to take surrogate pairs into
> account, too. JavaScript does not do that natively, AFAIK. Would code points
> be enough in that case?

I think the idea is that at the DM layer, we won't be combining anything beyond code points, but David (CC) can confirm.
Hi, yes, we wrote ve.splitClusters because we wanted the document model to be a list of grapheme clusters instead of a list of raw JavaScript characters (i.e. UTF-16 code units, so a surrogate pair like '\uD860\uDEE2' is treated as two separate entities, '\uD860' and '\uDEE2'). However, we've subsequently decided against that, because browsers will not always agree on what constitutes a grapheme cluster. Malayalam, where the font can affect the number of clusters, is one example of how problematic it could be to try to match the browser's clusterings exactly. Therefore, the DM is to remain a list of raw JavaScript characters, and clustering support is being developed at a level on top of the DM.
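To illustrate the distinction between raw JavaScript characters, code points, and clusters, here is a minimal sketch. It is not VE code: the helper names toCodeUnits and toCodePoints are hypothetical, and it assumes an ES2015 environment where Array.from and the string iterator are available.

```javascript
// Hypothetical helper: split into raw JavaScript characters (UTF-16 code
// units), which is the granularity the DM keeps.
function toCodeUnits( text ) {
	return text.split( '' );
}

// Hypothetical helper: split into code points. The string iterator used by
// Array.from keeps each surrogate pair together as one element.
function toCodePoints( text ) {
	return Array.from( text );
}

// The surrogate pair from the comment above, followed by 'e' plus a
// combining acute accent (U+0301), which a browser would normally render
// as a single cluster.
var sample = '\uD860\uDEE2' + 'e\u0301';

console.log( toCodeUnits( sample ).length ); // 4
console.log( toCodePoints( sample ).length ); // 3
```

Neither split treats 'e\u0301' as one unit, which is the part that would need cluster-level logic on top of the DM.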
Change 165433 had a related patch set uploaded by SuchetaG: Getting rid of ve.splitClusters in VE core https://gerrit.wikimedia.org/r/165433
Change 165430 had a related patch set uploaded by SuchetaG: Getting rid of ve.splitClusters in ve-mw https://gerrit.wikimedia.org/r/165430
Change 165430 merged by jenkins-bot: Getting rid of ve.splitClusters in ve-mw https://gerrit.wikimedia.org/r/165430
Change 165433 merged by jenkins-bot: Getting rid of ve.splitClusters in VE core https://gerrit.wikimedia.org/r/165433