Last modified: 2013-11-26 23:02:40 UTC

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T53472, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 51472 - VisualEditor: Backspace deletes combined character clusters together with diacritics
Status: RESOLVED FIXED
Product: VisualEditor
Classification: Unclassified
Component: Language
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: VE-deploy-2013-12-05
Assigned To: D Chan
Keywords: i18n
Depends on:
Blocks: ve-multi-lingual 53754
Reported: 2013-07-16 20:37 UTC by Amir E. Aharoni
Modified: 2013-11-26 23:02 UTC (History)

Description Amir E. Aharoni 2013-07-16 20:37:13 UTC
Some scripts, among them Arabic, Hebrew, and most scripts of India and SE Asia, are written as combinations of consonants and vowel marks that combine with them.

In most text editors and word processors, when the cursor is after a combination of a consonant and a vowel and the backspace key is pressed, the vowel is deleted first and then the consonant.

For example, if you have the Devanagari combination गा (ग [g] + ा [a]), these are two Unicode characters, which the font joins automatically. If the cursor is after them and you press the backspace key, then the second character ( ा) is supposed to be deleted, and only then the first (ग). That is what happens in most text editors, including MediaWiki's source editor.
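
The two-code-point structure of गा can be checked directly. The following is a minimal Python sketch using only the standard `unicodedata` module. The function name `backspace_code_point` is hypothetical; it only models the behavior expected in this report and is not taken from any editor's code.

```python
import unicodedata


def backspace_code_point(text: str) -> str:
    """Model the expected backspace: drop only the last code point."""
    return text[:-1]


cluster = "\u0917\u093e"  # गा = ग (letter GA) + ा (vowel sign AA)

# Each element of a Python str is one Unicode code point.
names = [unicodedata.name(ch) for ch in cluster]

after_first = backspace_code_point(cluster)       # vowel sign removed
after_second = backspace_code_point(after_first)  # consonant removed
```

In this model the first backspace leaves the bare consonant ग and the second leaves an empty string, which matches the order described above.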

In the VisualEditor, backspace immediately deletes the whole cluster. This behavior is unexpected for most users.

To complicate things, when the cursor is before the combined character and the Delete key is pressed, the expected behavior is to delete the whole cluster. This is what happens in the VisualEditor now, and this must be kept like that. For cursor movement, back and forth, the cluster must also be treated as one character, so if the cursor is before गा and the right-pointing arrow is pressed, the cursor is supposed to immediately go after the गा. This also works correctly now, and must be kept.
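
The Delete and arrow-key behavior described here treats the cluster as one unit. A rough Python sketch of that idea follows. It assumes a simplified definition of a cluster (one base character plus any following characters whose general category is a mark), which is only an approximation of real grapheme cluster rules; the function names are hypothetical and this is not VisualEditor's implementation.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me cover combining marks, including
    # spacing vowel signs such as U+093E (category Mc).
    return unicodedata.category(ch).startswith("M")


def cluster_end(text: str, pos: int) -> int:
    """Return the index just past the simplified cluster starting at pos."""
    if pos >= len(text):
        return len(text)
    end = pos + 1
    while end < len(text) and is_mark(text[end]):
        end += 1
    return end


def move_right(text: str, pos: int) -> int:
    """Right arrow: jump over the whole cluster."""
    return cluster_end(text, pos)


def delete_forward(text: str, pos: int) -> str:
    """Delete key: remove the whole cluster after the cursor."""
    return text[:pos] + text[cluster_end(text, pos):]


text = "\u0917\u093e\u0915"  # गाक: cluster गा followed by क
```

With the cursor at position 0, `move_right` lands after the full गा and `delete_forward` removes both of its code points. The sketch ignores conjuncts formed with virama, joiners, and Hangul Jamo sequences.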
Comment 1 Moriel Schottlender 2013-07-18 05:26:03 UTC
Could this be a dupe of -
https://bugzilla.wikimedia.org/show_bug.cgi?id=49233  ?
Comment 2 Denis Jacquerye 2013-07-18 07:17:53 UTC
The expected behaviour of backspace can be different with different writing systems.

In Indic scripts, as explained in the bug description, the most common behaviour is that backspace should erase one character and delete should erase a cluster. See http://publib.boulder.ibm.com/infocenter/hodhelp/v10r0/topic/com.ibm.hod.doc/help/hindi.html#hindispecialkeys
http://www-archive.mozilla.org/projects/ctl/tests/#indiceditoper

For other scripts it might be different, particularly for Latin, Greek and Cyrillic where, because of the precomposed accented characters, it is expected that characters and character sequences (base character + combining diacritic) that represent units will behave the same way, i.e. backspace and delete erase the base and diacritic, for example the single character à and the two characters ɛ̀ should be treated the same way.

http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries talks about this.
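
The point about precomposed characters can be illustrated with Unicode normalization. This is a small Python sketch using the standard `unicodedata` module; the variable names are illustrative only. It shows that à can be stored as one code point or two, while ɛ̀ has no precomposed form and always takes two, so an editor working on raw code points would treat the two visually similar letters differently.

```python
import unicodedata

precomposed = "\u00e0"          # à as a single code point
decomposed = "a\u0300"          # a + COMBINING GRAVE ACCENT
open_e_grave = "\u025b\u0300"   # ɛ̀: open e + COMBINING GRAVE ACCENT

# NFC composes sequences where a precomposed character exists.
nfc_a = unicodedata.normalize("NFC", decomposed)
nfc_e = unicodedata.normalize("NFC", open_e_grave)

lengths = (len(nfc_a), len(nfc_e))
```

After NFC normalization the first string is one code point long and the second is still two, although both display as a single accented letter.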
Comment 3 Amir E. Aharoni 2013-07-18 07:35:14 UTC
(In reply to comment #1)
> Could this be a dupe of -
> https://bugzilla.wikimedia.org/show_bug.cgi?id=49233  ?

Well, that one is marked as FIXED, and this one is definitely not fixed on master.
Comment 4 Ed Sanders 2013-07-18 15:33:10 UTC
This is how we expect backspace to work with our current support of grapheme clusters. As Denis points out, being able to delete combining marks separately would have to be enabled on a per script basis, as we wouldn't want to require multiple keystrokes to remove e-acute, or a Jamo-constructed Hangul character.
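
One way to picture a per-script rule is sketched below in Python. Everything in it is an assumption made for illustration: the script list, the function names, and the use of Unicode character-name prefixes as a crude stand-in for the Script property (which the Python standard library does not expose). It is not the approach taken in VisualEditor, and it does not handle Jamo-constructed Hangul.

```python
import unicodedata

# Hypothetical list of scripts where backspace removes a single code point.
PER_CODE_POINT_SCRIPTS = ("DEVANAGARI", "HEBREW", "ARABIC")


def cluster_start(text: str, pos: int) -> int:
    """Walk left from pos over combining marks to the base character."""
    start = pos - 1
    while start > 0 and unicodedata.category(text[start]).startswith("M"):
        start -= 1
    return start


def backspace(text: str, pos: int) -> tuple[str, int]:
    """Return (new_text, new_cursor) after one backspace at pos."""
    if pos <= 0:
        return text, 0
    # Character names begin with the script name for these scripts.
    name = unicodedata.name(text[pos - 1], "")
    if name.startswith(PER_CODE_POINT_SCRIPTS):
        start = pos - 1                   # remove just the last code point
    else:
        start = cluster_start(text, pos)  # remove base + marks together
    return text[:start] + text[pos:], start


devanagari = backspace("\u0917\u093e", 2)  # गा
latin = backspace("e\u0301", 2)            # e + COMBINING ACUTE ACCENT
```

Under these assumptions, backspace after गा leaves ग, while backspace after a decomposed é removes the letter and its accent in one keystroke.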
Comment 5 James Forrester 2013-10-08 21:54:53 UTC
Code to fix this is in progress in Gerrit change #80689.
Comment 6 Gerrit Notification Bot 2013-11-20 11:29:18 UTC
Change 80689 had a related patch set uploaded by Divec:
DONTMERGE:Revert model to use simple UTF-16 code units

https://gerrit.wikimedia.org/r/80689
Comment 7 Gerrit Notification Bot 2013-11-26 22:53:11 UTC
Change 80689 merged by jenkins-bot:
Revert model to use simple UTF-16 code units

https://gerrit.wikimedia.org/r/80689
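
For readers unfamiliar with the term in the change title, the short Python sketch below shows how UTF-16 code units differ from code points and from user-perceived characters. It only illustrates the unit of measurement and says nothing about what the patch itself contains; the helper name `utf16_length` is hypothetical.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units (2 bytes each in UTF-16-LE, no BOM)."""
    return len(text.encode("utf-16-le")) // 2


cluster = "\u0917\u093e"   # गा: one visible cluster, two code points
astral = "\U0001d4b3"      # one code point outside the Basic Multilingual Plane

counts = {
    "cluster_code_points": len(cluster),
    "cluster_code_units": utf16_length(cluster),
    "astral_code_points": len(astral),
    "astral_code_units": utf16_length(astral),
}
```

The Devanagari cluster occupies two code units, one per code point, whereas the astral character is a single code point stored as a surrogate pair of two code units.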
