Last modified: 2014-03-07 11:44:12 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T37990, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 35990 - Schwa syncope rule in devanagari transliteration
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: UniversalLanguageSelector
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: patch-reviewed
Depends on:
Blocks: 41348 53014
Reported: 2012-04-15 12:36 UTC by Siddhartha Ghai
Modified: 2014-03-07 11:44 UTC
CC List: 11 users


Description Siddhartha Ghai 2012-04-15 12:36:43 UTC
Currently the Narayam Hindi transliteration scheme requires typing a after a consonant to remove the viram. This is a problem at word ends in Hindi, which usually have schwa syncope but are written without the viram.

The required system is that inputting a space after any consonant should remove the viram, but if ~ (which is the default viram key) is pressed before pressing space, the viram should not be removed. This was attempted in https://gerrit.wikimedia.org/r/#change,3514 patchsets 2 and 3 but the rules did not behave as expected. The rule to remove the viram on space worked correctly but forcing the viram to stay by pressing the viram key did not work.

Current behaviour: raama gives राम and raam[space] gives राम्[space]
raameshwaram gives रामेश्वरम् and raameshwaram[space] gives रामेश्वरम्[space]

Wanted behaviour: raama gives राम and raam[space] gives राम[space] (viram is removed)
raameshwaram[space] gives रामेश्वरम[space] and raameshwaram~[space] gives रामेश्वरम्[space] (former removes the viram while latter retains it)
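
The wanted behaviour can be read as a small state machine: remember whether the trailing viram was requested explicitly, and strip it on space only when it was not. The following is a minimal sketch under that reading, not the Narayam implementation; `pressKey` and the `explicit` flag are hypothetical names, and the sketch starts from text that has already been transliterated.

```javascript
const VIRAM = '\u094D'; // the Devanagari sign virama (viram, halant)

// state.text:     output so far, e.g. the result of typing "raam"
// state.explicit: true once '~' has been pressed for the trailing viram
// (both fields are assumptions made for this sketch)
function pressKey(state, key) {
  if (key === '~') {
    // Explicit viram: make sure the text ends with one and remember the request.
    const text = state.text.endsWith(VIRAM) ? state.text : state.text + VIRAM;
    return { text, explicit: true };
  }
  if (key === ' ') {
    // Strip the trailing viram only if it was added implicitly.
    const keep = state.explicit || !state.text.endsWith(VIRAM);
    const text = keep ? state.text : state.text.slice(0, -1);
    return { text: text + ' ', explicit: false };
  }
  // Any other key is outside this sketch; append it unchanged.
  return { text: state.text + key, explicit: false };
}
```

Starting from `{ text: 'राम्', explicit: false }`, pressing space yields `राम` followed by a space, while pressing `~` and then space keeps the viram.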

The consensus for schwa syncope rule was obtained at http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%9A%E0%A5%8C%E0%A4%AA%E0%A4%BE%E0%A4%B2/Narayam

I should add that this is probably the most important rule that is missing right now. Having this ability is very important from a user perspective.

PS: Filing this bug since I've been unable to fix this myself (I still don't know why the rule in patchset 3 of the submitted change didn't work.)
Comment 1 Siddhartha Ghai 2012-04-15 12:46:43 UTC
Btw, schwa syncope was implemented earlier in http://hi.wiktionary.org/w/index.php?title=MediaWiki:Hi_rules.js&action=raw&ctype=text/javascript via inputting P after the consonant but I find that unintuitive.
Comment 2 Amir E. Aharoni 2012-07-05 05:15:24 UTC
Thank you for this request. I thought about this problem myself in the past - people have to type 'a' at the end of the word even though it's not pronounced.

The first thing that pops into my mind is that it shouldn't be assumed that a word ends in a space. It can also end in a punctuation mark or many other characters. We'll have to test it.
Comment 3 Amir E. Aharoni 2012-07-18 08:30:51 UTC
Shantanoo, can you tell whether this is needed for Marathi, too? Thank you.
Comment 4 Shantanoo 2012-07-22 10:32:47 UTC
(In reply to comment #3)
> Shantanoo, can you tell whether this is needed for Marathi, too? Thank you.

Yes. It will be nice to have feature.

Going forward, I am trying to find out whether '~' should always be used for ' ्', similar to the InScript layout (where one needs to type 'd' for ' ्').

I tried to gather some statistics from the Marathi hunspell dictionary.

First I split the words into composite clusters.
e.g. विकी = वि + की (and not as व + ि + क + ी)

I found a total of 905027 composite clusters, of which 126731 had '्' in them.
That's around 14% (126731/905027 = 0.14),
which is quite low. 86% of the time we don't use ' ्', and still we end up typing an extra 'a' to remove ' ्'.
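
A count of this kind could be reproduced roughly as follows. This is only a sketch: the script actually used is not shown in this report, so the cluster definition here (one non-mark character followed by its combining marks) and the names `splitClusters` and `viramShare` are assumptions.

```javascript
const VIRAM = '\u094D';

// Split a word into clusters: one non-mark code point plus any combining
// marks (vowel signs, viram, anusvara, ...) that follow it.
// A mark at the very start of a word is skipped by this pattern.
function splitClusters(word) {
  return word.match(/\P{M}\p{M}*/gu) || [];
}

// Count all clusters in a word list and those that contain a viram.
function viramShare(words) {
  let total = 0;
  let withViram = 0;
  for (const word of words) {
    for (const cluster of splitClusters(word)) {
      total += 1;
      if (cluster.includes(VIRAM)) withViram += 1;
    }
  }
  return { total, withViram, share: total === 0 ? 0 : withViram / total };
}
```

With this definition, विकी splits into the two clusters वि and की, matching the example above. For the figures quoted in this comment, 126731 / 905027 is about 0.140, so roughly 14% of clusters carry a viram.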

All, what should be implemented? Should we still add viram by default and have consonant (only for 'a') to remove it? Or should we have '~' to be typed when viram is required?
Comment 5 Amir E. Aharoni 2012-07-22 11:04:58 UTC
(In reply to comment #4)
> All, what should be implemented? Should we still add viram by default and have
> consonant (only for 'a') to remove it? Or should we have '~' to be typed when
> viram is required?

To the best of my understanding, the idea of a transliteration mapping is that only the actual sounds are typed and the viram should be used as little as possible.
Comment 6 Siddhartha Ghai 2012-07-23 03:16:55 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > All, what should be implemented? Should we still add viram by default and have
> > consonant (only for 'a') to remove it? Or should we have '~' to be typed when
> > viram is required?
> 
> To the best of my understanding, the idea of a transliteration mapping is that
> only the actual sounds are typed and the viram should be used as little as
> possible.

Yes, ideally one should only have to write what one speaks. However, mapping schwa syncope for all cases would be quite a headache. I originally filed this bug for schwa syncope at word endings, since that is the most problematic case. Correcting schwa syncope within words is a problem at an entirely different level of difficulty: as can be seen in the Wikipedia article [1] (see the section "Common transcription and diction errors"), the problem of syncope within words is much greater than at word ends. I like the current system for handling  ् as far as the inside of words is concerned. If we completed the consonants by default (i.e.  ् isn't added by default), then writing a lot of words would become a problem, since rakshhA would become राक्षा (i.e. r+a would become equivalent to the current r+A). Similarly, wherever the schwa is pronounced, typing an a in between (as is natural) would produce an unintuitive ा in between. So although the current handling of schwa syncope within words is imperfect, it is better than the other option.

However, I do believe we need a fix for schwa syncope at word ends. Words may end with a space, a tab, a newline, dot, comma, semicolon, colon, single quote, double quote, dash, equals sign, plus sign, any kind of brace, a slash, a vertical pipe, a greater-than or less-than sign, or any of the other symbols and numerals available on the keyboard. We basically need one rule that handles a word being terminated in all these cases and defaults to removing the  ् . However, the  ् shouldn't be removed if it has been explicitly added (by pressing ~) before any of these keys is pressed. The problem lies in being able to tell the implicitly added  ् from the explicitly added  ् once the next key is pressed. I tried to resolve this in https://gerrit.wikimedia.org/r/#change,3514 patchset 3 by increasing the keybuffer to 2 to detect the ~ keystroke. However, I was unsuccessful for some reason :( and had to undo it (see diff [2]). I don't know why that rule didn't work, but if it can be made to work with some modification/correction, the only further change needed would be adding the various possible word endings to the rule.

[1] http://en.wikipedia.org/wiki/Schwa_deletion_in_Indo-Aryan_languages
[2] https://gerrit.wikimedia.org/r/#/c/3514/3..4/resources/ext.narayam.rules.hi.js
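
The key-buffer idea described in this comment can be illustrated with a toy rule engine. This is a sketch under stated assumptions, not the Narayam rule format or the rule from patchset 3: the `[outputPattern, keyBufferPattern, replacement]` triples, `applyRules`, `feed` and the three rules are hypothetical, and the key buffer here holds only the keys pressed before the current one.

```javascript
const KEY_BUFFER_LENGTH = 2; // mirrors the buffer size mentioned above

// First matching rule wins. \u094D is the viram.
const rules = [
  [/\u094D~$/, null, '\u094D'],      // '~' after a viram: keep a single viram
  [/\u094D( )$/, /~$/, '\u094D$1'],  // last key was '~': explicit viram, keep it
  [/\u094D( )$/, null, '$1'],        // otherwise the viram was implicit: drop it
];

// Append the key to the output, then apply the first rule whose output
// pattern (and key-buffer pattern, if any) matches.
function applyRules(output, keyBuffer, key) {
  const input = output + key;
  for (const [pattern, keyPattern, replacement] of rules) {
    if (keyPattern && !keyPattern.test(keyBuffer)) continue;
    if (pattern.test(input)) return input.replace(pattern, replacement);
  }
  return input;
}

// Feed a sequence of keys, maintaining the buffer of recent keystrokes.
function feed(output, keyBuffer, keys) {
  for (const key of keys) {
    output = applyRules(output, keyBuffer, key);
    keyBuffer = (keyBuffer + key).slice(-KEY_BUFFER_LENGTH);
  }
  return output;
}
```

Starting from the output `राम्` with `m` in the key buffer, `feed` with a single space drops the viram, while `feed` with `~` followed by a space keeps it. Only the space terminator is covered; the other word endings listed above would need to be added to the patterns.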
Comment 7 Santhosh Thottingal 2012-07-23 06:25:08 UTC
For the record, Amir tries to address this issue in https://gerrit.wikimedia.org/r/#/c/15974/
Comment 8 Siddhartha Ghai 2012-07-23 06:30:38 UTC
Submitted https://gerrit.wikimedia.org/r/#/c/16272/ for schwa syncope with
space endings. Did try in the console before committing and hope it works.

However, modifying the rules to support various keys for word-endings would
require either enabling callbacks (Bug 35457), or two rules each time the
word-ending key is itself to be transformed (like . to ।). This is because
doing a $1$2 thing simply adds the . as it is without applying the appropriate
rule to the dot itself.
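
The limitation can be shown with a toy single-pass engine. This is a simplified sketch, not the Narayam code; `applyFirstMatch` and both rule lists are hypothetical and use plain `[pattern, replacement]` pairs.

```javascript
const DANDA = '\u0964'; // the danda that '.' is normally mapped to

// Apply only the first matching rule; there is no second pass over the result.
function applyFirstMatch(rules, input) {
  for (const [pattern, replacement] of rules) {
    if (pattern.test(input)) return input.replace(pattern, replacement);
  }
  return input;
}

// One generic word-ending rule: strip the viram (\u094D) and copy the
// terminator back unchanged. The '.' rule is never reached for "viram + .".
const genericOnly = [
  [/\u094D([ .])$/, '$1'],
  [/\.$/, DANDA],
];

// Workaround without callbacks: a second, dedicated rule for each
// terminator that is itself transformed.
const withDedicatedRule = [
  [/\u094D\.$/, DANDA],
  [/\u094D([ .])$/, '$1'],
  [/\.$/, DANDA],
];
```

With `genericOnly`, the input `राम्.` becomes `राम.`, so the dot is copied as it is. With `withDedicatedRule` the same input becomes `राम।`, at the cost of one extra rule per transformed terminator.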

So, I think this is a very good use case for fixing Bug 35457. (Unless
someone can come up with a better way of doing this) :)

PS:Was supposed to write this before Santhosh's comment about Amir's commit.
Comment 9 Siddhartha Ghai 2012-07-23 06:36:47 UTC
(In reply to comment #7)
> For the record, Amir tries to address this issue in
> https://gerrit.wikimedia.org/r/#/c/15974/

Just tried this patch in the console. Makes ending a word with a halant impossible (same problem as my original patch). However, the patch I submitted seems to be working with space. It could be extended with multiple rules the same way as Amir's patch, but solving Bug 35457 will probably be much more efficient.
Comment 10 db [inactive,noenotif] 2012-11-28 13:55:02 UTC
(In reply to comment #8)
> Submitted https://gerrit.wikimedia.org/r/#/c/16272/ for schwa syncope with

Status Merged
Comment 11 Siddhartha Ghai 2013-01-01 22:23:03 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Shantanoo, can you tell whether this is needed for Marathi, too? Thank you.
> 
> Yes. It will be nice to have feature.

Looking at [1] it seems that this has not been implemented for Marathi yet. Shantanoo, is the same solution as hi needed for mr? Or do we need a different one?

[1]: https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/Narayam.git;a=blob;f=resources/ext.narayam.rules.mr.js;h=74b70bba4202f3221db5ce973e7c2ab2e5990111;hb=HEAD
Comment 12 Santhosh Thottingal 2013-03-12 09:05:40 UTC
https://github.com/wikimedia/jquery.ime/issues/149
Comment 13 Amir E. Aharoni 2013-06-13 21:21:12 UTC
Moving to ULS.
Comment 14 Andre Klapper 2013-09-26 14:29:53 UTC
[Assignee was removed, hence also resetting ASSIGNED status]
