Last modified: 2014-03-07 11:44:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T40238, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 38238 - Typing ओं is not possible in hindi transliteration
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: UniversalLanguageSelector
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 41348 53014
Reported: 2012-07-07 21:03 UTC by Siddhartha Ghai
Modified: 2014-03-07 11:44 UTC (History)


Description Siddhartha Ghai 2012-07-07 21:03:28 UTC
Typing ओं is currently not possible in Hindi transliteration because of a rule conflict.

Typing ओं requires the input oM. However, because of the rule

['ओM', '', 'ॐ']

this combination is reserved for ॐ, which makes it impossible to write ओं.
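
The conflict can be reproduced with a minimal sketch of a first-match rule engine. This is not the actual Narayam or jquery.ime code: the `typeKeys` function, the two base rules for `o` and `M`, and the decision to ignore the middle (context) element of each rule are all assumptions made for illustration. Only the first rule is taken from this report.

```javascript
// Hypothetical, simplified rule engine; not the real input method code.
// Each rule is [pattern, context, replacement]; the context element is ignored here.
const rulesWithConflict = [
  ['ओM', '', 'ॐ'],      // the rule quoted in this report
  ['o', '', 'ओ'],        // assumed base rule
  ['M', '', '\u0902'],   // assumed base rule: anusvara (the dot in ओं)
];

// The same table without the ॐ rule.
const rulesWithoutConflict = rulesWithConflict.slice(1);

// Apply the first rule whose pattern matches the end of the text typed so far.
function typeKeys(keys, rules) {
  let text = '';
  for (const key of keys) {
    const candidate = text + key;
    const rule = rules.find(([pattern]) => candidate.endsWith(pattern));
    text = rule
      ? candidate.slice(0, candidate.length - rule[0].length) + rule[2]
      : candidate;
  }
  return text;
}

console.log(typeKeys('oM', rulesWithConflict));    // ॐ
console.log(typeKeys('oM', rulesWithoutConflict)); // ओं
```

With the ॐ rule present, the intermediate text ओ followed by the key M always matches that rule first, so the plain syllable can never be produced from oM.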

This is an unfortunate side-effect, but needs to be resolved somehow, and quickly.

I say quickly because several words in Hindi, when used in plural form in a sentence, end in ओं. Examples: भाषाओं, चिताओं, कलाओं, घटाओं etc.

I don't know why this bug wasn't noticed earlier, but it needs to be resolved somehow ASAP.
Comment 1 Santhosh Thottingal 2012-07-09 19:23:45 UTC
Do you have any key combination to suggest?
Comment 2 Amir E. Aharoni 2012-07-17 20:24:35 UTC
A few questions:

1. Is there any other transliteration standard in which this is solved already?

2. What is used more frequently: the OM (ॐ) character or the simple syllable ओं?

3. Shantanoo - is this fix needed in Marathi, too?
Comment 3 Siddhartha Ghai 2012-07-17 21:13:27 UTC
ओं is used much more frequently than ॐ 

One possible solution is to change the input for ॐ to auM. This is slightly harder than oM but has been used in other transliteration schemes. It was used in ITRANS as AUM. [1]


[1] http://en.wikipedia.org/wiki/ITRANS
Comment 4 Amir E. Aharoni 2012-07-17 21:50:07 UTC
A patch changing ॐ to auM and making oM type ओं was submitted here:
https://gerrit.wikimedia.org/r/#/c/15846/

It can be tested here:
http://sandbox.translatewiki.net/wiki/Main_Page?uselang=hi

This patch makes a significant change in the current behavior, so before deployment some community consensus should be demonstrated, for example in Village pumps of Hindi projects.

Also, as I said earlier, this change may be needed in Marathi, too. It can be deployed to the Hindi projects before the change for Marathi is made, however.
Comment 5 Shantanoo 2012-07-18 08:11:53 UTC
(In reply to comment #2)
> A few questions:
> 
> 1. Is there any other transliteration standard in which this is solved already?
> 
> 2. What is used more frequently: the OM (ॐ) character or the simple syllable
> ओं?
> 
> 3. Shantanoo - is this fix needed in Marathi, too?

Yes. It should also be fixed for Marathi, e.g. 'Onkar' = 'ओंकार'
(https://www.google.co.in/search?q=onkar+marathi)

Also, when the name is from another language, one needs 'ओं'. E.g. 'Ontario, Canada'.
Comment 6 Amir E. Aharoni 2012-07-18 09:08:01 UTC
I added support for Marathi, too.
Comment 7 Siddhartha Ghai 2012-07-18 19:33:51 UTC
(In reply to comment #4)
> This patch makes a significant change in the current behavior, so before
> deployment some community consensus should be demonstrated, for example in
> Village pumps of Hindi projects.

Good idea about a discussion for consensus. Since this basically makes writing औं impossible instead of ओं, it's probably better to ask the community (to check in case something is written with औं too). I have raised this question and asked for consensus at the village pump at hi-wp: http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%9A%E0%A5%8C%E0%A4%AA%E0%A4%BE%E0%A4%B2#.E0.A4.A8.E0.A4.BE.E0.A4.B0.E0.A4.BE.E0.A4.AF.E0.A4.AE_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A5.90_.E0.A4.95.E0.A5.87_.E0.A4.B2.E0.A4.BF.E0.A4.AF.E0.A5.87_.E0.A4.87.E0.A4.A8.E0.A4.AA.E0.A5.81.E0.A4.9F_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A4.AA.E0.A4.B0.E0.A4.BF.E0.A4.B5.E0.A4.B0.E0.A5.8D.E0.A4.A4.E0.A4.A8
Comment 8 Shantanoo 2012-07-19 07:08:03 UTC
(In reply to comment #7)
> (In reply to comment #4)
> > This patch makes a significant change in the current behavior, so before
> > deployment some community consensus should be demonstrated, for example in
> > Village pumps of Hindi projects.
> 
> Good idea about discussion for consensus. Since this basically makes writing औं
> impossible instead of ओं, its probably better to ask the community (to check in
> case something is written with औं too). Have raised this question and asked for
> consensus at the village pump at hi-wp:
> http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%9A%E0%A5%8C%E0%A4%AA%E0%A4%BE%E0%A4%B2#.E0.A4.A8.E0.A4.BE.E0.A4.B0.E0.A4.BE.E0.A4.AF.E0.A4.AE_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A5.90_.E0.A4.95.E0.A5.87_.E0.A4.B2.E0.A4.BF.E0.A4.AF.E0.A5.87_.E0.A4.87.E0.A4.A8.E0.A4.AA.E0.A5.81.E0.A4.9F_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A4.AA.E0.A4.B0.E0.A4.BF.E0.A4.B5.E0.A4.B0.E0.A5.8D.E0.A4.A4.E0.A4.A8

The unit of weight 'ounce' is written as 'औंस', so support for औं may also be required.
Comment 9 Amir E. Aharoni 2012-07-19 09:21:57 UTC
Hmm. If "auM" produces ॐ, then औंस can't be written.

Maybe make it "AUM"?

Does ॐ always appear as a separate word?
Comment 10 Shantanoo 2012-07-19 10:22:29 UTC
(In reply to comment #9)
> Hmm. If "auM" produces ॐ, then औंस can't be written.
> 
> Maybe make it "AUM"?

+1 for AUM.

> 
> Does ॐ always appear as a separate word?

ॐकार, ॐकारेश्वर are valid words.
Comment 11 Amir E. Aharoni 2012-07-19 11:00:01 UTC
1. Just to make sure: ॐकार and ओंकार are both valid?

2. I can easily make auMsa->औंस and AUM->ॐ work, but will this finally cover all the cases? Is anybody familiar with a comprehensive standard on which I would be able to base our mapping? Is ITRANS comprehensive, for example?
Comment 12 Shantanoo 2012-07-19 11:11:18 UTC
(In reply to comment #11)
> 1. Just to make sure: ॐकार and ओंकार are both valid?
>

Yes. I know 2 different people 'Omkar' and 'Onkar' :). Either way, we still have 'Ontario' which is ओंटारीओ(?).
Comment 13 Shantanoo 2012-07-19 17:41:43 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #4)
> > > This patch makes a significant change in the current behavior, so before
> > > deployment some community consensus should be demonstrated, for example in
> > > Village pumps of Hindi projects.
> > 
> > Good idea about discussion for consensus. Since this basically makes writing औं
> > impossible instead of ओं, its probably better to ask the community (to check in
> > case something is written with औं too). Have raised this question and asked for
> > consensus at the village pump at hi-wp:
> > http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%9A%E0%A5%8C%E0%A4%AA%E0%A4%BE%E0%A4%B2#.E0.A4.A8.E0.A4.BE.E0.A4.B0.E0.A4.BE.E0.A4.AF.E0.A4.AE_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A5.90_.E0.A4.95.E0.A5.87_.E0.A4.B2.E0.A4.BF.E0.A4.AF.E0.A5.87_.E0.A4.87.E0.A4.A8.E0.A4.AA.E0.A5.81.E0.A4.9F_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A4.AA.E0.A4.B0.E0.A4.BF.E0.A4.B5.E0.A4.B0.E0.A5.8D.E0.A4.A4.E0.A4.A8
> 
> 'Ounce' unit of weight is written as 'औंस'. Maybe, that support is also
> required.

I missed one important example for औं:
http://mr.wikipedia.org/wiki/%E0%A4%94%E0%A4%82%E0%A4%A7
Comment 14 Siddhartha Ghai 2012-07-22 19:43:01 UTC
(In reply to comment #11)
> 1. Just to make sure: ॐकार and ओंकार are both valid?
> 
> 2. I can easily make auMsa->औंस and AUM->ॐ work, but will this finally cover
> all the cases? Is anybody familiar with a comprehensive standard on which I
> would be able to base our mapping? Is ITRANS comprehensive, for example?

As per the discussion on hi-wp village pump, the following words use औं:
औंधा
औंगारी सूर्य मंदिर [1]
औंराडीह गाँव, गुरुआ (गया) [2]
औंकोलोजी (oncology)

Clearly, औं is also used in Hindi, and the submitted patch[3] won't solve the problem.

Also, AUM won't solve the problem either, since that is for आऊं which in itself is a popular (mis)spelling of the word आऊँ and can be used to write the word आऊंगा (correct spelling आऊँगा).

Although I want to have an easy input method for inputting ॐ, I am wondering how much it is actually used in regular text, and if a direct input for it is needed (or it can be handled in the editing tools shown below the edit-window). I'll be asking the same at hi-wp discussion [4]. Shantanoo, is it used regularly in Marathi or rarely?

[1] http://hi.wikipedia.org/wiki/%E0%A4%94%E0%A4%82%E0%A4%97%E0%A4%BE%E0%A4%B0%E0%A5%80_%E0%A4%B8%E0%A5%82%E0%A4%B0%E0%A5%8D%E0%A4%AF_%E0%A4%AE%E0%A4%82%E0%A4%A6%E0%A4%BF%E0%A4%B0
[2] http://hi.wikipedia.org/wiki/%E0%A4%94%E0%A4%82%E0%A4%B0%E0%A4%BE%E0%A4%A1%E0%A5%80%E0%A4%B9_%E0%A4%97%E0%A4%BE%E0%A4%81%E0%A4%B5,_%E0%A4%97%E0%A5%81%E0%A4%B0%E0%A5%81%E0%A4%86_(%E0%A4%97%E0%A4%AF%E0%A4%BE)
[3] https://gerrit.wikimedia.org/r/#/c/15846/
[4] http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%9A%E0%A5%8C%E0%A4%AA%E0%A4%BE%E0%A4%B2#.E0.A4.A8.E0.A4.BE.E0.A4.B0.E0.A4.BE.E0.A4.AF.E0.A4.AE_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A5.90_.E0.A4.95.E0.A5.87_.E0.A4.B2.E0.A4.BF.E0.A4.AF.E0.A5.87_.E0.A4.87.E0.A4.A8.E0.A4.AA.E0.A5.81.E0.A4.9F_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A4.AA.E0.A4.B0.E0.A4.BF.E0.A4.B5.E0.A4.B0.E0.A5.8D.E0.A4.A4.E0.A4.A8
Comment 15 Shantanoo 2012-07-23 20:55:28 UTC
(In reply to comment #14)

> Although I want to have an easy input method for inputting ॐ, I am wondering
> how much it is actually used in regular text, and if a direct input for it is
> needed (or it can be handled in the editing tools shown below the edit-window).
> I'll be asking the same at hi-wp discussion [4]. Shantanoo, is it used
> regularly in Marathi or rarely?

It is used rarely.
IMO, putting it in the editing tools should be fine. But I am not sure how one could use it in the 'search box', since there is no edit toolbox for entering text in the search box.

How about prefixing and/or suffixing '_' (or any other such combination) for rarely used but important characters?

e.g. _aum_ or _#aum or __aum or aum## or (aum) or any other combination.

or as suggested on hiwiki,
'_M' will be 'ं' and should not be combined with other sequence. (Instead of '_' some other character may be used. _italics_ )
This can be extended to 'अे' (a_e), 'अै' (a_ai).
Comment 16 Siddhartha Ghai 2012-07-24 17:07:26 UTC
(In reply to comment #15)
> It is used rarely.
> IMO, putting in editing tools should be fine. But, I am not sure regarding how
> one can use it for the 'search box'. There is not edit toolbox for entering
> text in search box.
> 
> How about pre and post fixing '_' (or any other combination with pre and/or
> post fixing) to have rarely used characters (but important)?
> 
> e.g. _aum_ or _#aum or __aum or aum## or (aum) or any other combination.
> 
> or as suggested on hiwiki,
> '_M' will be 'ं' and should not be combined with other sequence. (Instead of
> '_' some other character may be used. _italics_ )
> This can be extended to 'अे' (a_e), 'अै' (a_ai).

I think if we do have to go with the _ idea, it should probably be au_M for ॐ, and auM should remain औं as it currently is (because ॐ is probably used less than औं). Also, since 'ं' is used much more than ॐ, it's best to keep its input as stable as possible.

Also (unrelated to this bug), the idea suggested on hi-wp was for a breaker key: pressing it would ensure that joining rules are not applied to the next keystroke (i.e., the next keystroke is rendered independently of the combining rules, using only the basic input rules). So if = is the breaker key, writing a=u would output अउ instead of औ. This could probably also be made to work as follows: if the breaker key is pressed once, the first matching rule is skipped and the next one applied. If it is pressed twice, the first two matching rules are skipped and the third rule applied, and so on.
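
A rough sketch of the single-press variant of the breaker key, to make the idea concrete. Everything here is hypothetical: the `=` key, the three-rule table, and the `typeKeys` function are assumptions for illustration and do not reflect how Narayam or jquery.ime is implemented. The multi-press variant (skipping the first n matching rules) is not covered.

```javascript
// Hypothetical sketch of the "breaker key" idea; '=' and the rules are assumptions.
const BREAKER = '=';
const rules = [
  ['अu', 'औ'], // combining rule: अ followed by the key u joins into औ
  ['a', 'अ'],  // basic rule
  ['u', 'उ'],  // basic rule
];

function typeKeys(keys) {
  let text = '';
  let skipCombining = false;
  for (const key of keys) {
    if (key === BREAKER) {
      skipCombining = true; // affects only the next keystroke
      continue;
    }
    // After a breaker, match against the key alone so that no
    // previously typed output can join with it.
    const candidate = skipCombining ? key : text + key;
    const prefix = skipCombining ? text : '';
    skipCombining = false;
    const rule = rules.find(([pattern]) => candidate.endsWith(pattern));
    const converted = rule
      ? candidate.slice(0, candidate.length - rule[0].length) + rule[1]
      : candidate;
    text = prefix + converted;
  }
  return text;
}

console.log(typeKeys('au'));  // औ
console.log(typeKeys('a=u')); // अउ
```

The breaker itself produces no output; it only hides the earlier text from the rule matcher for one keystroke.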
Comment 17 Shantanoo 2012-07-25 20:06:40 UTC
(In reply to comment #16)

> I think if we do have to go with the _ idea, it should probably be au_M for ॐ,
> and auM should remain औं as it currently is (coz ॐ is probably lesser used than
> औं). Also, since 'ं' is used much more than ॐ, its best to keep its input as
> stable as possible.

I still think that _auM_ or _aum_ is better than au_M.

Another way is to use meta/alt key combination. e.g. a + u + Meta/alt-m


> Also (unrelated to this bug), the idea suggested on hi-wp was for a breaker
> key, typing which would ensure that joining rules are not applied on the next
> keystroke (i.e the next keystroke is rendered independent of combining rules,
> using only the basic input rules). So if = is the breaker key, writing a=u
> would output अउ instead of औ. This could probably also be made to work like: if
> the breaker key is pressed once, the first matching rule is skipped and the
> next one applied. If it is pressed twice, the first two matching rules are
> skipped and the third rule applied and so on.

IMO, this may make the logic complex for the end user to remember the sequence for typing.
Comment 18 Siddhartha Ghai 2012-07-26 10:35:10 UTC
(In reply to comment #17)
> Still think that instead of au_M, _auM_ or _aum_ is better. 
> 
> Another way is to use meta/alt key combination. e.g. a + u + Meta/alt-m
> 

_auM_ seems fine to me.

> IMO, this may make the logic complex for the end user to remember the sequence
> for typing.

The end-user has no need to bother with the logic, but yes, having a series of underscores differentiate between inputs would probably be confusing. It could still be used for just two (like auM and au_M), but as I said above, I don't mind _auM_ either.

I have asked at hi-wp whether anyone objects to either _aum_ or _auM_. If no one does, we can go ahead with either.
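
For reference, a sketch of how the _auM_ proposal could coexist with auM for औं in a first-match rule table. This is an illustration under assumptions, not the submitted patch: the rule table, the constant names and the `typeKeys` function are hypothetical, and the delimiter rule is written against the already transliterated text (underscore, औ, anusvara, underscore), in the same style as the 'ओM' rule from the description.

```javascript
// Hypothetical sketch; not the merged patch or the real rule table.
const ANUSVARA = '\u0902'; // the dot in औं
const VIRAMA = '\u094D';   // suppresses the inherent vowel of a consonant
const AU = 'औ';
const SA = 'स';
const OM = 'ॐ';

const rules = [
  ['_' + AU + ANUSVARA + '_', OM], // delimiter rule: _auM_ -> ॐ
  [SA + VIRAMA + 'a', SA],         // s followed by a -> स
  ['अu', AU],                      // a followed by u -> औ
  ['a', 'अ'],
  ['M', ANUSVARA],
  ['s', SA + VIRAMA],
];

// Apply the first rule whose pattern matches the end of the text typed so far.
function typeKeys(keys) {
  let text = '';
  for (const key of keys) {
    const candidate = text + key;
    const rule = rules.find(([pattern]) => candidate.endsWith(pattern));
    text = rule
      ? candidate.slice(0, candidate.length - rule[0].length) + rule[1]
      : candidate;
  }
  return text;
}

console.log(typeKeys('_auM_')); // ॐ
console.log(typeKeys('auMsa')); // औंस
console.log(typeKeys('auM'));   // औं
```

Because the ॐ rule only fires when the syllable is wrapped in underscores, the inputs for औं and for the anusvara stay unchanged, which was the concern raised in comment 16.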
Comment 19 Siddhartha Ghai 2012-08-06 10:11:02 UTC
There've been no objections. Feel free to go ahead.
Comment 20 matanya 2012-08-12 10:00:37 UTC
Patch merged.
Comment 21 Amir E. Aharoni 2012-08-12 10:01:30 UTC
The patch that was merged doesn't include all the requests from this discussion. I'll submit a new patch soon.
Comment 22 Andre Klapper 2012-12-15 17:13:49 UTC
(In reply to comment #21)
> The patch that was merged doesn't include all the requests from this
> discussion. I'll submit a new patch soon.

Amir: Did you have time for this?
Comment 23 Amir E. Aharoni 2013-06-13 21:22:16 UTC
Moving to ULS.
Comment 24 Andre Klapper 2013-09-26 14:28:57 UTC
[Assignee was removed, hence also resetting ASSIGNED status]
Comment 25 Siddhartha Ghai 2013-10-02 06:11:22 UTC
Does this need to be reported upstream in the github bug tracker of jquery.ime ?
Comment 26 Siebrand Mazeland 2013-10-02 08:33:27 UTC
(In reply to comment #25)
> Does this need to be reported upstream in the github bug tracker of
> jquery.ime?

That would be preferable. Then this issue can be marked "upstream" with reference to the upstream report.
Comment 27 praveenp 2013-10-03 04:01:11 UTC
Why so? So much Wikimedia code is managed on GitHub, so why does only this one go 'upstream' there? Has jquery.ime stopped being part of the Wikimedia projects?
Comment 28 Andre Klapper 2013-10-03 16:10:29 UTC
Most Wikimedia projects on GitHub are just mirrored there from the canonical code repositories located at https://gerrit.wikimedia.org/ . However, jquery.ime is also intended to be used by other projects, not only by Wikimedia, and among non-WMF developers GitHub accounts are more common than Wikimedia Gerrit accounts. But this is a bit off-topic here... :)
