Last modified: 2014-07-02 14:49:45 UTC
I request a new parser function, #hangul, for the Korean Wikipedia. In Korean, a particle takes a different form depending on whether the preceding letter has a jongseong (batchim, a final consonant). For example, 를 (reul) is used only after a word ending in a vowel; if the preceding word ends in a consonant, 을 (eul) is used instead. For more information, see [[w:Hangul#Syllabic blocks]]. To solve this problem, we need a new parser function.
Detail: {{#hangul:AB|CD|EF}}
If the last letter of "AB" has a jongseong, "CD" is returned. If it has no jongseong, or is not Hangul, "EF" is returned.
More technical information (in Unicode):
Hangul: U+AC00 ~ U+D7A3
Hangul without jongseong: U+AC00 + 28(0x1C)*n (U+AC00, U+AC1C, U+AC38, U+AC54, ..., U+D76C, U+D788)
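The code-point arithmetic above can be sketched in a few lines. This is only an illustration in Python, not the parser function itself; the names `has_jongseong` and `hangul` are made up for the sketch, and it assumes precomposed Hangul syllables (U+AC00 to U+D7A3) rather than decomposed jamo.

```python
HANGUL_FIRST = 0xAC00  # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable
JONGSEONG_SLOTS = 28   # 27 final consonants plus "no final consonant"


def has_jongseong(char: str) -> bool:
    """Return True if char is a Hangul syllable with a final consonant."""
    code = ord(char)
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        return False  # not a precomposed Hangul syllable
    # Syllables without jongseong sit at U+AC00 + 28*n.
    return (code - HANGUL_FIRST) % JONGSEONG_SLOTS != 0


def hangul(word: str, with_jongseong: str, otherwise: str) -> str:
    """Hypothetical stand-in for {{#hangul:AB|CD|EF}}."""
    if word and has_jongseong(word[-1]):
        return with_jongseong
    return otherwise  # no jongseong, not Hangul, or empty input
```

For example, `hangul("책", "을", "를")` picks 을 because 책 ends in a consonant, while `hangul("나무", "을", "를")` picks 를.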
This is a case of morphophonology. It's as if you had to type "a/an 'insert noun here'" all the time because you can never know beforehand whether the noun will start with a vowel or a consonant. So {{#a or an:noun|a|an}} is what Ficell is proposing, I guess.
According to [[w:en:Korean language#Morphophonemics]], we will need to test for an additional case:
* Preceding syllable ends with a consonant
* Preceding syllable ends with a rieul consonant
* Preceding syllable ends with a vowel (no consonant)
Oops, that's [[Korean language#Morphophonemics]]. IMHO it would be nicer if {{#hangul:AB|CD|EF|GH}} returned ABCD, ABEF or ABGH.
I discussed this briefly with Kyungjoon Lee on a Korean Wikipedia user talk page. I suggest the following (cf. [[Korean language#Morphophonemics]]):
(in Unicode)
Hangul: U+AC00 ~ U+D7A3
Hangul which ends with a vowel: U+AC00 + 28(0x1C)*n (U+AC00, U+AC1C, U+AC38, U+AC54, ..., U+D76C, U+D788)
Hangul which ends with rieul: U+AC08 + 28(0x1C)*n (U+AC08, U+AC24, U+AC40, U+AC5C, ..., U+D774, U+D790)
{{#hanp:AB|CD}} (hanp is an abbreviation of "hangul particle")
* When CD is '로'(ro) or '으로'(euro)
** if the last letter of AB ends with a consonant (jongseong) other than rieul, it returns 'AB으로'(ABeuro)
** if the last letter of AB ends with a vowel or rieul, it returns 'AB로'(ABro)
** if the last letter of AB is not Hangul, it returns 'AB로'(ABro)
* When CD is '을'(eul), '이'(i), '와'(wa), '은'(eun) or '를'(reul), '가'(ga), '과'(gwa), '는'(neun)
** if the last letter of AB ends with a consonant, it returns 'AB을'(ABeul), 'AB이'(ABi), 'AB와'(ABwa), 'AB은'(ABeun)
** if the last letter of AB ends with a vowel, it returns 'AB를'(ABreul), 'AB가'(ABga), 'AB과'(ABgwa), 'AB는'(ABneun)
** if the last letter of AB is not Hangul, it returns 'AB를'(ABreul), 'AB가'(ABga), 'AB과'(ABgwa), 'AB는'(ABneun)
Yeah, this is how Korean LaTeX macros handle "automatic particle handling" as well. I think Ficell has wa/gwa switched; the Wikipedia table has the correct choices.
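For reference, the selection rules discussed so far can be sketched roughly as below. This is an illustrative Python sketch, not the eventual extension code: `hanp`, `jongseong_index` and `PARTICLE_PAIRS` are names invented here, the 과/와 pair follows the corrected order (과 after a consonant, 와 after a vowel), and passing an unknown particle through unchanged is an assumption the thread does not cover.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
RIEUL = 8  # position of rieul among the 28 jongseong slots (0 means none)

# (form used after a consonant, form used after a vowel)
PARTICLE_PAIRS = [
    ("을", "를"), ("이", "가"), ("과", "와"), ("은", "는"), ("으로", "로"),
]


def jongseong_index(char):
    """Return 0..27 for a precomposed Hangul syllable, None otherwise."""
    code = ord(char)
    if HANGUL_FIRST <= code <= HANGUL_LAST:
        return (code - HANGUL_FIRST) % 28
    return None


def hanp(word, particle):
    """Hypothetical stand-in for {{#hanp:AB|CD}}: returns AB plus the fitting particle."""
    for after_consonant, after_vowel in PARTICLE_PAIRS:
        if particle in (after_consonant, after_vowel):
            break
    else:
        return word + particle  # assumption: unknown particles pass through
    index = jongseong_index(word[-1]) if word else None
    if index is None or index == 0:
        return word + after_vowel  # non-Hangul or vowel-final
    if index == RIEUL and after_vowel == "로":
        return word + after_vowel  # rieul behaves like a vowel for 로/으로
    return word + after_consonant
```

With this sketch, `hanp("서울", "으로")` yields 서울로 (rieul-final), and `hanp("집", "로")` yields 집으로.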
Isn't this something that could (should?) be added to language/classes/LanguageKo.php? CC-ing Niklas in. Domain: MediaWiki extensions/ParserFunctions -> MediaWiki/i18n
Could use grammar functionality here, with syntax something like {{GRAMMAR:hanp|AB,CD,EF,GH}} or {{GRAMMAR:hanp:CD,EF,GH|AB}}.
(In reply to comment #7) > Isn't this something that could (should?) be added to > language/classes/LanguageKo.php? > > CC-ing Niklas in. > Domain: MediaWiki extensions/ParserFunctions -> MediaWiki/i18n > Yes, I think so. (In reply to comment #8) > Could use grammar functionality here, with syntax something like > {{GRAMMAR:hanp|AB,CD,EF,GH}} or {{GRAMMAR:hanp:CD,EF,GH|AB}}. > That's also a good idea, but I think {{#hanp:}} is easier to use.
If not grammar, would this new tag be in MediaWiki proper, piggyback an existing extension or be in a new extension?
(In reply to comment #10) > If not grammar, would this new tag be in MediaWiki proper, piggyback an > existing extension or be in a new extension? > A new extension seems better, although I don't know the details of the MediaWiki software.
I've committed an extension that should work as described in comments #5 and #6, as r41088. It should be easy to review because it is very small. It might be a good idea to open a new bug request specifically for enabling that extension on Korean projects.
Created attachment 5358 Patch for Hanp.body.php by Ficell Thanks for your work, Niklas Laxström. Unfortunately I found a problem. If $word contains symbols, it doesn't work well. If we want to know whether '[[A]]' + 'eul' is correct or not, we can't get a result with the current #HANP function, because $word ends with a ']' character, which is not read as Hangul. To solve this problem, I suggest adding a new parameter named "output". I made a patch for hanp.body.php. Please consider this.
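To make the problem concrete, here is a small Python sketch of one possible reading of the "output" idea: the particle is chosen from the bare word, but appended to separately supplied output text such as a wikilink. The attached patch is not shown in this thread, so the function names and the exact parameter semantics below are assumptions, and only the 을/를 pair is handled to keep the example short.

```python
def ends_with_consonant(word):
    """True if the last character is a Hangul syllable with a jongseong."""
    if not word:
        return False
    offset = ord(word[-1]) - 0xAC00
    return 0 <= offset <= 0xD7A3 - 0xAC00 and offset % 28 != 0


def attach_object_particle(word, output=None):
    """Choose 을/를 from `word`, then append it to `output` (or to `word`)."""
    particle = "을" if ends_with_consonant(word) else "를"
    return (word if output is None else output) + particle
```

Calling `attach_object_particle("[[책]]")` inspects the trailing `]` and falls back to 를, which is the reported problem; `attach_object_particle("책", output="[[책]]")` inspects 책 instead and appends 을 to the link text.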
(In reply to comment #13) > Created an attachment (id=5358) [details] > Patch for Hanp.body.php by Ficell Wow, that was an extremely crappy patch. I had to merge that manually, line by line. Please create a proper patch next time. Applied in r42700. How does it work now?
Sorry. I didn't know how to make a proper diff file. It works well now. Thanks.
Changed topic and added keywords to request installation of this extension. Should this be installed for all Korean Wikimedia projects?
Yes. Please install the extension.
Hang on, please. Has this extension been tested anywhere? Would it be OK to put it on a "production" server?
(In reply to comment #18) > Has this extension been tested anywhere? Would it be OK to put it on a > "production" server? Well, that is why it had a need-review keyword. *You* can be a reviewer, but a Wikimedia developer will also audit it before it will ever go live.
(In reply to comment #18) I tested it on my personal wiki. It works well. Actually I found some problems when using in system message, but it isn't the problem of the function itself. There seems to be no problem so far.
(In reply to comment #20) > (In reply to comment #18) > > It works well. Actually I found some problems when > using in system message, but it isn't the problem of the function itself. Please provide details so Niklas can assess if it can be fixed.
(In reply to comment #21) Sorry for the delay. I was busy in real life. I'll post it on Betawiki ASAP.
Including this feature in MediaWiki core seems better. If this feature is used in the default MediaWiki system messages, the Korean translation will be more precise. Also refer to http://translatewiki.net/w/i.php?title=Support&oldid=926697#Parameter_on_log_message
(In reply to comment #23) > Including this feature in MediaWiki core seems better. If this feature is used > in the default MediaWiki system messages, the Korean translation will be more precise. > > Also refer to > http://translatewiki.net/w/i.php?title=Support&oldid=926697#Parameter_on_log_message > Well, that is interesting. In comment 11 you stated the opposite. What's it gonna be and why exactly?
(In reply to comment #24) I didn't know the difference at that time. Sorry for that.
Removed shell keyword since there's nothing to do on shell. Removed need-review keyword since this has been applied in SVN already and/or should be implemented as a localization in betawiki. I don't even think there is anything left in this bug to do. If there is, please point it out, otherwise it will get closed as FIXED.
{{#HANP:}} is not in core. It is currently an extension.
Assigning to myself for review. (I can't imagine what betawiki would have to do with this... an extension couldn't be used for core localizations since it wouldn't be available in default installations.)
(In reply to comment #28) I meant this should be a function in core, like {{plural:}}, not an extension.
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
If there is interest, I can easily port this to core. Please let me know that you need this.
It's been dragging for 2.5 years now. Whatever leads to an acceptable resolution, I'd say.
We need a function like this when translating MediaWiki messages. When English words are translated into Korean, the last letter (in Korean) may be a consonant or a vowel. This isn't distinguished in English, but it is in Korean; the particle changes because of this... Without this function, we must write out all the possible particles. (And that is what we do now...) It is inefficient and ugly. Sorry for my poor English ;)
Niklas/Brion, as I understand it, this is needed. Can one of you port it, please?
FYI: This function is implemented as lua in ko.wikipedia. https://ko.wikipedia.org/wiki/Module:Hangul
(In reply to comment #28 by Brion) > Assigning to myself for review. Brion: As you wrote this in 2009, is that still the case, or would you like to reset the assignee to default?
Presumably this is no longer active, no. :) Reassigning to default.
The extension is currently at https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FHanp (I come from there). (In reply to Niklas Laxström from comment #31) > If there is interest, I can easily port this to core. Please let me know > that you need this. Should this bug be moved to core then? Seems so.
There is a https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FJosa too. This is an improved version of Extension:Hanp, maintained by a native Korean speaker. And we also have some consensus in local wiki community: https://ko.wikipedia.org/wiki/%EC%9C%84%ED%82%A4%EB%B0%B1%EA%B3%BC:%EC%82%AC%EB%9E%91%EB%B0%A9_%28%EA%B8%B0%EC%88%A0%29/2014%EB%85%84_6%EC%9B%94#.EC.A1.B0.EC.82.AC_.ED.99.95.EC.9E.A5.EA.B8.B0.EB.8A.A5_.EB.8F.84.EC.9E.85 Btw, I agree with integrating this feature into core.
(In reply to JuneHyeon Bae (devunt) from comment #39) > There is a https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FJosa > too. > And we also have some consensus in local wiki community: Ok, updated bug.
However I think this feature should be integrated into core.
-shell, this needs reviewing for deployment etc...
This could also easily be accomplished with lua, no extension required
Can someone clarify what actually needs doing here? Do we want both hanp and josa installing? Just Josa? One into core? Both into core? Either way, Josa needs some major cleanup. There's a lot of code duplication, and it's all in global functions (for starters). That'd need doing as part of moving it to core too...
(In reply to Bawolff (Brian Wolff) from comment #43) > This could also easily be accomplished with lua, no extension required (In reply to Chong-Dae Park from comment #35) > FYI: This function is implemented as lua in ko.wikipedia. > > https://ko.wikipedia.org/wiki/Module:Hangul RESOLVED FIXED? ;)
Only Josa, not Hanp. Modules are not very portable, but it's ko.wiki's call whether they're satisfied or not. Core or not doesn't matter so much, the code refactoring needed per above would be the same wouldn't it?
(In reply to Sam Reed (reedy) from comment #45) > (In reply to Bawolff (Brian Wolff) from comment #43) > > This could also easily be accomplished with lua, no extension required > > (In reply to Chong-Dae Park from comment #35) > > FYI: This function is implemented as lua in ko.wikipedia. > > > > https://ko.wikipedia.org/wiki/Module:Hangul > > RESOLVED FIXED? ;) ko.wiki's consensus in comment 39 was to use an extension, not Lua. And it looks like the Lua module is not used much.