Last modified: 2008-05-19 20:22:34 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T3701, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 1701 - special firstChar() routine for Korean characters


Summary:	special firstChar() routine for Korean characters

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Categories (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	High enhancement with 2 votes
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://ko.wikipedia.org/wiki/Category...
Whiteboard:
Keywords:	patch, patch-need-review

Depends on:
Blocks:	3950

Reported:	2005-03-16 05:02 UTC by Puzzlet Chung
Modified:	2008-05-19 20:22 UTC
CC List:	4 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
Patch for LanguageUtf8.php (1.88 KB, patch) 2005-06-14 04:46 UTC, Puzzlet Chung

Description Puzzlet Chung 2005-03-16 05:02:35 UTC

Since written Korean (hangul) is syllabic, pages in a category are sectioned
by their initial syllables rather than by letters or phonemes. As a result,
almost every page ends up in its own section. Look at the URL above, which is
the Korean equivalent of Category:People on the English Wikipedia. On the
Korean category page, many pages have their own sections, such as
Category:Austrian_people, which falls in the "Au" section, and
Category:Polish_people, which falls in the "Pol" section. (These could of
course be recategorized into Category:People_by_nationality, but that is
beside the point.)

Every hangul syllable can be decomposed into consonants and vowels, so a
better index scheme for category pages would be to section pages by the
initial consonant of their first syllable:
* articles starting with 가(U+AC00) through 낗(U+B097) under a section titled
ㄱ(U+1100),
* from 나(U+B098) to 닣(U+B2E3) under ㄴ(U+1102),
* from 다(U+B2E4) to 띻(U+B77B) under ㄷ(U+1103),
* from 라(U+B77C) to 맇(U+B9C7) under ㄹ(U+1105),
* from 마(U+B9C8) to 밓(U+BC13) under ㅁ(U+1106),
* from 바(U+BC14) to 삫(U+C0AB) under ㅂ(U+1107),
* from 사(U+C0AC) to 앃(U+C543) under ㅅ(U+1109),
* from 아(U+C544) to 잏(U+C78F) under ㅇ(U+110B),
* from 자(U+C790) to 찧(U+CC27) under ㅈ(U+110C),
* from 차(U+CC28) to 칳(U+CE73) under ㅊ(U+110E),
* from 카(U+CE74) to 킿(U+D0BF) under ㅋ(U+110F),
* from 타(U+D0C0) to 팋(U+D30B) under ㅌ(U+1110),
* from 파(U+D30C) to 핗(U+D557) under ㅍ(U+1111),
* and from 하(U+D558) to 힣(U+D7A3) under ㅎ(U+1112).
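The ranges above follow directly from the layout of the Unicode Hangul Syllables block: each leading consonant owns a contiguous run of 21 vowels x 28 final-consonant slots (including "no final") = 588 syllables, and the proposal folds the five tense consonants (ㄲ, ㄸ, ㅃ, ㅆ, ㅉ) into their plain counterparts. A minimal Python sketch, assuming only that standard layout, regenerates the 14 ranges; the function name `sections()` is hypothetical and not part of MediaWiki.

```python
# Sketch: derive the section ranges from the Unicode Hangul Syllables layout
# (U+AC00..U+D7A3), where each leading consonant (choseong) owns a contiguous
# block of 21 vowels x 28 finals = 588 syllables.
S_BASE = 0xAC00          # first Hangul syllable, 가
L_BASE = 0x1100          # first choseong jamo, U+1100
V_COUNT, T_COUNT = 21, 28
BLOCK = V_COUNT * T_COUNT  # 588 syllables per leading consonant

# Choseong indices 1, 4, 8, 10, 13 are the tense consonants; the proposal
# merges each one into the section of the preceding plain consonant.
TENSE = {1, 4, 8, 10, 13}


def sections():
    """Return (first, last, header) code points for each of the 14 sections."""
    result = []
    for l_index in range(19):
        first = S_BASE + l_index * BLOCK
        last = first + BLOCK - 1
        if l_index in TENSE:
            # Extend the previous section instead of opening a new one.
            prev_first, _, header = result[-1]
            result[-1] = (prev_first, last, header)
        else:
            result.append((first, last, L_BASE + l_index))
    return result


for first, last, header in sections():
    print(f"{chr(first)}(U+{first:04X}) to {chr(last)}(U+{last:04X}) "
          f"under {chr(header)}(U+{header:04X})")
```

Each printed line should match one bullet of the list above.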

Comment 1 Ævar Arnfjörð Bjarmason 2005-04-27 04:04:05 UTC

A duplicate of bug 1984.

*** This bug has been marked as a duplicate of 1984 ***

Comment 2 Puzzlet Chung 2005-06-14 04:46:48 UTC

Created attachment 609
Patch for LanguageUtf8.php

Comment 3 Puzzlet Chung 2005-06-14 04:59:57 UTC

Changes in LanguageKo.php work fine on the Korean Wikipedia, but multilingual
projects like Meta-wiki and Wikisource need to be updated too. I have attached
a patch that modifies only firstChar(): it treats the Hangul Syllables area
(U+AC00 to U+D7A3) specially and leaves the behaviour for all other characters
unchanged. However, I'm not sure which file is the appropriate one to patch,
Language.php or LanguageUtf8.php. To test it, see
http://wikisource.org/wiki/Category:%ED%95%9C%EA%B5%AD%EC%96%B4, which should
have no more than 10 sections after the commit.
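As a rough illustration of the behaviour described in this comment, here is a hypothetical Python analogue of the patched firstChar(). It is not the attached PHP patch: the name `first_char()`, the lookup tables, and the assumption that the old behaviour simply returned the first character unchanged are all illustrative.

```python
import bisect

# Inclusive upper bound of each section and its header code point, copied
# from the ranges proposed in the bug description.
_UPPER = [0xB097, 0xB2E3, 0xB77B, 0xB9C7, 0xBC13, 0xC0AB, 0xC543,
          0xC78F, 0xCC27, 0xCE73, 0xD0BF, 0xD30B, 0xD557, 0xD7A3]
_HEADER = [0x1100, 0x1102, 0x1103, 0x1105, 0x1106, 0x1107, 0x1109,
           0x110B, 0x110C, 0x110E, 0x110F, 0x1110, 0x1111, 0x1112]


def first_char(title: str) -> str:
    """Hypothetical analogue of the patched firstChar() for a page title."""
    if not title:
        return ""
    ch = title[0]
    cp = ord(ch)
    if 0xAC00 <= cp <= 0xD7A3:
        # bisect_left returns the first section whose upper bound is >= cp.
        return chr(_HEADER[bisect.bisect_left(_UPPER, cp)])
    # Outside the Hangul Syllables area: return the first character as-is
    # (assumed to stand in for the pre-patch behaviour).
    return ch
```

For example, `first_char("한국어")` falls in the 하 to 힣 range and yields U+1112, while `first_char("Austria")` still yields `"A"`. The binary search is just a compact way to walk the 14 ranges; a plain chain of comparisons would work equally well.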

Comment 4 Puzzlet Chung 2005-11-13 07:34:26 UTC

It now works on the Korean Wikisource
(http://ko.wikisource.org/wiki/%EB%B6%84%EB%A5%98:%EC%8B%9C%EC%A1%B0), but
multilingual wikis like Meta-wiki still have this issue
(http://meta.wikimedia.org/wiki/Category:KO).

My point is that this feature should apply universally, on every wiki, to page
names that start with Korean characters.

Comment 5 Anon Sricharoenchai 2008-04-28 09:43:25 UTC

I second this: the ko firstChar() should apply to all wiki languages, especially on multilingual wikis,
not just on the ko wiki.

Comment 6 Kyungjoon Lee 2008-05-01 09:42:45 UTC

Another vote for support here.

Comment 7 Brion Vibber 2008-05-19 20:22:34 UTC

Done in r35055. I also did a little cleanup to use the utf8ToCodepoint() function instead of the manual UTF-8 decoding code.

(Could just use raw characters here instead of the hex positions, should one desire, but this isn't a performance-critical code path.)
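The cleanup mentioned here replaces hand-written UTF-8 decoding with a helper. Every Hangul syllable (U+AC00 to U+D7A3) encodes to exactly three UTF-8 bytes, so a manual decoder only needs the three-byte case. The Python sketch below, with hypothetical function names (it does not reproduce MediaWiki's utf8ToCodepoint()), shows that the hand-rolled bit arithmetic and a library decode agree, and that a range check written with raw characters is equivalent to one written with hex code points.

```python
def manual_codepoint(seq: bytes) -> int:
    """Hand-decode one 3-byte UTF-8 sequence: 1110xxxx 10xxxxxx 10xxxxxx."""
    if len(seq) != 3 or (seq[0] & 0xF0) != 0xE0 or any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("not a 3-byte UTF-8 sequence")
    return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)


def library_codepoint(seq: bytes) -> int:
    """Delegate decoding to the standard library, in the spirit of the cleanup."""
    return ord(seq.decode("utf-8"))


def is_hangul_syllable(ch: str) -> bool:
    """Range check using raw characters instead of hex positions."""
    return "가" <= ch <= "힣"  # same as 0xAC00 <= ord(ch) <= 0xD7A3


for ch in "가낗한힣":
    encoded = ch.encode("utf-8")
    assert manual_codepoint(encoded) == library_codepoint(encoded) == ord(ch)
```

Python compares strings by code point, which is why the raw-character range check behaves exactly like the hex version.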
