Last modified: 2006-04-30 00:23:52 UTC

Bug 5752 - Pages with precomposed (accented) characters should match unaccented search query
Status: RESOLVED DUPLICATE of bug 1836
Product: MediaWiki
Classification: Unclassified
Component: Search
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Reported: 2006-04-29 06:47 UTC by Minh Nguyễn
Modified: 2006-04-30 00:23 UTC


Description Minh Nguyễn 2006-04-29 06:47:11 UTC
At the Vietnamese Wikipedia, most pages (and their titles) include words with
precomposed, accented Unicode characters. (See [[Precomposed character]] and
[[Vietnamese alphabet]].) However, users who search for articles at the
Vietnamese Wikipedia often enter queries with the unaccented base characters,
with the expectation that MediaWiki will understand their query. MediaWiki
neither strips combining characters (Bug 1836) nor converts the precomposed
characters in existing pages to their base ASCII characters (e.g., ô→o and ậ→a)
when searching page titles or text, so the search feature consistently returns
disappointing results.
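The folding described above can be sketched in Python. This is a minimal illustration under stated assumptions, not MediaWiki's search code (MediaWiki is written in PHP): `fold_diacritics`, `search_titles` and the đ→d mapping are hypothetical. Canonical decomposition (NFD) splits each precomposed letter into a base letter plus combining marks, which are then dropped; đ/Đ has no Unicode decomposition, so it needs an explicit mapping.

```python
import unicodedata

# Letters that Unicode does not decompose into base + combining mark.
# Vietnamese đ/Đ is the notable case; this mapping is an assumption about
# what a wiki would want, not something MediaWiki defines.
EXTRA_FOLDS = str.maketrans({"đ": "d", "Đ": "D"})


def fold_diacritics(text: str) -> str:
    """Return text with diacritics removed, e.g. 'Việt Nam' -> 'Viet Nam'."""
    # NFD splits each precomposed letter into its base letter followed by
    # combining marks (ậ -> a + U+0323 + U+0302).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (general category Mn).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left, then apply the non-decomposable mappings.
    return unicodedata.normalize("NFC", stripped).translate(EXTRA_FOLDS)


def search_titles(query: str, titles: list[str]) -> list[str]:
    """Hypothetical title search: case- and diacritic-insensitive substring match."""
    needle = fold_diacritics(query).casefold()
    return [t for t in titles if needle in fold_diacritics(t).casefold()]


if __name__ == "__main__":
    titles = ["Việt Nam", "Hà Nội", "Đà Nẵng"]
    print(search_titles("viet nam", titles))  # ['Việt Nam']
    print(search_titles("da nang", titles))   # ['Đà Nẵng']
```

Folding both the query and the stored titles is what would let "viet nam" reach [[vi:Việt Nam]] in the reproduction steps below.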

Steps to reproduce:
1. Search for "viet nam" or "Viet Nam" (without the quotes) at the Vietnamese Wikipedia.

Expected results:
[[vi:Việt Nam]] is the first result, or at least somewhere in the results.

Actual results:
"Việt Nam" is nowhere to be found.
Comment 1 Minh Nguyễn 2006-04-29 06:50:04 UTC
Please see also Bug 1836, Comment 3:

> Perhaps the search function should ignore diacritics in article titles when the
> user has entered a query that contains no diacritics. If the user has entered in
> diacritics, the software should respect that. It would also be nice if there
> were a MediaWiki message in which a list of diacritics could be customized per
> wiki or locale, since different languages distinguish letters and diacritics
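The rule proposed in the quoted comment (fold only when the query has no diacritics, with a per-wiki list of ignorable marks) could look roughly like the Python sketch below. The set `VI_IGNORABLE_MARKS` stands in for the suggested MediaWiki message and is purely hypothetical, as are `fold` and `title_matches`; it lists the Vietnamese tone marks plus the circumflex, breve and horn. It does not handle đ, which has no decomposition.

```python
import unicodedata

# Hypothetical per-wiki list of combining marks to ignore when folding.
VI_IGNORABLE_MARKS = {
    "\u0300",  # grave (huyền)
    "\u0301",  # acute (sắc)
    "\u0303",  # tilde (ngã)
    "\u0309",  # hook above (hỏi)
    "\u0323",  # dot below (nặng)
    "\u0302",  # circumflex (â, ê, ô)
    "\u0306",  # breve (ă)
    "\u031B",  # horn (ơ, ư)
}


def fold(text: str, marks: set[str]) -> str:
    """Decompose text and remove only the combining marks listed in `marks`."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


def title_matches(query: str, title: str, marks: set[str]) -> bool:
    """Ignore diacritics only if the query itself contains none of them."""
    q = unicodedata.normalize("NFC", query)
    if fold(q, marks) == q:
        # Plain query: compare diacritic-insensitively.
        return q.casefold() in fold(title, marks).casefold()
    # Query already has diacritics: respect them.
    return q.casefold() in unicodedata.normalize("NFC", title).casefold()


print(title_matches("viet nam", "Việt Nam", VI_IGNORABLE_MARKS))  # True
print(title_matches("Việt Nam", "Viet Nam", VI_IGNORABLE_MARKS))  # False
```

Keeping the mark list as data rather than code reflects the comment's point that different languages treat the same marks differently.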
Comment 2 Brion Vibber 2006-04-29 21:26:51 UTC

*** This bug has been marked as a duplicate of 1836 ***
Comment 3 Minh Nguyễn 2006-04-30 00:19:38 UTC
This is not the same as Bug 1836. That bug is for ignoring combining (but
separate) diacritical characters in Unicode; this bug is for converting
precomposed characters, which might be a lot more complicated.
Comment 4 Brion Vibber 2006-04-30 00:23:52 UTC
No, that's the same thing.

*** This bug has been marked as a duplicate of 1836 ***
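The reply rests on Unicode canonical equivalence: a precomposed letter and its base-plus-combining-marks spelling normalize to the same NFD sequence, so decomposing first and then stripping combining marks handles both cases at once. A short Python check (`strip_combining` is a hypothetical helper name):

```python
import unicodedata

precomposed = "\u1EAD"        # ậ as a single code point
combining = "a\u0323\u0302"   # a + combining dot below + combining circumflex

# Canonically equivalent: NFD and NFC map each form onto the other.
assert unicodedata.normalize("NFD", precomposed) == combining
assert unicodedata.normalize("NFC", combining) == precomposed


def strip_combining(text: str) -> str:
    """Remove combining marks (nonzero combining class) after NFD decomposition."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text) if not unicodedata.combining(ch)
    )


print(strip_combining(precomposed), strip_combining(combining))  # a a
```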
