Last modified: 2006-04-30 00:23:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T7752, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 5752 - Pages with precomposed (accented) characters should match unaccented search query
Status: RESOLVED DUPLICATE of bug 1836
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: unspecified
Hardware: All All
Importance: Normal normal with 1 vote
Assigned To: Nobody - You can work on this!

Reported: 2006-04-29 06:47 UTC by Minh Nguyễn
Modified: 2006-04-30 00:23 UTC (History)

Description Minh Nguyễn 2006-04-29 06:47:11 UTC
At the Vietnamese Wikipedia, most pages (and their titles) include words with
precomposed, accented Unicode characters. (See [[Precomposed character]] and
[[Vietnamese alphabet]].) However, users who search for articles at the
Vietnamese Wikipedia often enter queries with the unaccented base characters,
with the expectation that MediaWiki will understand their query. MediaWiki
neither strips combining characters (Bug 1836) nor converts the precomposed
characters in existing pages to their base ASCII characters (e.g., ô→o and ậ→a)
when searching page titles or text, so the search feature consistently returns
disappointing results.

Steps to reproduce:
1. Search for "viet nam" or "Viet Nam" (without the quotes) at the Vietnamese
Wikipedia

Expected results:
[[vi:Việt Nam]] is the first result, or at least somewhere in the results.

Actual results:
"Việt Nam" is nowhere to be found.
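
For illustration, the folding requested above can be sketched in Python with the standard unicodedata module. This is only a minimal sketch, not MediaWiki's search code; the function name fold_diacritics is hypothetical. It assumes that decomposing to NFD and dropping combining marks is an acceptable way to reach the base characters.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return a case-folded copy of text with combining marks removed.

    Hypothetical helper for illustration only.
    """
    # NFD splits precomposed characters such as "ệ" into a base letter
    # followed by its combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks; drop them.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold so that "Viet Nam" and "viet nam" compare equal.
    return stripped.casefold()


# Letters without a canonical decomposition, such as "đ" (U+0111),
# pass through unchanged and would need an explicit mapping.
print(fold_diacritics("Việt Nam") == fold_diacritics("viet nam"))
```

If both the indexed titles and the query were folded this way, the query "viet nam" would compare equal to the title "Việt Nam".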
Comment 1 Minh Nguyễn 2006-04-29 06:50:04 UTC
Please see also Bug 1836, Comment 3:

> Perhaps the search function should ignore diacritics in article titles when the
> user has entered a query that contains no diacritics. If the user has entered in
> diacritics, the software should respect that. It would also be nice if there
> were a MediaWiki message in which a list of diacritics could be customized per
> wiki or locale, since different languages distinguish letters and diacritics
> differently.
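
The rule quoted above could look roughly like the following Python sketch. The names has_diacritics, strip_marks and title_matches are hypothetical, substring matching stands in for a real search index, and the per-wiki list of diacritics is not modeled.

```python
import unicodedata


def has_diacritics(text: str) -> bool:
    """True if text contains any combining mark once decomposed (NFD)."""
    return any(unicodedata.combining(ch)
               for ch in unicodedata.normalize("NFD", text))


def strip_marks(text: str) -> str:
    """Remove combining marks, leaving the base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def title_matches(query: str, title: str) -> bool:
    """Hypothetical matcher: accents are ignored only for unaccented queries."""
    # Normalize both sides so precomposed and combining forms compare equal.
    q = unicodedata.normalize("NFC", query).casefold()
    t = unicodedata.normalize("NFC", title).casefold()
    if has_diacritics(q):
        # The user typed diacritics, so respect them exactly.
        return q in t
    # Unaccented query: compare against the unaccented form of the title.
    return strip_marks(q) in strip_marks(t)
```

With this sketch, title_matches("viet nam", "Việt Nam") is true, while a query with different diacritics, such as "Viết Nam", does not match that title.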
Comment 2 Brion Vibber 2006-04-29 21:26:51 UTC

*** This bug has been marked as a duplicate of 1836 ***
Comment 3 Minh Nguyễn 2006-04-30 00:19:38 UTC
This is not the same as Bug 1836. That bug is for ignoring combining (but
separate) diacritical characters in Unicode; this bug is for converting
precomposed characters, which might be a lot more complicated.
Comment 4 Brion Vibber 2006-04-30 00:23:52 UTC
No, that's the same thing.

*** This bug has been marked as a duplicate of 1836 ***
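
One way to see why the two cases can be handled together, assuming the search code normalizes text before comparing (this thread does not confirm that it does): under Unicode canonical normalization, a precomposed character and its combining sequence are equivalent. A minimal Python check, with the hypothetical helper name canonically_equal:

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """True if a and b are the same string after NFD normalization."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u1ec7"        # "ệ" as a single precomposed code point
combining = "e\u0323\u0302"   # "e" + combining dot below + combining circumflex

# The raw strings differ, but they are canonically equivalent.
print(precomposed == combining)
print(canonically_equal(precomposed, combining))
# NFC recomposes the combining sequence into the precomposed character.
print(unicodedata.normalize("NFC", combining) == precomposed)
```

Once text is decomposed, stripping combining marks covers both the precomposed and the separate-diacritic forms.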
