Last modified: 2006-04-30 00:23:52 UTC
At the Vietnamese Wikipedia, most pages (and their titles) include words with precomposed, accented Unicode characters. (See [[Precomposed character]] and [[Vietnamese alphabet]].) However, users who search for articles at the Vietnamese Wikipedia often enter queries with the unaccented base characters, with the expectation that MediaWiki will understand their query. MediaWiki neither strips combining characters (Bug 1836) nor converts the precomposed characters in existing pages to their base ASCII characters (e.g., ô→o and ậ→a) when searching page titles or text, so the search feature consistently returns disappointing results.

Steps to reproduce:
1. Search for "viet nam" or "Viet Nam" (without the quotes) at the Vietnamese Wikipedia.

Expected results:
[[vi:Việt Nam]] is the first result, or at least somewhere in the results.

Actual results:
"Việt Nam" is nowhere to be found.
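For illustration only, here is a minimal Python sketch of the kind of folding requested above (ô→o, ậ→a). It is not MediaWiki code: `fold_diacritics` and `EXTRA_FOLDS` are hypothetical names. It assumes that decomposing the text (NFD) and dropping combining marks is an acceptable way to reach the base letters. Letters such as đ have no canonical decomposition, so they need an explicit mapping.

```python
import unicodedata

# Assumption: letters without a canonical decomposition are mapped by hand.
# Only the Vietnamese d-with-stroke is listed; other languages would need more.
EXTRA_FOLDS = {"\u0111": "d", "\u0110": "D"}  # đ, Đ


def fold_diacritics(text):
    """Return text with diacritics removed (hypothetical helper)."""
    # NFD splits a precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (non-zero canonical combining class).
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Handle the letters that NFD leaves untouched.
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in base_only)


folded_title = fold_diacritics("Việt Nam")
```

With folding like this applied to both the indexed titles and the query, a search for "viet nam" and the title "Việt Nam" would compare equal after case folding.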
Please see also Bug 1836, Comment 3:

> Perhaps the search function should ignore diacritics in article titles when the user has entered a query that contains no diacritics. If the user has entered in diacritics, the software should respect that. It would also be nice if there were a MediaWiki message in which a list of diacritics could be customized per wiki or locale, since different languages distinguish letters and diacritics differently.
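The matching rule proposed in that comment could be sketched roughly as follows. This is a hedged Python illustration, not how MediaWiki search works: `has_diacritics`, `fold`, and `title_matches` are hypothetical names, and simple substring matching stands in for a real search index. The sketch leaves out the per-wiki customizable list, and it leaves letters with no canonical decomposition (such as đ) untouched.

```python
import unicodedata


def has_diacritics(text):
    """True if the decomposed text contains any combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(unicodedata.combining(ch) for ch in decomposed)


def fold(text):
    """Strip combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def title_matches(query, title):
    """Hypothetical rule: ignore diacritics only when the query has none."""
    # Normalize both sides to NFC and fold case before comparing.
    q = unicodedata.normalize("NFC", query).casefold()
    t = unicodedata.normalize("NFC", title).casefold()
    if has_diacritics(q):
        # The user typed diacritics, so respect them exactly.
        return q in t
    # Unaccented query: compare against the folded title.
    return q in fold(t)


plain_hit = title_matches("viet nam", "Việt Nam")
accented_miss = title_matches("viêt nam", "Việt Nam")
```

Under this rule the unaccented query finds "Việt Nam", while a query that carries a different set of diacritics does not.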
*** This bug has been marked as a duplicate of 1836 ***
This is not the same as Bug 1836. That bug is for ignoring combining (but separate) diacritical characters in Unicode; this bug is for converting precomposed characters, which might be a lot more complicated.
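As a data point on the distinction drawn above, the short Python sketch below compares the two representations of ậ. It only shows what standard Unicode normalization does; it makes no claim about how MediaWiki stores or indexes text, which would decide how much extra work precomposed characters actually need.

```python
import unicodedata

precomposed = "\u1ead"       # ậ as a single precomposed code point
combining = "a\u0302\u0323"  # a + combining circumflex + combining dot below

# The raw strings differ code point by code point.
differ_raw = precomposed != combining

# Canonical decomposition (NFD) brings both to the same sequence.
nfd_precomposed = unicodedata.normalize("NFD", precomposed)
nfd_combining = unicodedata.normalize("NFD", combining)
same_after_nfd = nfd_precomposed == nfd_combining

# Canonical composition (NFC) brings both to the single code point.
same_after_nfc = unicodedata.normalize("NFC", combining) == precomposed
```

If the search pipeline normalizes text to one form before comparing, code that strips combining marks from NFD text covers the precomposed case too.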
No, that's the same thing.

*** This bug has been marked as a duplicate of 1836 ***