Last modified: 2013-03-26 11:25:02 UTC
Created attachment 9811 [details]
hi-wp search results showing problem in chrome 16.0.912.63 m win7 X64 home basic
This bug is about how diacritics appear in the search results, not about how they are handled while performing the search (bug 27055).
Search results in Devanagari show the diacritic that follows the last entered letter detached from it, rendered without a base letter. Example: a search for भारत shows some results for भारतीय as भारत ीय, with भारत bold and ीय not bold. The same applies to other diacritics. Entering भारती shows भारतीय correctly, with भारती bold and य not bold. Ideally, this is also how search results for भारत should look. Whether the ी over त is bold is not very important (ideally it should not be, but it would not hurt much if it were bold, as long as it displays correctly).
A related issue (not sure if this should be a separate bug) is the display of the characters immediately following the input string in the search results when the input string ends with ् (halant/viram). The characters should appear joined instead of separate. For example, a result matching a search for प्रस् appears as प्रस् ताव (there is no actual space in between, and प्रस् is in bold). Ideally this should appear joined as प्रस्ताव with प्रस्ता in bold.
Also note that this issue might be present with other Indic languages as well.
Created attachment 9812 [details]
hi-wp search results showing related (halant/viram) problem in chrome 16.0.912.63 m win7 X64 home basic
The issue is in jquery.highlightText.js, which just breaks the text at the end of the match without any awareness of grapheme clusters.
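A rough sketch of what a cluster-aware split could look like, for illustration only. This is not the code in jquery.highlightText.js; the function name splitForHighlight and the rule it uses are assumptions. It extends the end of the match over any following combining marks, and over a consonant that follows a Devanagari halant, so the split never lands inside a cluster. It relies on Unicode property escapes in regular expressions (ES2018) and only knows about the Devanagari halant (U+094D); other Indic scripts would need their own virama code points.

```javascript
// Hypothetical helper, not part of jquery.highlightText.js.
const VIRAMA = '\u094D';   // Devanagari halant/viram
const MARK = /^\p{M}$/u;   // any combining mark (matras, halant, ...)
const LETTER = /^\p{L}$/u; // any letter

// Split text into the parts before, inside and after the first match of
// query, moving the end of the match forward to a cluster boundary.
function splitForHighlight(text, query) {
  const start = query === '' ? -1 : text.indexOf(query);
  if (start === -1) {
    return null;
  }
  let end = start + query.length;
  while (end < text.length) {
    const next = String.fromCodePoint(text.codePointAt(end));
    const prev = text[end - 1];
    // Absorb a trailing mark, or a letter joined to the match by a halant.
    if (MARK.test(next) || (prev === VIRAMA && LETTER.test(next))) {
      end += next.length;
    } else {
      break;
    }
  }
  return {
    before: text.slice(0, start),
    match: text.slice(start, end),
    after: text.slice(end)
  };
}
```

With this rule, splitting भारतीय on भारत gives भारती as the highlighted part and य as the rest, and splitting प्रस्ताव on प्रस् gives प्रस्ता and व, which is the output asked for in the description. The sketch does not handle a match that starts in the middle of a cluster.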
*** This bug has been marked as a duplicate of bug 33242 ***
[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]