Last modified: 2013-03-26 11:25:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T35548, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 33548 - Devanagari diacritics appear broken off in search results
Status: RESOLVED DUPLICATE of bug 33242
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-01-05 19:56 UTC by Siddhartha Ghai
Modified: 2013-03-26 11:25 UTC (History)

Attachments
hi-wp search results showing problem in chrome 16.0.912.63 m win7 X64 home basic (5.34 KB, image/png)
2012-01-05 19:56 UTC, Siddhartha Ghai
hi-wp search results showing related(halant/viram) problem in chrome 16.0.912.63 m win7 X64 home basic (7.45 KB, image/png)
2012-01-05 20:36 UTC, Siddhartha Ghai

Description Siddhartha Ghai 2012-01-05 19:56:59 UTC
Created attachment 9811 [details]
hi-wp search results showing problem in chrome 16.0.912.63 m win7 X64 home basic

This bug is about how diacritics appear in the search results, not about how they are handled while performing the search (bug 27055).

Search results in Devanagari show a diacritic that follows the last entered letter detached from that letter, so it is rendered without a base letter. Example: a search for भारत shows some results for भारतीय as भारत ीय, with भारत bold and  ीय not bold. The same applies to other diacritics. Entering भारती shows भारतीय correctly, with भारती bold and य not bold. Ideally, this is also how search results for भारत should look. Whether or not the ी on त is bold is not very important (ideally it should not be, but it would not hurt much if it were bold, as long as it displays correctly).

A related issue (not sure if this should be a separate bug) is the display of the characters immediately following the input string in the search results when the input string ends with ् (halant/viram). The characters should appear joined instead of separate. For example, a search for प्रस् shows प्रस् ताव (with no space in between, and with प्रस् in bold). Ideally this should appear joined as प्रस्ताव with प्रस्ता in bold.

Also note that this issue might be present in other Indic languages as well.

Comment 1 Siddhartha Ghai 2012-01-05 20:36:34 UTC
Created attachment 9812 [details]
hi-wp search results showing related(halant/viram) problem in chrome 16.0.912.63 m win7 X64 home basic
Comment 2 Santhosh Thottingal 2012-03-28 09:20:08 UTC
The issue is in jquery.highlightText.js, which just breaks the text at the end of the match without any awareness of grapheme clusters.
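
To illustrate what cluster-aware splitting could look like, here is a minimal sketch. It is not the code in jquery.highlightText.js and not the fix applied for bug 33242; the names clusterSafeEnd and highlight are hypothetical. It assumes text in the Basic Multilingual Plane (true for Devanagari) and uses a simplified rule in place of full Unicode grapheme segmentation: the bold span is extended over any combining marks that follow the match, and over the next letter when the span ends in a halant.

```javascript
// Hypothetical helpers for illustration only; not part of jquery.highlightText.js.
const MARK = /\p{M}/u;    // combining marks: Devanagari vowel signs, halant, etc.
const LETTER = /\p{L}/u;  // base letters
const VIRAMA = "\u094D";  // Devanagari halant/viram

// Move `end` forward until it no longer falls inside a cluster.
// Assumes BMP text, so one UTF-16 code unit is one character.
function clusterSafeEnd(text, end) {
  let i = end;
  while (i < text.length) {
    if (MARK.test(text[i])) {
      i += 1; // a dependent sign belongs to the preceding letter
    } else if (i > 0 && text[i - 1] === VIRAMA && LETTER.test(text[i])) {
      i += 1; // a letter after a halant is part of the same conjunct
    } else {
      break;
    }
  }
  return i;
}

// Wrap the first occurrence of `query` in <b>...</b> without splitting a cluster.
// HTML escaping and a match that starts mid-cluster are not handled in this sketch.
function highlight(text, query) {
  const start = text.indexOf(query);
  if (start === -1 || query.length === 0) {
    return text;
  }
  const end = clusterSafeEnd(text, start + query.length);
  return text.slice(0, start) + "<b>" + text.slice(start, end) + "</b>" + text.slice(end);
}
```

With this rule, highlight("भारतीय", "भारत") puts भारती in bold and leaves य plain, and highlight("प्रस्ताव", "प्रस्") puts प्रस्ता in bold, which is the rendering the description asks for. A production fix would more likely rely on a full grapheme-cluster segmentation than on this two-case rule.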

*** This bug has been marked as a duplicate of bug 33242 ***
Comment 3 Andre Klapper 2013-03-26 11:25:02 UTC
[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]
