Last modified: 2009-05-04 23:42:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and links other than those displaying bug reports and their history might be broken. See T15849, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13849 - no text in search results when the match is up to diacritics
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Version: unspecified
Hardware: All All
Importance: Normal minor
Target Milestone: ---
Assigned To: Robert Stojnic
Depends on:
Blocks:

Reported: 2008-04-26 08:38 UTC by Le Chat
Modified: 2009-05-04 23:42 UTC

Description Le Chat 2008-04-26 08:38:41 UTC
If you search for a word without diacritics, the search results include matches for the same word with diacritics. Similarly, if you search for a phrase with a hyphen, the results include matches with an en dash. (No doubt there are various other similar rules, and this behaviour is very much desired.) However, when you get such a match, the matched text is not displayed in the list of search results: you get just a link to the relevant page, without the extract(s) from that page's text that you would normally see in the results list if the match were exact.
Comment 1 Brion Vibber 2008-04-29 20:18:00 UTC
Can you give an example URL?

I'm guessing this is on Wikipedia (or another Wikimedia site) and may be due to current mismatches between how the Lucene backend matches words and how the front-end matches them in the result highlighting. If so, I believe this should improve when the next version of the Lucene backend, which supports doing the highlighting itself, rolls out.
Comment 2 Le Chat 2008-04-30 10:54:55 UTC
Example URLs:

http://en.wikipedia.org/wiki/Special:Search?search=Banach-Steinhaus&fulltext=Search
(first result returned is Stefan Banach, but text is missing because in that article the reference contains an en dash rather than a hyphen)

http://en.wikipedia.org/wiki/Special:Search?search=sniezycowy&fulltext=Search
(two results returned, but text missing because the articles contain Sniezycowy with Polish diacritics)
Comment 3 Robert Stojnic 2008-04-30 11:03:26 UTC
This also happens for stemmed words, transliterations and words in different scripts (variants), and is, as noted in comment 1, due to the mismatch between MediaWiki highlighting and backend functionality. It will be solved when we switch highlighting to the backend.
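
A minimal sketch of the mismatch described above, not the actual MediaWiki or lucene-search code. It assumes the backend matches on folded text (diacritics stripped, en dash treated as a hyphen) while the front-end looks for the literal query string when building the extract. All names here (`fold_char`, `naive_snippet`, `folded_snippet`, the `_EXTRA_FOLDS` table) and the sample sentences are hypothetical and only for illustration.

```python
import unicodedata

# Hypothetical extra folds for characters that NFKD does not decompose.
_EXTRA_FOLDS = {"\u2013": "-", "\u2014": "-", "\u0142": "l", "\u0141": "L"}


def fold_char(ch):
    """Fold one character to a plain lowercase form.

    Always returns exactly one character, so offsets in the folded
    text line up with offsets in the original text.
    """
    ch = _EXTRA_FOLDS.get(ch, ch)
    decomposed = unicodedata.normalize("NFKD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    folded = base.lower()
    # Fall back to the original character if folding changes the length.
    return folded if len(folded) == 1 else ch


def fold(text):
    return "".join(fold_char(c) for c in text)


def naive_snippet(text, query, context=20):
    """Front-end style extract: only finds the literal query string."""
    pos = text.find(query)
    if pos < 0:
        return None  # page matched in the backend, but no extract is shown
    return text[max(0, pos - context):pos + len(query) + context]


def folded_snippet(text, query, context=20):
    """Extract that applies the same folding to both text and query."""
    folded_query = fold(query)
    pos = fold(text).find(folded_query)
    if pos < 0:
        return None
    start = max(0, pos - context)
    end = pos + len(folded_query) + context
    return text[start:end]  # slice the original, unfolded text


if __name__ == "__main__":
    # Invented sample sentences, not taken from the actual articles.
    page = "Stefan Banach proved the Banach\u2013Steinhaus theorem."
    print(naive_snippet(page, "Banach-Steinhaus"))
    print(folded_snippet(page, "Banach-Steinhaus"))
```

The sketch assumes precomposed (NFC) page text. Stemming, transliteration and script variants, which are also mentioned above, would need the backend's own analysis chain and are not covered here.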
Comment 4 Siebrand Mazeland 2008-08-17 20:02:09 UTC
Mass-closing open Lucene Search issues as WONTFIX because the Lucene Search extension was removed and replaced by MWSearch. Please set to REOPENED if the behaviour still exists with another component, and update the domain.
Comment 5 Siebrand Mazeland 2008-08-17 20:18:25 UTC
Mass REOPEN after discussion with Robert. Domain: Wikimedia/lucene-search-2. Assigned to maintainer.
Comment 6 Robert Stojnic 2009-05-04 23:42:49 UTC
We are now using our custom snippet-extraction backend on WMF wikis, so this doesn't happen any more.
