Last modified: 2009-05-04 23:42:49 UTC

Bug 13849 - no text in search results when the match is up to diacritics
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Hardware: All All
Importance: Normal minor
Assigned To: Robert Stojnic
Reported: 2008-04-26 08:38 UTC by Le Chat
Modified: 2009-05-04 23:42 UTC

Description Le Chat 2008-04-26 08:38:41 UTC
If you search for a word without diacritics, the search results include matches for the same word with diacritics. Similarly, if you search for a phrase with a hyphen, the results include matches with an en dash. (No doubt there are various other similar rules, and this behaviour is very much desired.) However, when you get such a match, the matched text is not displayed in the list of search results: you get just a link to the relevant page, without the extract(s) from that page's text that you would normally see in the results list if the match were exact.
Comment 1 Brion Vibber 2008-04-29 20:18:00 UTC
Can you give an example URL?

I'm guessing this is on Wikipedia (or another Wikimedia site) and may be due to current mismatches between how the Lucene backend matches words and how the front-end matches them when highlighting results. If so, I believe this should improve when the next version of the Lucene backend rolls out, which has support for doing the highlighting itself.
Comment 2 Le Chat 2008-04-30 10:54:55 UTC
Example URLs:
(first result returned is Stefan Banach, but text is missing because in that article the reference contains an en dash rather than a hyphen)
(two results returned, but text missing because the articles contain Sniezycowy with Polish diacritics)
Comment 3 Robert Stojnic 2008-04-30 11:03:26 UTC
This also happens for stemmed words, transliterations and words in different scripts (variants), and is, as noted in comment 1, due to the mismatch between MediaWiki highlighting and backend functionality. It will be solved when we switch highlighting to the backend.
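Editorial illustration (not part of the original thread): the mismatch described in comments 1 and 3 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the MediaWiki or Lucene code. The function names fold, exact_snippet and folded_snippet, the dash table and the sample strings are all hypothetical. The idea is that the backend matches on folded text (diacritics stripped, dashes unified), while a highlighter that looks for the literal query finds nothing to extract, so the result shows no text.

```python
import unicodedata

# Assumed folding rules, for illustration only: en and em dashes become hyphens.
DASHES = {"\u2013": "-", "\u2014": "-"}


def fold(text):
    """Fold text for matching.

    Returns (folded, index_map), where index_map[k] is the index in the
    original text of the character that produced folded[k].
    """
    out = []
    index_map = []
    for i, ch in enumerate(text):
        ch = DASHES.get(ch, ch)
        # Decompose so that accents become separate combining marks, then drop them.
        for part in unicodedata.normalize("NFD", ch.lower()):
            if unicodedata.combining(part):
                continue
            out.append(part)
            index_map.append(i)
    return "".join(out), index_map


def exact_snippet(text, query, context=20):
    """Highlighter that only recognises the literal query (case-insensitive)."""
    pos = text.lower().find(query.lower())
    if pos == -1:
        return None  # nothing to show, although the backend reported a match
    return text[max(0, pos - context):pos + len(query) + context]


def folded_snippet(text, query, context=20):
    """Highlighter that applies the same folding as the (assumed) matcher."""
    folded, index_map = fold(text)
    folded_query, _ = fold(query)
    if not folded_query:
        return None
    pos = folded.find(folded_query)
    if pos == -1:
        return None
    start = index_map[pos]
    end = index_map[pos + len(folded_query) - 1] + 1
    return text[max(0, start - context):end + context]


# Sample strings are made up for this sketch.
article = "Rezerwat \u015anie\u017cycowy Jar"  # word with Polish diacritics
pages = "see pages 10\u201320"                 # en dash in the article text

print(exact_snippet(article, "Sniezycowy"))
print(folded_snippet(article, "Sniezycowy"))
print(exact_snippet(pages, "10-20"))
print(folded_snippet(pages, "10-20"))
```

In this sketch the exact highlighter returns None for both queries, while the folded one returns the surrounding text. Keeping an index map back to the original string lets the snippet show the article's own spelling rather than the folded form.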
Comment 4 Siebrand Mazeland 2008-08-17 20:02:09 UTC
Mass close as WONTFIX of open Lucene Search issues, because the Lucene Search extension was removed and replaced by MWSearch. Please set to REOPENED if the behaviour still exists with another component, and update the domain.
Comment 5 Siebrand Mazeland 2008-08-17 20:18:25 UTC
Mass REOPEN after discussion with Robert. Domain: Wikimedia/lucene-search-2. Assigned to maintainer.
Comment 6 Robert Stojnic 2009-05-04 23:42:49 UTC
We are now using our custom snippet-extraction backend on WMF wikis, so this no longer happens.
