Last modified: 2013-06-18 15:28:42 UTC

Bug 14152 - Bad search highlighting and unwanted results
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Hardware: All All
Importance: Normal normal
Assigned To: Robert Stojnic
Reported: 2008-05-16 12:55 UTC by MER-C
Modified: 2013-06-18 15:28 UTC (History)


Description MER-C 2008-05-16 12:55:16 UTC
If I search for the word "rest" in the English Wikipedia (url: ), then some distance down, but still on the first page of results, I get the item [[Virginia]]. The context was this:

... |title=Shenandoah National Park - Forests |publisher=National Park Service |accessdate=2007-09-10}}</ref>  ... legislation, and the jointly run [[Chesapeake Bay Program]] which conducts restoration on the bay and its watershed. The [[Great Dismal Swamp National Wil ...

The words "forests" and "restoration" were highlighted, yet are unwanted. (The article contains the word "rest" only once, so it really shouldn't be on the first page.) Continue through the search results for more instances.

If a word contains "rest" then it should only be returned and/or highlighted if the full word is related to rest, e.g. "rests", "resting", etc.
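
To illustrate the difference being asked for, here is a minimal Python sketch, not the actual search backend code. The word list RELATED_FORMS and both function names are hypothetical; a real system would presumably derive the related forms from a stemmer or a word list rather than hard-coding them.

```python
import re

# Hypothetical set of forms treated as related to "rest" (an assumption
# for this sketch; not taken from any real search backend).
RELATED_FORMS = {"rest", "rests", "rested", "resting"}


def highlight_naive(text, term):
    """Mark the term wherever it occurs, even inside unrelated words."""
    return re.sub(
        re.escape(term),
        lambda m: f"<b>{m.group(0)}</b>",
        text,
        flags=re.IGNORECASE,
    )


def highlight_whole_words(text, forms):
    """Mark a word only if the whole word is one of the accepted forms."""
    def mark(match):
        word = match.group(0)
        return f"<b>{word}</b>" if word.lower() in forms else word

    return re.sub(r"[A-Za-z]+", mark, text)


sample = "Forests need restoration; the crew rests before resting again."

print(highlight_naive(sample, "rest"))
# Fo<b>rest</b>s need <b>rest</b>oration; the crew <b>rest</b>s before <b>rest</b>ing again.

print(highlight_whole_words(sample, RELATED_FORMS))
# Forests need restoration; the crew <b>rests</b> before <b>resting</b> again.
```

The naive version reproduces the unwanted highlighting of "forests" and "restoration"; the whole-word version marks only the related forms.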

Some sort of JDBC-accessible database containing lists of related words is probably the most efficient solution.
Comment 1 Robert Stojnic 2008-05-16 13:00:39 UTC
The solution is to have Lucene do the highlighting, so that the highlighter knows exactly which words or stems match the query. This has been implemented, but is waiting on new hardware before it can go live on Wikimedia sites.

I've also put a less naive highlighting approach into core MediaWiki (wgAdvancedSearchHighlighting), but sysadmins are reluctant to turn it on because of possible performance issues.

Comment 2 Siebrand Mazeland 2008-08-17 20:02:09 UTC
Mass closing open Lucene Search issues as WONTFIX because the Lucene Search extension was removed and replaced by MWSearch. Please set to REOPENED if the behaviour still exists with another component, and update the domain.
Comment 3 Siebrand Mazeland 2008-08-17 20:18:25 UTC
Mass REOPEN after discussion with Robert. Domain: Wikimedia/lucene-search-2. Assigned to maintainer.
Comment 4 WBT 2008-09-27 04:12:38 UTC
If I search "slovenia slovenian" (to see what someone from the place is called), I get results that include both words, BUT only the first one is marked. Where the second word is shown, its last letter is in plain type.
This seems like a simple bug in the highlighter code.
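
One way this symptom can arise, offered only as a guess and not confirmed as the actual cause, is a highlighter that tries query terms in the order given, so "slovenia" matches inside "slovenian" and leaves the final "n" unmarked. The Python sketch below uses hypothetical function names to show that behaviour and a longest-term-first variant that avoids it.

```python
import re


def highlight_in_query_order(text, terms):
    """Try terms in the order given; a shorter term can win inside a longer word."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", text)


def highlight_longest_first(text, terms):
    """Try longer terms first so a full word is marked before its prefix."""
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", text)


query = ["slovenia", "slovenian"]
snippet = "A Slovenian lives in Slovenia."

print(highlight_in_query_order(snippet, query))
# A <b>Slovenia</b>n lives in <b>Slovenia</b>.

print(highlight_longest_first(snippet, query))
# A <b>Slovenian</b> lives in <b>Slovenia</b>.
```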

Comment 5 matanya 2012-07-30 20:24:03 UTC
Unable to reproduce. Works for me.
