Last modified: 2013-06-18 15:28:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T16152, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 14152 - Bad search highlighting and unwanted results
Status: RESOLVED WORKSFORME
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Robert Stojnic
URL: http://en.wikipedia.org/wiki/Special:...
Depends on:
Blocks:
Reported: 2008-05-16 12:55 UTC by MER-C
Modified: 2013-06-18 15:28 UTC (History)
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Description MER-C 2008-05-16 12:55:16 UTC
If I search for the word "rest" in the English Wikipedia (url: http://en.wikipedia.org/wiki/Special:Search?search=rest&fulltext=Search ) some distance down, but still on the front page, I get the item [[Virginia]]. The context was this:

... nps.gov/shen/naturescience/forests.htm |title=Shenandoah National Park - Forests |publisher=National Park Service |accessdate=2007-09-10}}</ref>  ... legislation, and the jointly run [[Chesapeake Bay Program]] which conducts restoration on the bay and its watershed. The [[Great Dismal Swamp National Wil ...

The words "forests" and "restoration" were highlighted, yet both are unwanted matches. (The article contains the word "rest" only once, so it really shouldn't be on the first page.) Continue the search for more instances.

If a word contains "rest" then it should only be returned and/or highlighted if the full word is related to rest, e.g. "rests", "resting", etc.

Some sort of JDBC-accessible database containing lists of related words is probably the most efficient solution.
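
To make the difference concrete, here is a minimal sketch in Python of the behaviour described above compared with the requested one. It is an illustration only, not the code used by MediaWiki or lucene-search: the function names, the `<b>` markup and the in-memory `RELATED_FORMS` table (standing in for the proposed database of related words) are all assumptions.

```python
import re

# Hypothetical stand-in for the proposed database of related word forms.
RELATED_FORMS = {
    "rest": {"rest", "rests", "rested", "resting"},
}

def highlight_substring(text, term):
    """Reported behaviour: mark the term wherever it occurs, even inside other words."""
    return re.sub(re.escape(term),
                  lambda m: "<b>" + m.group(0) + "</b>",
                  text, flags=re.IGNORECASE)

def highlight_related(text, term):
    """Requested behaviour: mark only whole words listed as forms of the term."""
    forms = RELATED_FORMS.get(term.lower(), {term.lower()})

    def mark(match):
        word = match.group(0)
        return "<b>" + word + "</b>" if word.lower() in forms else word

    # Visit each whole word and decide whether it is a related form.
    return re.sub(r"\w+", mark, text)

snippet = "conducts restoration on the bay; the rest of the forests"
print(highlight_substring(snippet, "rest"))
# conducts <b>rest</b>oration on the bay; the <b>rest</b> of the fo<b>rest</b>s
print(highlight_related(snippet, "rest"))
# conducts restoration on the bay; the <b>rest</b> of the forests
```

With whole-word matching, "restoration" and "forests" are left alone while "rests" or "resting" would still be marked.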
Comment 1 Robert Stojnic 2008-05-16 13:00:39 UTC
The solution is to have Lucene do the highlighting, so that the highlighter knows exactly which words or stems match the query. This has been implemented, but is waiting on new hardware before it can go live on Wikimedia sites.

I've also put a less naive highlighter into core MediaWiki ($wgAdvancedSearchHighlighting), but sysadmins are reluctant to turn it on because of possible performance issues.

Comment 2 Siebrand Mazeland 2008-08-17 20:02:09 UTC
Mass-closing open Lucene Search issues as WONTFIX because the Lucene Search extension was removed and replaced by MWSearch. Please set to REOPENED if the behaviour still exists with another component, and update the domain.
Comment 3 Siebrand Mazeland 2008-08-17 20:18:25 UTC
Mass REOPEN after discussion with Robert. Domain: Wikimedia/lucene-search-2. Assigned to maintainer.
Comment 4 WBT 2008-09-27 04:12:38 UTC
If I search for "slovenia slovenian" (to see what someone from the place is called), I get results that include both words, but only the first one is marked correctly. Where the second word is shown, its last letter is in plain type.
This seems like a simple bug in the highlighter code.
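
One way this symptom can arise, assuming a regex-based highlighter that tries the query terms in the order they were typed, is sketched below in Python. This is a guess at the cause, not the actual highlighter code, which is not shown in this report; `highlight_terms` and its `longest_first` flag are hypothetical names.

```python
import re

def highlight_terms(text, terms, longest_first):
    """Wrap each occurrence of any query term in <b>...</b> (illustrative only)."""
    # Regex alternation takes the first alternative that matches, so trying
    # "slovenia" before "slovenian" leaves the trailing "n" unmarked.
    ordered = sorted(terms, key=len, reverse=True) if longest_first else list(terms)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text)

query = ["slovenia", "slovenian"]
text = "A Slovenian from Slovenia"
print(highlight_terms(text, query, longest_first=False))
# A <b>Slovenia</b>n from <b>Slovenia</b>
print(highlight_terms(text, query, longest_first=True))
# A <b>Slovenian</b> from <b>Slovenia</b>
```

Trying longer terms first, or matching on whole words only, avoids the partially marked word in this sketch.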

Comment 5 matanya 2012-07-30 20:24:03 UTC
Unable to reproduce. Works for me.
