Last modified: 2013-03-26 11:25:00 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T17573, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15573 - Lucene should appropriately highlight text in quotes
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: lucene-search-2
Version: unspecified
Hardware: All All
Importance: Normal minor with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2008-09-12 09:23 UTC by MZMcBride
Modified: 2013-03-26 11:25 UTC

Description MZMcBride 2008-09-12 09:23:49 UTC
Example URL: http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%22peace+be+upon+him%22&ns0=1&fulltext=Search

Ideally, information inside quotes would be interpreted as a single unit, similar to the behavior of Google and many other search engines.
Comment 1 MZMcBride 2008-09-15 17:57:09 UTC
Upon discussion with rainman-sr, it would seem that the results are accurate, but the highlighting is intentionally handicapped for performance reasons.

I've adjusted the bug summary accordingly (under the assumption that this isn't a duplicate bug).

We can't keep trying to actively encourage people to use the internal MW search if we're going to handicap things like proper highlighting. The results become entirely context-less and almost entirely useless.
Comment 2 MZMcBride 2008-09-15 18:54:29 UTC
Possibly fixed in r40863.
Comment 3 Brion Vibber 2008-09-15 19:01:00 UTC
Basic search result highlighting is done by building a regular expression from the search terms which _more or less_ matches what the search engine will do.

The MySQL search engine class was fixed some time ago to handle quoted phrases as a single chunk, but the Lucene extension (MWSearch) wasn't checking for quotes in its parsing. I've ripped the parsing code from SearchMySQL and copied it to MWSearch to handle this better for now.

It would probably be good to do this with some common code to avoid the duplication (though in SearchMySQL this is combined with the actual generation of the MySQL boolean search query, making it a bit more complicated). Might also be good to handle a wider range of whitespace, etc.

It's still not perfect, but it now handles these common cases pretty much as expected.
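
As a rough illustration of the approach described in this comment (not the actual MWSearch or SearchMySQL code), the sketch below builds a highlighting regular expression from the search terms and treats each double-quoted phrase as a single chunk. The function names `parse_search_terms`, `build_highlight_pattern`, and `highlight` are hypothetical, as are the `<b>` markers; matching any run of whitespace between the words of a phrase is one possible way to "handle a wider range of whitespace".

```python
import re


def parse_search_terms(query):
    """Split a query into terms, keeping each double-quoted phrase as one term."""
    terms = []
    # First alternative captures the text inside a balanced pair of double
    # quotes; second alternative captures a bare run of non-whitespace.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        # Drop surrounding whitespace and any stray (unbalanced) quote marks.
        term = (phrase or word).strip().strip('"')
        if term:
            terms.append(term)
    return terms


def build_highlight_pattern(terms):
    """Build one case-insensitive regex matching any term (hypothetical helper)."""
    parts = []
    # Longer terms first, so a phrase wins over a single word it starts with.
    for term in sorted(terms, key=len, reverse=True):
        words = [re.escape(w) for w in term.split()]
        # Allow any run of whitespace between the words of a phrase.
        parts.append(r'\s+'.join(words))
    if not parts:
        return None
    # Lookarounds keep matches from starting or ending in the middle of a word.
    return re.compile(r'(?<!\w)(?:' + '|'.join(parts) + r')(?!\w)', re.IGNORECASE)


def highlight(text, query, start='<b>', end='</b>'):
    """Wrap every match of the query terms in start/end markers."""
    pattern = build_highlight_pattern(parse_search_terms(query))
    if pattern is None:
        return text
    return pattern.sub(lambda m: start + m.group(0) + end, text)


if __name__ == '__main__':
    print(highlight('Muhammad (peace be  upon him) sought peace.',
                    '"peace be upon him"'))
```

With the quoted query from the example URL, only the full phrase is wrapped and the later standalone "peace" is left alone; without the quotes, each word would be highlighted separately. Like the regex approach described above, this only approximates what the search engine itself matches.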
Comment 4 Andre Klapper 2013-03-26 11:25:00 UTC
[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]
