Last modified: 2014-04-13 22:39:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T7992, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 5992 - MySQL-Search should sort results by relevance
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: 1.7.x
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2006-05-16 23:37 UTC by Daniel Kinzler
Modified: 2014-04-13 22:39 UTC (History)

Attachments
extension modifying SearchMySQL4 to order by rank (not really functional, see initial comment) (1.58 KB, text/plain)
2006-05-16 23:39 UTC, Daniel Kinzler

Description Daniel Kinzler 2006-05-16 23:37:54 UTC
The MySQL-based search engine used by default does not appear to sort results in
any meaningful way. I have written a small extension that extends SearchMySQL4
to sort by relevance (attachment follows), but the data set on my
personal test wiki is not well suited to testing it.

I'm a bit confused about MySQL fulltext search, and thus this extension may be
completely pointless. The relevant documentation is at
<http://dev.mysql.com/doc/refman/4.1/en/fulltext-search.html>. A few observations:

* SearchMySQL4 uses the IN BOOLEAN MODE modifier
(http://dev.mysql.com/doc/refman/4.1/en/fulltext-boolean.html). This appears to
cause MySQL to report a relevance of 1.0 for anything that matches, making my
patch pointless. The documentation confirms this behaviour: "They do not
automatically sort rows in order of decreasing relevance". This also confirms
the problem this bug report tries to address.

* After some testing, the way to get a weighted search result with boolean
matching appears to be this:

SELECT page_id, page_namespace, page_title, 
       MATCH(si_text) AGAINST('Quux') as rank 
FROM `page`,`searchindex` 
WHERE page_id=si_page 
AND MATCH(si_text) AGAINST('Quux' IN BOOLEAN MODE)  
AND page_is_redirect=0 
AND page_namespace IN (0) 
ORDER BY rank DESC   

* For some reason, though, this "sometimes" gives a rank of zero (but still a
boolean match) on entries that contain the search string (maybe a word-length
limit? That seems unlikely for the things I've tried). Consequently, not using
the BOOLEAN modifier at all causes some matches (the ones with rank 0) not to show.
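
The two-MATCH query above could be wrapped for reuse roughly as follows. This is a minimal sketch, not the attached extension: the function name `build_ranked_search` is hypothetical, Python is used only for illustration, and the `%s` placeholders assume a DB-API driver that uses the "format" parameter style. Table and column names are taken from the query above.

```python
def build_ranked_search(term, namespaces=(0,)):
    """Build (sql, params) for a boolean-mode match ordered by relevance.

    Hypothetical helper: the boolean MATCH decides which rows qualify,
    while the natural-language MATCH in the SELECT list supplies the
    relevance value used for sorting.
    """
    if not namespaces:
        raise ValueError("at least one namespace is required")

    # One placeholder per namespace, so values are never pasted into the SQL.
    ns_placeholders = ", ".join(["%s"] * len(namespaces))

    sql = (
        "SELECT page_id, page_namespace, page_title, "
        "MATCH(si_text) AGAINST(%s) AS rank "  # natural-language relevance
        "FROM page, searchindex "
        "WHERE page_id = si_page "
        "AND MATCH(si_text) AGAINST(%s IN BOOLEAN MODE) "  # row filter
        "AND page_is_redirect = 0 "
        f"AND page_namespace IN ({ns_placeholders}) "
        "ORDER BY rank DESC"
    )
    # The search term is bound twice: once for ranking, once for filtering.
    params = [term, term, *namespaces]
    return sql, params


sql, params = build_ranked_search("Quux")
```

On MySQL 8.0 and later `rank` is a reserved word, so the alias would need backticks or a different name. As for the zero ranks, the MySQL manual describes a 50% threshold for natural-language searches on MyISAM tables: a word that occurs in at least half of the rows is treated as too common and contributes no relevance. That may be what happens on a small test wiki, though it has not been checked against the data described here.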

As I said, I'm a bit confused, but this is probably worth looking into. The
search feature would be vastly more useful with decent ranking.

Comment 1 Daniel Kinzler 2006-05-16 23:39:15 UTC
Created attachment 1761
extension modifying SearchMySQL4 to order by rank (not really functional, see initial comment)

Comment 2 Quim Gil 2014-04-12 06:56:08 UTC
Can this very old report now be seen in the light of Cirrus Search?

Comment 3 Chad H. 2014-04-13 22:39:50 UTC
Nope, Cirrus has nothing to do with the core database-backed search implementation. It's up to core to implement this if it's still desired.

It's not even a problem in Cirrus/MWSearch world at all.
