Last modified: 2013-03-25 14:56:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T7711, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 5711 - Support MySQL >= 4.x boolean logic in search
Support MySQL >= 4.x boolean logic in search
Status: NEW
Product: MediaWiki
Classification: Unclassified
Search (Other open bugs)
1.6.x
All All
: Low enhancement (vote)
: ---
Assigned To: Nobody - You can work on this!
: testme
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-04-25 02:34 UTC by Ben Gertzfield
Modified: 2013-03-25 14:56 UTC (History)
5 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Patch to enable MySQL fulltext Boolean logic search characters (1.94 KB, patch)
2006-04-25 02:35 UTC, Ben Gertzfield
Details

Description Ben Gertzfield 2006-04-25 02:34:00 UTC
The current code in 1.6.3's SearchMySQL4.php is really close to allowing full Boolean logic in searches.  These are things like:

+ninja -pirate (to match all articles with "ninja", but not "pirate")
"ninja pirate" (to match all articles with "ninja" followed immediately by "pirate")
ninj* (to match ninja, ninjutsu, ninjato)

However, the "special" characters +, -, *, etc. are stripped by SearchEngine's method legalSearchChars.

In addition, SearchMySQL4.php currently always adds '+' before each word to be searched if no Boolean operators are present.

The attached patch (against MediaWiki 1.6.3) adds two config options to change this behavior:

1) $wgSearchMysqlStrictMatch = true;

This keeps the current default behavior to prepend '+' to all searches that don't otherwise specify a Boolean logic operator if set to 'true', but allows admins 
to set it to 'false' to default searches to Boolean 'OR' (where pages matching multiple words are ranked higher)

2) $wgSearchMysqlBooleanLogic = false;

This keeps the current default behavior to strip MySQL Boolean logic characters (-, +, <, >, ~, *) if set to false.  If set to true, it changes SearchMySQL4.php 
to override legalSearchChars() to include the Boolean logic chars.

I'll attach the patch to this bug as soon as it's submitted.
Comment 1 Ben Gertzfield 2006-04-25 02:35:06 UTC
Created attachment 1606 [details]
Patch to enable MySQL fulltext Boolean logic search characters

This patch should work against 1.6.3 or svn trunk, although I haven't tested
the latter.
Comment 2 Brion Vibber 2006-04-25 05:49:13 UTC
A config option here seems totally unnecessary; rather it should simply accept overrides as 
given, as the code is intended to do (but may not be doing at the moment, as it's buggy).

Consider also bug 4021.
Comment 3 Ben Gertzfield 2006-04-25 14:59:42 UTC
I was mainly concerned for performance implications if I enabled the overrides and/or Boolean logic by default; since they're off now, we may be hiding a potential 
performance problem.

Since it's my first MediaWiki patch, I wanted to ensure no impact on performance without a config change -- I know search performance is critical for Wikipedia.  Then 
again, Wikipedia uses Lucene search, not MySQL, right?

As we're requiring MySQL 4.x or higher now, we can depend on the database handling the heavy lifting for the Boolean logic.  Does that mean I should feel free to enable 
this by default?

If so, I'm happy to take out the config options.

Comment 4 Brion Vibber 2006-04-26 02:20:19 UTC
1) User-specified boolean operations are/should be allowed.
2) AND must be the default logical mode where not specified. A default of OR produces 
useless results most of the time and is contrary to user expectations.
Comment 5 Ben Gertzfield 2006-04-26 15:51:07 UTC
OK, good comments.  

For 2), if we make AND the default, how can we specify "OR" mode?  In MySQL, it's unfortunately the *absence* of a 
Boolean operator that specifies an "or" search.

If we default to prepending a + if no other operator is present, we effectively disable "OR" mode search.

I can think of two alternatives:

1) Use OR mode search by default, but use a second MATCH ... AGAINST to provide Lucene-style relevancy ranking, so that 
MySQL ranks rows with both hits highest.  This prevents the 'useless' rows from showing up too high in the match.

2) Provide an alternate operator (pipe perhaps?) that tells us to strip it, but not prepend '+' in this case.
Comment 6 Chad H. 2008-04-10 21:20:35 UTC
Is this still an issue with the new search?
Comment 7 Brion Vibber 2008-04-11 19:27:00 UTC
Currently I know we support phrases ("apple pie"), wildcard suffixes (app*), and exclusions (apple -pie). I'm not sure about the priority bumping (>apple <pie), negation (apple ~pie), or parenthetical grouping.

Handling logical ORs is tricky currently since OR is indicated on the backend syntax by the _lack_ of a +, which we have to add to make normal behavior make sense.

What I would recommend is a nice parsing layer so that a query like this:

  apple OR cherry pie

will translate to the MySQL backend syntax:

  +(apple cherry) +pie


This'll allow us to do nice boolean queries like with Lucene or Google without hardcoding the non-intuitive MySQL syntax where we have to explicitly say that things are required.

See full docs at http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
Comment 8 John Du Hart 2011-09-01 03:31:24 UTC
Comment on attachment 1606 [details]
Patch to enable MySQL fulltext Boolean logic search characters

Thanks for submitting a patch to MediaWiki. Unfortunately I'm going to have to
mark the patch as obsolete because it no longer merges into our current code.
Comment 9 Chad H. 2011-09-01 03:51:15 UTC
Should be trivial to rebase this against HEAD.
Comment 10 John Du Hart 2011-09-01 03:58:42 UTC
(In reply to comment #9)
> Should be trivial to rebase this against HEAD.

SearchMySQL4 doesn't exist in HEAD anymore.
Comment 11 Chad H. 2011-09-01 06:36:11 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Should be trivial to rebase this against HEAD.
> 
> SearchMySQL4 doesn't exist in HEAD anymore.

Yes, but I figured the same patch could rebase against SearchMySQL.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links