Last modified: 2012-02-09 22:02:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2375, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 375 - MySQL 4 MATCHes "Grouped Phrases" as substrings, not word boundaries.
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: 1.3.x
Hardware: All All
Importance: Lowest normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: testme
Depends on:
Blocks:
Reported: 2004-09-03 22:30 UTC by Morbus Iff
Modified: 2012-02-09 22:02 UTC (History)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Morbus Iff 2004-09-03 22:30:59 UTC
Create a page with the phrase "Folktown Records" (plural). Then, do a
search on "Folktown Record" (singular, WITH QUOTES). The quotes 
group the phrases so that you won't match "Folktown hasn't a single
record". The problem, however, is that the MySQL MATCH seems to 
respect word boundaries on single words, but not on grouped words.
"Folktown Record" (with quotes) will MATCH "Folktown Records" 
(plural) in an entry. "Grouped Phrase" will match "Ungrouped Phrase"
and so forth.

Without changing any code, you can confirm this behavior by looking for
" Folktown Record " (with the quotes AND leading and trailing spaces). Since
the space is now being treated literally, "Records" doesn't match. I think that
result is what most people are expecting when they type a grouped phrase,
but I sincerely doubt they'll make the cognitive leap to add leading and
trailing spaces to get the proper result.

To fix this in MW, we can turn every [quote] into
[space][quote][space]. In SearchEngine.php:parseQuery4, look for:

$searchon = wfStrencode( $searchon );
$this->mTitlecond = " MATCH(si_title) AGAINST('$searchon' IN BOOLEAN MODE)";
$this->mTextcond = " (MATCH(si_text) AGAINST('$searchon' IN BOOLEAN MODE) AND cur_is_redirect=0)";

and add a new line before it:

$searchon = str_replace( '"', ' " ', $searchon);
$searchon = wfStrencode( $searchon );
$this->mTitlecond = " MATCH(si_title) AGAINST('$searchon' IN BOOLEAN MODE)";
$this->mTextcond = " (MATCH(si_text) AGAINST('$searchon' IN BOOLEAN MODE) AND cur_is_redirect=0)";
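
To make the effect of the added line concrete, here is a minimal sketch of
what the str_replace() call does to a quoted search string. padQuotes() is a
hypothetical helper used only for illustration; it is not part of MediaWiki,
and the sketch does not touch the database.

```php
<?php
// Hypothetical helper (not MediaWiki code): pads every double quote with
// spaces, mirroring the str_replace() line proposed above.
function padQuotes( string $searchon ): string {
    return str_replace( '"', ' " ', $searchon );
}

// The phrase now carries a leading and trailing space inside the quotes,
// which is the form the description reports as matching on word boundaries.
echo padQuotes( '"Folktown Record"' ), "\n";
```

Unquoted search terms pass through unchanged, since only the quote character
is replaced.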
Comment 1 Brion Vibber 2004-09-03 22:35:46 UTC
Probably should do this a few lines up where the query is being normalized; there's already different handling of quoted 
phrases and non-quoted words for building text extract match regexps.
Comment 2 Morbus Iff 2004-09-03 22:44:50 UTC
Moving it inside could be:

$searchon .= $terms[1] . $wgLang->stripForSearch( $terms[2] );
if( $terms[3] ) {
   $regexp = preg_quote( $terms[3] );

to:

$searchon .= $terms[1] . $wgLang->stripForSearch( $terms[2] );
$searchon = str_replace( '"', ' " ', $searchon);
if( $terms[3] ) {
   $regexp = preg_quote( $terms[3] );
Comment 3 Jamesday 2004-11-19 22:38:44 UTC
This breaks the search for pi desired in this bug:
http://bugzilla.wikimedia.org/show_bug.cgi?id=42 . At present you can use "3.14"
and expect it to match "3.14159265".

Rather than adding undesired spaces, add the words within the quotes outside the
quotes. That will raise the results for articles containing "Folktown Record"
above those containing "Folktown Records" but won't break the searches for "3.14".
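
The alternative suggested in this comment could be sketched roughly as
follows. addPhraseWords() is a hypothetical helper, not MediaWiki code: it
leaves each quoted phrase as it is and repeats the phrase's words unquoted
right after it. How MySQL then ranks the results is not verified here; the
sketch only shows the query rewriting.

```php
<?php
// Hypothetical helper (not MediaWiki code): after each quoted phrase, repeat
// its words unquoted, as this comment suggests.
function addPhraseWords( string $searchon ): string {
    return preg_replace_callback(
        '/"([^"]+)"/',
        function ( array $m ): string {
            // $m[0] is the phrase with its quotes, $m[1] the words inside them.
            return $m[0] . ' ' . trim( $m[1] );
        },
        $searchon
    );
}

echo addPhraseWords( '"Folktown Record"' ), "\n";
echo addPhraseWords( '"3.14"' ), "\n";
```

Because the quotes are left untouched, a search for "3.14" keeps its current
behavior, which is the point of the suggestion.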
Comment 4 Chad H. 2009-03-07 02:30:22 UTC
Is this still an issue?
Comment 5 Sam Reed (reedy) 2012-02-09 22:02:39 UTC
MySQL 4 isn't supported anymore
