Last modified: 2012-02-09 22:02:39 UTC
Create a page with the phrase "Folktown Records" (plural). Then search for "Folktown Record" (singular, WITH QUOTES). The quotes group the words into a phrase so that you won't match "Folktown hasn't a single record".

The problem, however, is that the MySQL MATCH seems to respect word boundaries on single words, but not on grouped words. "Folktown Record" (with quotes) will MATCH "Folktown Records" (plural) in an entry. "Grouped Phrase" will match "Ungrouped Phrase" and so forth.

Without changing any code, you can confirm this behavior by searching for " Folktown Record " (with the quotes AND leading and trailing spaces). Since the space is now being treated literally, "Records" doesn't match. I think that result is what most people expect when they type a grouped phrase, but I sincerely doubt they'll make the cognitive leap to add leading and trailing spaces to get the proper result.

To fix this in MW, we can turn every [quote] into [space][quote][space]. In SearchEngine.php:parseQuery4, look for:

    $searchon = wfStrencode( $searchon );
    $this->mTitlecond = " MATCH(si_title) AGAINST('$searchon' IN BOOLEAN MODE)";
    $this->mTextcond = " (MATCH(si_text) AGAINST('$searchon' IN BOOLEAN MODE) AND cur_is_redirect=0)";

and add a new line before it:

    $searchon = str_replace( '"', ' " ', $searchon);
    $searchon = wfStrencode( $searchon );
    $this->mTitlecond = " MATCH(si_title) AGAINST('$searchon' IN BOOLEAN MODE)";
    $this->mTextcond = " (MATCH(si_text) AGAINST('$searchon' IN BOOLEAN MODE) AND cur_is_redirect=0)";
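For reference, here is a minimal standalone sketch of what the added str_replace line does to a query string. The function name padQuotes is hypothetical and only wraps the one-line change for illustration; it is not part of MediaWiki, and the sketch does not run the query against MySQL.

```php
<?php
// Hypothetical helper (not MediaWiki code): pads every double quote with
// a space on each side, exactly like the str_replace line proposed above.
function padQuotes(string $searchon): string {
    return str_replace('"', ' " ', $searchon);
}

// A quoted phrase ends up with a space just inside each quote.
var_dump(padQuotes('"Folktown Record"'));
// string(21) " " Folktown Record " "

// Unquoted terms are left untouched.
var_dump(padQuotes('folktown'));
// string(8) "folktown"
```

Note that the replacement also puts a space outside each quote, so anything written directly against a quote in the query gets separated from it as well.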
Probably should do this a few lines up where the query is being normalized; there's already different handling of quoted phrases and non-quoted words for building text extract match regexps.
Moving it inside could mean changing:

    $searchon .= $terms[1] . $wgLang->stripForSearch( $terms[2] );
    if( $terms[3] ) {
        $regexp = preg_quote( $terms[3] );

to:

    $searchon .= $terms[1] . $wgLang->stripForSearch( $terms[2] );
    $searchon = str_replace( '"', ' " ', $searchon);
    if( $terms[3] ) {
        $regexp = preg_quote( $terms[3] );
This breaks the search for pi requested in this bug: http://bugzilla.wikimedia.org/show_bug.cgi?id=42. At present you can search for "3.14" and expect it to match "3.14159265". Rather than adding undesired spaces, add the words within the quotes outside the quotes as well. That will raise the results for articles containing "Folktown Record" above those containing "Folktown Records" but won't break the searches for "3.14".
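A rough sketch of that suggestion, assuming it means keeping each quoted phrase as typed and appending its individual words as extra unquoted terms. The function name addPhraseWords is hypothetical and this is not the MediaWiki implementation; whether the extra terms really rank whole-word matches higher depends on MySQL's relevance scoring, which this sketch does not exercise.

```php
<?php
// Hypothetical helper (not MediaWiki code): leaves the query as typed and
// appends every word found inside a quoted phrase as an unquoted term.
function addPhraseWords(string $query): string {
    $extra = [];
    // Capture the text between each pair of double quotes.
    if (preg_match_all('/"([^"]*)"/', $query, $matches)) {
        foreach ($matches[1] as $phrase) {
            // Split the phrase on whitespace, skipping empty pieces.
            $words = preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($words as $word) {
                $extra[] = $word;
            }
        }
    }
    return $extra ? $query . ' ' . implode(' ', $extra) : $query;
}

echo addPhraseWords('"Folktown Record"'), "\n";
// prints: "Folktown Record" Folktown Record

echo addPhraseWords('"3.14"'), "\n";
// prints: "3.14" 3.14
```

The quoted phrase is passed through unchanged, so a search for "3.14" keeps its current behavior.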
Is this still an issue?
MySQL 4 isn't supported anymore