Last modified: 2014-04-14 05:02:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T2352, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 352 - Don't let MySQL's stopword list prevent indexing of those words, as we want to search them
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: unspecified
Hardware: All All
Importance: Lowest normal with 5 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://dev.mysql.com/doc/refman/5.6/e...
Keywords: patch, patch-need-review
Duplicates: 25446
Depends on:
Blocks:

Reported: 2004-09-03 03:29 UTC by Timwi
Modified: 2014-04-14 05:02 UTC
CC List: 8 users

Attachments
MaxSem's slow patch (20.04 KB, patch)
2010-02-18 05:15 UTC, Max Semenik

Description Timwi 2004-09-03 03:29:36 UTC
BUG MIGRATED FROM SOURCEFORGE
http://sourceforge.net/tracker/index.php?func=detail&aid=681366&group_id=34373&atid=411192
Originally submitted by Nobody/Anonymous - nobody  2003-02-06 01:18


English stopwords can be valid, nontrivial words in 
other languages. Please allow searching for them! We 
cannot search for "an", "he", "me", etc. on the Polish Wikipedia.
Nor can we search for phrases such as "see also", which 
were added and (unfortunately) left untranslated 
on many pages.

--Youandme

------------------------- Additional comments ------------------------
Date: 2003-02-06 20:43
Sender: SF user vibber

When we upgrade mysql, I'll see if I can remove the stopword
list. (It's a compile-time thing.)
-------------------------------------------------
Date: 2003-02-06 20:44
Sender: SF user vibber

When we upgrade mysql, I'll see if we can remove the
stopword list. (It's a compiled-in thing, apparently.)
Comment 2 Antoine "hashar" Musso (WMF) 2006-04-30 20:52:16 UTC
We dropped MySQL 3.x support with MediaWiki 1.6.
Comment 3 Brion Vibber 2009-07-20 03:20:45 UTC
MySQL 4 and later still have a stopword list, though its behavior isn't as unpleasant as in previous versions.

It would be nice if we could reliably disable it per table or something...
Comment 4 Subfader 2009-08-07 17:54:45 UTC
Yes, please override it with our own customizable list for users without Lucene search.
Comment 5 Max Semenik 2010-02-18 05:15:07 UTC
Created attachment 7143
MaxSem's slow patch

This is the best I could come up with, but it is still pretty slow: maintenance/rebuildtextindex.php runs 30% slower with it. I tested several solutions (one of them can be seen in the patch, commented out), but none had satisfactory performance. I therefore don't dare commit it to trunk. I'm leaving the patch here so that others can take a look at my approach.
Comment 6 Max Semenik 2010-10-09 11:56:12 UTC
*** Bug 25446 has been marked as a duplicate of this bug. ***
Comment 7 p858snake 2011-04-30 00:10:17 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 8 Mark A. Hershberger 2011-05-28 02:59:03 UTC
See http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html which says "To override the default stopword list, set the ft_stopword_file system variable. ... if you change the stopword file itself, you must rebuild your FULLTEXT indexes after making the changes and restarting the server. To rebuild the indexes in this case, it is sufficient to do a QUICK repair operation: REPAIR TABLE tbl_name QUICK;"

So, while you can't "reliably disable it per table", you *can* disable it without compiling by setting ft_stopword_file to "", restarting, and then rebuilding the table.
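
The steps described above could be sketched roughly as follows. This is only an illustration under assumptions: it presumes a MyISAM FULLTEXT index, uses MediaWiki's `searchindex` table as an example table name (substitute your own), and shows the option-file setting as comments, since `ft_stopword_file` is read at server startup rather than changed at runtime.

```sql
-- Step 1 (option file such as my.cnf; this part is not SQL):
-- disable the built-in stopword list with an empty value.
--   [mysqld]
--   ft_stopword_file = ""
-- Step 2: restart the MySQL server so the setting takes effect.

-- Step 3: check which value the running server picked up.
SHOW VARIABLES LIKE 'ft_stopword_file';

-- Step 4: rebuild the FULLTEXT index. Per the manual text quoted above,
-- a QUICK repair is sufficient. `searchindex` is an assumed table name.
REPAIR TABLE searchindex QUICK;
```

Very short words may still go unindexed because of the server's minimum indexed word length (`ft_min_word_len`), which is a separate setting from the stopword list.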
Comment 9 Max Semenik 2011-05-28 15:53:25 UTC
(In reply to comment #8)
> So, while you can't "reliably disable it per table", you *can* disable it
> without compiling by setting ft_stopword_file to "", restarting, and then
> rebuilding the table.

A task for the installer?
Comment 10 Quim Gil 2014-04-14 02:27:14 UTC
Just checking: in the times of Cirrus Search, are MySQL's English stopwords causing any trouble for searches on non-English wikis?
Comment 11 Chad H. 2014-04-14 05:02:54 UTC
No, nothing like this from the SQL search implementation affects Cirrus' implementation.
