Last modified: 2011-03-13 18:06:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T2447, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 447 - Search function doesn't add wildcard by default
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Bugzilla
Version: unspecified
Hardware: PC Linux
Importance: Lowest normal
Assigned To: Nobody - You can work on this!
Reported: 2004-09-11 00:17 UTC by Zurd
Modified: 2011-03-13 18:06 UTC (History)
Description Zurd 2004-09-11 00:17:43 UTC
One example describes it all:

I just created a new article for "swords of chaos", but searching for "sword of
chaos" (without the s), or even "sword", "swords" or "chaos", doesn't return
the new article in the results.

I think Wikipedia should match the word "sword" against "swords".

What it should do: Searching for "sword of chaos" should return the "swords of
chaos" article.
Comment 1 Brion Vibber 2004-09-11 00:20:08 UTC
a) Wildcard searches are much much slower.
b) Wildcard matches would often produce annoying false matches.
c) Popular search engines such as Google require exact word matches too, probably for the same reasons.
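
Points (a) and (b) can be illustrated with a minimal Python sketch. It is not MediaWiki's or MySQL's code: the page titles and the `exact_match` and `prefix_match` helpers are hypothetical. Exact matching is a single lookup on the whole word, while this naive prefix match checks every indexed word and also returns pages that merely share the prefix. A real index can avoid the full scan, but a prefix still expands to more words than an exact term does.

```python
# Toy word index: maps each word to the set of page titles containing it.
# Titles and helper names are hypothetical, for illustration only.
pages = {
    "Swords of Chaos": "swords of chaos is a game",
    "Swordfish": "the swordfish is a large fish",
    "Chaos theory": "chaos theory studies dynamical systems",
}

index = {}
for title, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(title)


def exact_match(word):
    """Return titles containing exactly this word (one dictionary lookup)."""
    return index.get(word, set())


def prefix_match(prefix):
    """Return titles containing any word that starts with prefix."""
    hits = set()
    for word, titles in index.items():  # scans every indexed word
        if word.startswith(prefix):
            hits |= titles
    return hits


# No page contains the bare word "sword", so the exact search finds nothing.
print(sorted(exact_match("sword")))
# The prefix search finds "swords" but also the unrelated "swordfish".
print(sorted(prefix_match("sword")))
```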
Comment 2 Zurd 2004-09-11 00:51:57 UTC
a) Of course it's slower, but that shouldn't be an issue since the site is
already responding very fast, and there are always software optimisations and new
hardware being added to make it even faster.

b) Wildcard matches do produce false results, but they will also produce the true
results the user is actually searching for, which is what it's all about.  It
could even be used as a checking tool to find duplicates or to easily find
related articles.

c) True, Google does use the exact word when returning results, but if you search
for "chaos" you will at least find "swords of chaos" somewhere.

I didn't find any earlier report like this. Is this a duplicate?  If so, where?
Comment 3 Brion Vibber 2004-09-11 00:59:51 UTC
a) Trust me, performance is a big concern. Search is one of our slowest features and bogs down everything.

b) If you want to use wildcards deliberately you are free to do so; search for "sword*".

c) Searching for "swords" or "chaos" does indeed find "swords of chaos", as experimentally verified on my test wiki. Note that if you only have a couple of pages in your wiki so far the search database may not work very well as it excludes common words.
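
The remark about small wikis can be sketched as follows. One mechanism that behaves this way is MySQL's natural-language full-text search on MyISAM tables, which ignores words that appear in half or more of the rows. The Python below only imitates that cutoff on made-up pages; `searchable_words` and the sample page texts are hypothetical, and whether this exact rule applied to a given wiki is an assumption.

```python
def searchable_words(pages, threshold=0.5):
    """Return the words found in fewer than `threshold` of the pages.

    Imitates a "too common to be useful" cutoff. The 0.5 default mirrors the
    50% rule mentioned above; everything else is a simplification.
    """
    counts = {}
    for text in pages:
        for word in set(text.split()):  # count each word once per page
            counts[word] = counts.get(word, 0) + 1
    limit = threshold * len(pages)
    return {word for word, n in counts.items() if n < limit}


small_wiki = ["swords of chaos", "main page"]
larger_wiki = small_wiki + ["help contents", "sandbox", "recent news"]

# With 2 pages, a word on 1 page is on 50% of them and is treated as common.
print("chaos" in searchable_words(small_wiki))
# With 5 pages, the same word is on only 20% of them and stays searchable.
print("chaos" in searchable_words(larger_wiki))
```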
Comment 4 Brion Vibber 2004-09-11 01:02:54 UTC
An additional note: the search index on Wikipedia sites is *not* updated immediately because that grinds our database to a halt. Updates are performed periodically during relatively low traffic times. You will not receive results for a page you just added on Wikipedia.
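
The effect of periodic index updates can be shown with a small Python sketch. The `DeferredSearchIndex` class and its method names are hypothetical and do not reflect how MediaWiki schedules its updates; the point is only that a saved page stays invisible to search until the next batch run.

```python
class DeferredSearchIndex:
    """Toy index where page saves are queued and indexed later in one batch."""

    def __init__(self):
        self.index = {}    # word -> set of titles, visible to searches
        self.pending = []  # (title, text) pairs saved but not yet indexed

    def save_page(self, title, text):
        # Cheap at save time: just remember the page for the next batch run.
        self.pending.append((title, text))

    def run_batch_update(self):
        # Expensive step, meant to run during low-traffic periods.
        for title, text in self.pending:
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(title)
        self.pending.clear()

    def search(self, word):
        return self.index.get(word.lower(), set())


wiki = DeferredSearchIndex()
wiki.save_page("Swords of Chaos", "Swords of Chaos is a game")
print(wiki.search("chaos"))  # empty set: saved but not indexed yet
wiki.run_batch_update()
print(wiki.search("chaos"))  # now contains the new page title
```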
Comment 5 Jamesday 2004-09-11 01:15:07 UTC
Performance is a massive concern. One day ago I completely turned off search on
Ariel because it couldn't handle the load and it was causing major site response
time problems. The site is fast, on the database side, _because_ we gave up on
search on Ariel and switched to Google/Yahoo search. It's been turned off for
much of the day for that reason for quite a while. We turned off search on the
Suda database server for many months for the same reason. I've also contemplated
requesting complete blocking of searches for quoted strings because those are
significantly slower than searches for unquoted strings. Search adds so much
database load that a server just for search has been purchased and more are
likely to be needed in the future.

For your search example, I suggest searching for chaos and one or more other
words which are likely to be in that article. A search for sword chaos seems
likely to find the article. Enclosing the search in quotes is generally a bad
idea. Adding more words which you expect to find in the article is a better
approach.

As Brion implied, updating the search index is also a load problem for the
database server and work to improve that continues.

Sorry, it's just not practical to have wildcard searching on by default, given
the way the MySQL search engine works. Doing it by hand, when you need it, and
avoiding searching for quoted strings if possible, is the best I can suggest.
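
"Doing it by hand" can be sketched in Python as a helper that rewrites a plain query into MySQL boolean-mode syntax, where a leading `+` marks a required word and a trailing `*` is the truncation (wildcard) operator. The `add_wildcards` helper and its `min_len` parameter are hypothetical. The default of 4 assumes the usual MyISAM minimum indexed word length; short words are dropped because a term like `+of*` would demand some longer word starting with "of". The helper only builds the search string and does not run a query.

```python
def add_wildcards(query, min_len=4):
    """Rewrite 'sword chaos' as '+sword* +chaos*' for a boolean-mode search.

    Hypothetical helper: quotes are removed so no phrase search is issued,
    and words shorter than min_len are skipped.
    """
    terms = []
    for raw in query.split():
        stem = raw.strip('"').rstrip("*")  # tolerate quotes and existing '*'
        if len(stem) < min_len:
            continue  # too short to be indexed under the assumed minimum
        terms.append("+" + stem + "*")
    return " ".join(terms)


print(add_wildcards("sword chaos"))
print(add_wildcards('"sword of chaos"'))
```

Both calls produce the same search string, which matches the advice above to prefer several unquoted words over a quoted phrase.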
Comment 6 Dennis Gerasimov 2006-02-21 03:12:21 UTC
Brion, a search for "sword*" does not work as expected in 1.5.5, and a wildcard
search is needed in some configurations even with a severe performance hit. I
think the team needs to revisit this issue; a wildcard/regex search is
desperately needed by some.
Comment 7 Ben Gertzfield 2006-04-25 15:18:59 UTC
Please take a look at the patch attached to bug 5711 for a config option to enable all MySQL Boolean logic operators, including wildcard search.
