Last modified: 2011-03-13 18:06:48 UTC
One example says it all:
I just created a new article for "swords of chaos", but searching for "sword of
chaos" (without the s), or even for "sword", "swords" or "chaos", doesn't return
the new article anywhere in the results.
I think Wikipedia's search should treat the word "sword" as matching "swords".
What it should do: searching for "sword of chaos" should return the "swords of
chaos" article.
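To make the request concrete, here is a minimal sketch of the kind of plural folding being asked for. It assumes a deliberately naive rule: drop one trailing "s" from words longer than three letters, unless the word ends in "ss". The functions `normalize`, `tokenize` and `matches` are hypothetical illustrations, not MediaWiki code.

```python
import re


def normalize(word: str) -> str:
    """Hypothetical plural folding: lowercase, then drop one trailing 's'
    from words longer than three letters unless they end in 'ss'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric words and fold each one."""
    return [normalize(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def matches(query: str, title: str) -> bool:
    """True if every folded query word also appears among the folded title words."""
    title_words = set(tokenize(title))
    return all(w in title_words for w in tokenize(query))


print(matches("sword of chaos", "Swords of Chaos"))  # True
print(matches("sword", "Swords of Chaos"))           # True
```

The same rule also turns "chaos" into "chao". Queries and titles are folded identically, so the match still succeeds. It does show how quickly naive stemming produces odd terms and, in a larger index, false matches.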
a) Wildcard searches are much, much slower.
b) Wildcard matches would often produce annoying false matches.
c) Popular search engines such as Google require exact word matches too, probably for the same reasons.
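Points a) and b) can be illustrated with a toy index. Suppose the indexed words are kept in sorted order, as an ordered index would keep them. An exact word then costs a single lookup, while a prefix wildcard such as "sword*" expands to a whole range of terms. Each of those terms would then need its own list of matching pages. The vocabulary below is made up for illustration. This is a generic sketch, not how MySQL's full-text engine is implemented internally.

```python
import bisect

# Made-up vocabulary of indexed words, kept sorted as an ordered index would.
VOCABULARY = sorted(["chaos", "sword", "swordfish", "swords", "swordsman", "swore", "sworn"])


def exact_lookup(word: str) -> list[str]:
    """One binary search: the word is either in the index or it is not."""
    i = bisect.bisect_left(VOCABULARY, word)
    return [word] if i < len(VOCABULARY) and VOCABULARY[i] == word else []


def prefix_lookup(prefix: str) -> list[str]:
    """Wildcard 'prefix*': every indexed word in the range starting with prefix.
    Assumes words contain no characters at or above U+FFFF, so that prefix
    followed by U+FFFF sorts after every word beginning with prefix."""
    lo = bisect.bisect_left(VOCABULARY, prefix)
    hi = bisect.bisect_left(VOCABULARY, prefix + "\uffff")
    return VOCABULARY[lo:hi]


print(exact_lookup("sword"))   # ['sword']
print(prefix_lookup("sword"))  # ['sword', 'swordfish', 'swords', 'swordsman']
```

The wildcard finds "swords", which is what the reporter wants. It also pulls in "swordfish", exactly the kind of unwanted match described in b).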
a) Of course it's slower, but that shouldn't be an issue, since the site already
responds very quickly and there are always software optimisations and new
hardware being added to make it even faster.
b) Wildcard matches do produce false results, but they also produce true results
for exactly what the user is searching for, which is what it's all about. It could
even be used as a tool to find duplicates or to easily find related articles.
c) True, Google does require the exact words when answering a query, but if you
search for "chaos" you will at least find "swords of chaos" somewhere.
I didn't find any similar issue reported before. Is this a duplicate? If so, of which bug?
a) Trust me, performance is a big concern. Search is one of our slowest features and bogs down everything.
b) If you want to use wildcards deliberately you are free to do so; search for "sword*".
c) Searching for "swords" or "chaos" does indeed find "swords of chaos", as experimentally verified on my test wiki. Note that if you only have a
couple of pages in your wiki so far, the search database may not work very well, as it excludes common words (with only one or two pages, nearly every word counts as common).
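The "excludes common words" behaviour is worth spelling out. MySQL's natural-language full-text search, as documented for MyISAM tables, treats a word present in 50% or more of the rows as common and does not match it. On a wiki with only one or two pages, nearly every word is therefore ignored. The function below models that one rule only. It ignores stopword lists and the minimum word length, and `searchable_words` is a hypothetical name.

```python
from collections import Counter


def searchable_words(pages: list[str]) -> set[str]:
    """Simplified model of the 50% rule: a word that occurs in half or more
    of the pages is treated as common and never matches.
    Stopwords and minimum word length are deliberately ignored."""
    n = len(pages)
    doc_freq = Counter()
    for text in pages:
        # Count each word at most once per page (document frequency).
        doc_freq.update(set(text.lower().split()))
    # Keep only words found in strictly fewer than half of the pages.
    return {w for w, df in doc_freq.items() if 2 * df < n}


print(searchable_words(["Swords of Chaos", "Main Page"]))  # set()
```

With two pages, every word appears in one of them, which is already 50%, so nothing is searchable. Add a few more pages and "chaos" starts matching again, while a word like "of" that appears on most pages stays excluded.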
An additional note: the search index on Wikipedia sites is *not* updated immediately, because doing so grinds our database to a halt. Updates are
performed periodically during relatively low-traffic times, so you will not get results for a page you have just added on Wikipedia.
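The deferred updates described above can be sketched as a queue. The queue remembers which pages changed, and a later flush rebuilds their index entries in one batch. `DeferredIndexer` and its methods are hypothetical. The sketch assumes a simple in-memory word-to-titles index rather than the real search tables. The point is only that a just-saved page stays invisible to search until the next flush.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredIndexer:
    """Hypothetical sketch: saves are queued, and the index is updated in batches."""
    index: dict[str, set[str]] = field(default_factory=dict)  # word -> page titles
    pending: dict[str, str] = field(default_factory=dict)     # title -> latest text

    def on_page_saved(self, title: str, text: str) -> None:
        # Cheap at save time: just remember the newest text.
        # Several edits to one page before a flush collapse into one update.
        self.pending[title] = text

    def search(self, word: str) -> set[str]:
        # Searches only see what the last flush wrote.
        return set(self.index.get(word.lower(), set()))

    def flush(self) -> int:
        """Apply all queued updates at once (e.g. during low-traffic hours).
        Returns the number of pages reindexed."""
        for title, text in self.pending.items():
            # Drop the page's old entries, then index its current words.
            for titles in self.index.values():
                titles.discard(title)
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(title)
        count = len(self.pending)
        self.pending.clear()
        return count


idx = DeferredIndexer()
idx.on_page_saved("Swords of Chaos", "swords of chaos")
print(idx.search("chaos"))  # set(): not indexed yet
idx.flush()
print(idx.search("chaos"))  # {'Swords of Chaos'}
```

Batching trades freshness for load: one flush does the index writes for many saves, at a time chosen by the operators.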
Performance is a massive concern. Yesterday I completely turned off search on
Ariel because it couldn't handle the load and was causing major site response
time problems. The site is fast on the database side _because_ we gave up on
search on Ariel and switched to Google/Yahoo search. For quite a while now,
search on Ariel has been turned off for much of each day for that reason. We
turned off search on the Suda database server for many months for the same
reason. I've also contemplated requesting a complete block on searches for
quoted strings, because those are significantly slower than searches for
unquoted strings. Search adds so much database load that a server dedicated to
search has been purchased, and more are likely to be needed in the future.
For your search example, I suggest searching for chaos plus one or more other
words that are likely to be in that article. A search for sword chaos seems
likely to find the article. Enclosing the search in quotes is generally a bad
idea; adding more words that you expect to find in the article is a better
approach.
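The cost difference between unquoted and quoted searches can be shown on a toy index that records only which pages contain each word. An unquoted search for sword chaos is just an intersection of the page sets for each word. A quoted phrase needs the same candidate set plus an extra pass over each candidate's text to check word order. This is a generic sketch with made-up pages and hypothetical function names, not MySQL's actual phrase-matching code.

```python
import re

# Made-up example pages: title -> text.
PAGES = {
    "Swords of Chaos": "The sword of chaos is a legendary weapon.",
    "Chaos": "Chaos is a state of disorder that no sword can tame.",
}


def words(text: str) -> list[str]:
    """Lowercase alphanumeric words, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page titles containing it (no positions)."""
    index: dict[str, set[str]] = {}
    for title, text in pages.items():
        for w in set(words(text)):
            index.setdefault(w, set()).add(title)
    return index


def unquoted_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Pages containing every query word: set intersections only."""
    sets = [index.get(w, set()) for w in words(query)]
    return set.intersection(*sets) if sets else set()


def quoted_search(index: dict[str, set[str]], pages: dict[str, str], query: str) -> set[str]:
    """Phrase search: same candidates, then re-read each candidate's text
    to check that the words appear consecutively and in order."""
    phrase = words(query)
    n = len(phrase)
    hits = set()
    for title in unquoted_search(index, query):
        tokens = words(pages[title])
        if any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)):
            hits.add(title)
    return hits


index = build_index(PAGES)
print(sorted(unquoted_search(index, "sword chaos")))          # ['Chaos', 'Swords of Chaos']
print(sorted(quoted_search(index, PAGES, "sword of chaos")))  # ['Swords of Chaos']
```

Both searches find the wanted article. The quoted one only narrows the result after re-reading every candidate page. Adding another distinctive word to an unquoted query narrows it with just one more set intersection.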
As Brion implied, updating the search index is also a load problem for the
database server, and work to improve that is ongoing.
Sorry, it's just not practical to have wildcard searching on by default, given
the way the MySQL search engine works. Adding wildcards by hand when you need
them, and avoiding searching for quoted strings where possible, is the best I
can suggest.
Brion, searching for "sword*" does not work as expected in 1.5.5, and some
configurations need wildcard search even at a severe performance cost. I think
the team needs to revisit this issue; a wildcard/regex search is desperately
needed by some.
Please take a look at the patch attached to bug 5711 for a config option to enable all MySQL Boolean logic operators, including wildcard search.
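For a config option like this, a front end would need to turn plain search words into a MySQL Boolean-mode expression. In that syntax a leading "+" marks a required word and a trailing "*" is a prefix wildcard. The helper below is a hypothetical sketch, not the patch attached to bug 5711. The `searchindex` table and `si_text` column in the SQL string are assumptions about the schema.

```python
import re

# Assumed schema: MediaWiki-style searchindex table with a full-text si_text column.
# %s is the DB-API placeholder used by MySQL drivers such as PyMySQL.
SQL = "SELECT si_page FROM searchindex WHERE MATCH(si_text) AGAINST(%s IN BOOLEAN MODE)"


def boolean_mode_query(user_input: str, wildcard: bool = False) -> str:
    """Hypothetical helper: require every word ('+word') and, if wildcard is
    True, also make each word a prefix match ('+word*').
    Operators the user typed (quotes, '*', '+', '-') are stripped first,
    so the helper alone decides which Boolean operators reach MySQL."""
    terms = re.findall(r"\w+", user_input)
    suffix = "*" if wildcard else ""
    return " ".join(f"+{t}{suffix}" for t in terms)


print(boolean_mode_query("sword chaos"))                 # +sword +chaos
print(boolean_mode_query("sword chaos", wildcard=True))  # +sword* +chaos*
```

The expression would be passed as a query parameter, e.g. `cursor.execute(SQL, (boolean_mode_query(text, wildcard=True),))`, so user text is never spliced into the SQL itself. Keeping the wildcard behind a flag matches the point above that it should be opt-in rather than the default.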