Last modified: 2011-03-13 18:06:48 UTC
One example describes it all: I just created a new article for "swords of chaos", but searching for "sword of chaos" (without the s), or even "sword", "swords" or "chaos", doesn't return the new article in the results. I think Wikipedia should treat the word "sword" as a match for "swords". What it should do: Searching for "sword of chaos" should return the "swords of chaos" article.
a) Wildcard searches are much much slower. b) Wildcard matches would often produce annoying false matches. c) Popular search engines such as Google require exact word matches too, probably for the same reasons.
a) Of course it's slower, but that shouldn't be an issue since the site already responds very fast, and there are always software optimisations and new hardware being added to make it even faster. b) Wildcard matches do produce false results, but they also produce the true results the user is actually searching for, which is what it's all about. It could even be used as a checking tool to find duplicates or to easily find related articles. c) True, Google does use the exact word when returning results, but if you search for "chaos" you will at least find "swords of chaos" somewhere. I didn't find any issue like this reported before; is this a duplicate? If yes, where?
a) Trust me, performance is a big concern. Search is one of our slowest features and bogs down everything. b) If you want to use wildcards deliberately you are free to do so; search for "sword*". c) Searching for "swords" or "chaos" does indeed find "swords of chaos", as experimentally verified on my test wiki. Note that if your wiki has only a couple of pages so far, the search database may not work very well, since it excludes common words.
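To make the difference between an exact word match and a deliberate "sword*" search concrete, here is a toy sketch in Python. It is not how MediaWiki or MySQL implement search; the function names tokenize and term_matches are made up for this illustration, and it only models whole-word matching versus a trailing-* prefix match.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_matches(term, text):
    """Return True if the search term matches some word in the text.

    A trailing '*' means "any word starting with this prefix";
    otherwise the term must equal a whole word exactly.
    """
    words = tokenize(text)
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in words)
    return term in words

title = "Swords of Chaos"
print(term_matches("sword", title))    # exact word: no match
print(term_matches("swords", title))   # exact word: match
print(term_matches("sword*", title))   # prefix wildcard: match
# The same wildcard also hits unrelated words, which is the false-match concern.
print(term_matches("sword*", "Swordfish recipes"))
```

The last call shows the trade-off discussed above: the prefix search finds the article, but it also pulls in pages the user probably did not want.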
An additional note: the search index on Wikipedia sites is *not* updated immediately because that grinds our database to a halt. Updates are performed periodically during relatively low traffic times. You will not receive results for a page you just added on Wikipedia.
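A rough sketch of why a freshly saved page is not yet searchable when index updates are deferred. This is a toy model in Python, not the actual Wikipedia setup; the class DeferredSearchIndex and its methods are hypothetical names chosen for the example.

```python
class DeferredSearchIndex:
    """Toy inverted index whose updates are queued and applied in batches."""

    def __init__(self):
        self.index = {}      # word -> set of page titles
        self.pending = []    # (title, text) pairs not yet indexed

    def save_page(self, title, text):
        # Saving a page only queues the index update; it does no index work.
        self.pending.append((title, text))

    def flush(self):
        # Meant to run during a quiet period: apply all queued updates at once.
        for title, text in self.pending:
            for word in (title + " " + text).lower().split():
                self.index.setdefault(word, set()).add(title)
        self.pending.clear()

    def search(self, word):
        # Only words already written to the index can be found.
        return sorted(self.index.get(word.lower(), set()))

idx = DeferredSearchIndex()
idx.save_page("Swords of Chaos", "example article text")
print(idx.search("chaos"))   # empty: the update is still queued
idx.flush()
print(idx.search("chaos"))   # the page is found after the batch update
```

The point of the design is that saving a page stays cheap, at the cost of search results lagging behind until the next batch run.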
Performance is a massive concern. One day ago I completely turned off search on Ariel because it couldn't handle the load and it was causing major site response time problems. The site is fast, on the database side, _because_ we gave up on search on Ariel and switched to Google/Yahoo search. For quite a while now, search has been turned off for much of the day for that reason. We turned off search on the Suda database server for many months for the same reason. I've also contemplated requesting complete blocking of searches for quoted strings because those are significantly slower than searches for unquoted strings. Search adds so much database load that a server just for search has been purchased and more are likely to be needed in the future. For your search example, I suggest searching for chaos and one or more other words which are likely to be in that article. A search for sword chaos seems likely to find the article. Enclosing the search in quotes is generally a bad idea. Adding more words which you expect to find in the article is a better approach. As Brion implied, updating the search index is also a load problem for the database server and work to improve that continues. Sorry, it's just not practical to have wildcard searching on by default, given the way the MySQL search engine works. Doing it by hand when you need it, and avoiding searching for quoted strings if possible, is the best I can suggest.
Brion, search for "sword*" does not work as expected in 1.5.5, and a wildcard search is needed in some configurations even with a severe performance hit. I think the team needs to revisit this issue; a wildcard/regex search is desperately needed by some.
Please take a look at the patch attached to bug 5711 for a config option to enable all MySQL Boolean logic operators, including wildcard search.
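For readers unfamiliar with MySQL Boolean mode, here is a minimal sketch of how plain search words could be turned into a Boolean-mode expression using the + (required) and trailing * (truncation) operators. This is not the patch from bug 5711; the helper to_boolean_query and the table and column names in the SQL string are hypothetical and used only for illustration.

```python
def to_boolean_query(terms, wildcard=False):
    """Build a MySQL boolean-mode search expression from plain words.

    Each word is prefixed with '+' (must be present); with wildcard=True
    a trailing '*' (truncation operator) is appended to each word.
    """
    parts = []
    for term in terms:
        # Keep only letters and digits so user input cannot inject operators.
        word = "".join(ch for ch in term if ch.isalnum())
        if not word:
            continue
        parts.append("+" + word + ("*" if wildcard else ""))
    return " ".join(parts)

# Hypothetical table and column names; the expression is passed as a parameter.
SQL = ("SELECT page_id FROM search_table "
       "WHERE MATCH(title_text) AGAINST (%s IN BOOLEAN MODE)")

print(to_boolean_query(["sword", "chaos"], wildcard=True))  # +sword* +chaos*
```

Making the wildcard an explicit switch mirrors the idea of a config option: sites that can afford the extra load turn it on, and everyone else keeps exact word matching.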