Last modified: 2009-07-24 12:13:29 UTC
I recently came across a wiki which implements a more useful way to access (search) pages by implementing a form of fuzzy (approximate) bookmarking. I am copying the relevant text from http://wiki.tcl.tk/391 :

To search for the word "cgi" in all page titles, you can use the URL:
http://purl.org/tcl/wiki/cgi

To search for this word in all titles and in the full texts, use:
http://purl.org/tcl/wiki/cgi*
(in general: a regular expression)

Or, if you prefer, you can enter the search word on the search page, at:
http://purl.org/tcl/wiki/search

But there's a little more to it. That last URL is actually a form of fuzzy bookmarking. There is no web page called "search". Wikit presents its contents as if it were a directory with pages, but it's all smoke and mirrors...

First of all, note that all Wikit pages have a unique identifying number. The "About" page is at http://purl.org/tcl/wiki/1.html, for example. But although these unique IDs are effective for internal links, they are quite awkward as bookmarks, since they convey no information whatsoever about the title or contents of a page.

To offer a more useful way of bookmarking, pages which are not of the form <number>.html are treated as search instructions to locate a page. The following URL is an instruction to look for a page titled "hawaii":
http://purl.org/tcl/wiki/hawaii

Assuming there is a page titled "hawaii" (case is ignored), the above URL will lead directly to that page.

But wikis change. So do page titles, occasionally. Some page titles are long and may contain embedded spaces or other inconvenient characters. All this makes the above search mechanism a bit too brittle for long-lasting URLs. The solution which has been adopted here is to refine the search process as follows (everything after the slash will be called the search term):

1. If the search term is a reference to a page (<number>.html), then simply go to that page.
2. If the search term matches a page title (while ignoring case), then jump to the page with that title.
3. If the search term includes one or more upper-case letters, modify the search to be approximate (see below). If the approximate match finds exactly one page, jump to that page.
4. Otherwise, treat the search term as a regular search, and present the search results.

Approximate matching: if the search term has upper-case letters, for example "OneTwoThree", it is turned into a match pattern (using the glob / string match syntax). In the example given, a search would be performed on page titles matching the pattern "*[Oo]ne*[Tt]wo*[Tt]hree*".

What's the point of all this? Well... this mechanism allows you to specify URLs pointing into the Tcl'ers Wiki with some quite attractive properties:

* If the search keyword is accurate enough, it's equivalent to a real URL
* If the search is general enough, it'll survive minor title changes (e.g. typos)
* The URL has a meaningful word in it, so people can remember what it was about
* If more pages are added to the wiki, the search will turn up more than one match
* This is an extremely useful feature, because the original match will be one of the search results listed, and so will new, probably related, pages

For an example, here's a link to Don Libes' book on Expect:
http://purl.org/tcl/wiki/Expect

And here's a search which lists all pages where the word "expect" is used:
http://purl.org/tcl/wiki/expect*
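The four-step lookup and the upper-case-to-glob rule quoted above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Wikit's actual code: the function names `approximate_pattern` and `resolve`, the `titles` dictionary mapping page numbers to titles, and the use of a case-insensitive substring match over titles as the step-4 "regular search" are all hypothetical stand-ins. Python's `fnmatch.fnmatchcase` is used here in place of Tcl's `string match`.

```python
import re
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple, Union

# Hypothetical result type: ("page", page_id) or ("search", [page_ids]).
Result = Tuple[str, Union[int, List[int]]]


def approximate_pattern(term: str) -> str:
    """Turn e.g. "OneTwoThree" into the glob "*[Oo]ne*[Tt]wo*[Tt]hree*"."""
    # Split before each upper-case letter and drop the empty leading piece.
    words = [w for w in re.split(r"(?=[A-Z])", term) if w]
    pieces = []
    for w in words:
        if w[0].isupper():
            # Let the first letter of each word match either case.
            pieces.append(f"[{w[0]}{w[0].lower()}]{w[1:]}")
        else:
            pieces.append(w)
    return "*" + "*".join(pieces) + "*"


def resolve(term: str, titles: Dict[int, str]) -> Result:
    """Hypothetical resolver; `titles` maps page number -> page title."""
    # 1. "<number>.html" is a direct page reference.
    m = re.fullmatch(r"(\d+)\.html", term)
    if m:
        return ("page", int(m.group(1)))
    # 2. Exact title match, ignoring case.
    for page_id, title in titles.items():
        if title.lower() == term.lower():
            return ("page", page_id)
    # 3. Upper-case letters trigger approximate (glob) matching.
    if any(c.isupper() for c in term):
        pattern = approximate_pattern(term)
        hits = [pid for pid, t in titles.items() if fnmatchcase(t, pattern)]
        if len(hits) == 1:
            return ("page", hits[0])
    # 4. Otherwise fall back to a plain search (assumed: substring in titles).
    needle = term.lower()
    return ("search", [pid for pid, t in titles.items() if needle in t.lower()])
```

With titles such as "Exploring Expect" and "Expect FAQ", the term `Expect` produces two approximate hits, so step 3 falls through to a list of search results, which is the "more than one match" behaviour the quoted text describes.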
Does this solve the problem of not finding "my_faq", and only getting the "faq" wiki page, when searching for "faq"?
(In reply to comment #1)
> does this solve the problem of not finding "my_faq" and just getting "faq" wiki,
> when searching for "faq"?

Of course it will, as long as the user input (call it the "needle") is not too far away from the entries in the "haystack". I am an expert in AGREP (see http://www.tgries.de/agrep; there are several spin-offs which could be integrated into MediaWiki). AGREP used with the option "-By" automatically tries an exact match first (my_faq = faq); if that does not match, it increments the allowed error count to 1 and searches again with one allowed error. This loop repeats until at least one match has been found, usually several similar spellings, which would then be presented to the user to select from, or the user could create a new page titled "my_faq" if desired.

Are you a developer?
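The best-match loop described for AGREP's "-By" mode (try zero errors, then one, then two, until something matches) can be sketched as below. This is an illustration under assumptions, not AGREP's implementation: AGREP uses a faster bit-parallel algorithm, whereas this sketch swaps in the textbook dynamic-programming (Sellers) form of approximate substring matching. The names `approx_substring_distance` and `best_match`, and the case-folding of titles, are hypothetical choices.

```python
from typing import List, Tuple


def approx_substring_distance(needle: str, text: str) -> int:
    """Fewest edits needed to make `needle` occur somewhere inside `text`.

    Sellers' dynamic-programming variant of edit distance: a match may
    start at any position in `text`, so the first row is all zeros.
    """
    prev = [0] * (len(text) + 1)
    for i, nc in enumerate(needle, start=1):
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cur[j] = min(
                prev[j - 1] + (nc != tc),  # match or substitution
                prev[j] + 1,               # needle char missing from text
                cur[j - 1] + 1,            # extra char in text
            )
        prev = cur
    # A match may end anywhere in `text`.
    return min(prev)


def best_match(needle: str, titles: List[str]) -> Tuple[int, List[str]]:
    """Allow 0 errors, then 1, 2, ... until at least one title matches."""
    needle = needle.lower()  # assumption: case-insensitive comparison
    dists = {t: approx_substring_distance(needle, t.lower()) for t in titles}
    for k in range(len(needle) + 1):
        hits = [t for t in titles if dists[t] <= k]
        if hits:
            return k, hits
    # Only reached when `titles` is empty.
    return len(needle), list(titles)
```

Because a match may occur anywhere inside a title, `faq` matches both "FAQ" and "my_faq" with zero errors, while the misspelling `my_fag` first matches "my_faq" once one error is allowed.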
(Added for documentation completeness only.) See also my other enhancement bug, http://bugzilla.wikimedia.org/show_bug.cgi?id=2486 (Automatic wiki page name suggestion similar to "Google Suggest").
Changed component to "RecentChanges"
This bug is totally stale, but most of the requested features seem to have been developed in the intervening period. We have much better search functionality through LuceneSearch, which includes "did you mean" suggestions, fuzzy matching, etc. We have mwsuggest, which offers title suggestions as you type in the search box. I don't think automatically triggering a fuzzy-matching algorithm on every URL is a good idea. Resolving FIXED.