Last modified: 2009-07-24 12:13:29 UTC
I recently came across a wiki which implements a more useful way to access
(search) pages by actually implementing a form of fuzzy (approximate) bookmarking.
I am copying the relevant text from http://wiki.tcl.tk/391 :
To search for the word "cgi" in all page titles, you can use the URL:
To search for this word in all titles and in the full texts, use:
http://purl.org/tcl/wiki/cgi* (in general: a regular expression)
Or, if you prefer, you can enter the search word on the search page, at:
But there's a little more to it. That last URL is actually a form of fuzzy
bookmarking. There is no web page called "search". Wikit presents its contents
as if it were a directory with pages, but it's all smoke and mirrors...
First of all, note that all Wikit pages have a unique identifying number. The
"About" page is at http://purl.org/tcl/wiki/1.html, for example. But although
these unique IDs are effective for internal links, they are quite awkward as
bookmarks, since they convey no information whatsoever about the title or
contents of a page.
To offer a more useful way of bookmarking, pages which are not of the form
<number>.html are treated as search instructions to locate a page. The following
URL is an instruction to look for a page titled "hawaii":
Assuming there is a page titled "hawaii" (case is ignored), the above URL will
lead directly to that page.
But wikis change. So do page titles, occasionally. Some page titles are long
and may contain embedded spaces or other inconvenient characters. This all makes
the above search mechanism a bit too brittle for long-lasting URLs.
The solution which has been adopted here is to refine the search process as
follows (everything after the slash will be called the search term):
1. If the search term is a reference to a page (<number>.html), then simply
go to that page
2. If the search term matches a page title (while ignoring case), then jump
to the page with that title
3. If the search term includes one or more upper-case letters, modify the
search to be approximate (see below). If the approximate match finds exactly one
page, jump to that page.
4. Otherwise, treat the search term as a regular search, and present the search results.
Approximate matching - if the search term has upper-case letters, for example
"OneTwoThree", it is turned into a match pattern (using the glob / string match
syntax). In the example given, a search would be performed on page titles
matching the pattern "*[Oo]ne*[Tt]wo*[Tt]hree*".
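The four resolution steps and the approximate-matching rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Wikit's actual code (Wikit is written in Tcl). Several details are assumptions made for the example: the hypothetical `pages` dictionary mapping page numbers to titles, the function names `approximate_pattern` and `resolve`, and the use of a case-insensitive substring search over titles as the "regular search" in step 4. Python's `fnmatch.fnmatchcase` stands in for Tcl's `string match`, because both support `*` and `[...]` character classes.

```python
import fnmatch
import re

def approximate_pattern(term):
    """Turn e.g. "OneTwoThree" into "*[Oo]ne*[Tt]wo*[Tt]hree*".

    The term is split before every upper-case letter. The first letter of
    each piece may match in either case; the rest must match as typed.
    Simplification: glob metacharacters in the term are not escaped.
    """
    pieces = [p for p in re.split(r"(?=[A-Z])", term) if p]
    parts = ["[%s%s]%s" % (p[0].upper(), p[0].lower(), p[1:]) for p in pieces]
    return "*" + "*".join(parts) + "*"

def resolve(term, pages):
    """Resolve the text after the slash; pages maps page number -> title (hypothetical)."""
    # Step 1: "<number>.html" is a direct reference to a page.
    m = re.fullmatch(r"(\d+)\.html", term)
    if m:
        return ("page", int(m.group(1)))
    # Step 2: exact title match, ignoring case.
    for number, title in pages.items():
        if title.lower() == term.lower():
            return ("page", number)
    # Step 3: upper-case letters trigger approximate matching;
    # jump only if exactly one title matches the glob pattern.
    if re.search(r"[A-Z]", term):
        pattern = approximate_pattern(term)
        hits = [n for n, t in pages.items() if fnmatch.fnmatchcase(t, pattern)]
        if len(hits) == 1:
            return ("page", hits[0])
    # Step 4 (assumed behaviour): case-insensitive substring search on titles,
    # returning the numbers of all matching pages.
    return ("results", sorted(n for n, t in pages.items() if term.lower() in t.lower()))
```

For instance, with `pages = {1: "About", 2: "Hawaii", 3: "One Two Three ideas", 4: "Expect", 5: "Exploring Expect"}`, `resolve("OneTwoThree", pages)` returns `("page", 3)`. `resolve("Ex", pages)` finds two approximate matches, so it falls through to step 4 and returns `("results", [4, 5])`.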
What's the point of all this? Well... this mechanism allows you to specify URLs
pointing into the Tcl'ers Wiki with some quite attractive properties:
* If the search keyword is accurate enough, it's equivalent to a real URL
* If the search is general enough, it'll survive minor title changes (e.g.
* The URL has a meaningful word in it, so people can remember what it was about
* If more pages are added to the wiki, the search will turn up more than one
* This is an extremely useful feature, because the original match will be
one of the search results listed, and so will new - probably related - pages
For example, here's a link to Don Libes' book on Expect:
And here's a search which lists all pages where the word "expect" is used:
Does this solve the problem of not finding "my_faq", and getting only the "faq" wiki page,
when searching for "faq"?
(In reply to comment #1)
> does this solve the problem of not finding "my_faq" and just getting "faq" wiki,
> when searching for "faq"?
Of course it will, as long as the distance between the user input (call it the "needle") and the entries in
the "haystack" is not too large. I am an expert in AGREP (see http://www.tgries.de/agrep; there are several spin-offs which could be
integrated into MediaWiki). AGREP used with the option "-By" automatically tries an exact match first (my_faq = faq), which
does not match; in that case it increments the allowed number of errors to 1 and searches again with one allowed error. This repeats until at
least one match has been found, usually several similar spellings ...
... which then would be presented to the user to select from OR
... to really create a new page with the "my_faq" page title, if the user wants this.
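The "-By" behaviour described above can be sketched as follows: try zero errors, then one, then more, until something matches. This is not AGREP itself. AGREP uses bit-parallel algorithms, whereas this sketch uses the plain dynamic-programming approximate substring matcher (Sellers' algorithm). It also assumes titles are compared case-insensitively, and the function names `min_errors` and `best_match` are hypothetical. As with AGREP, the query may match anywhere inside a title, so "faq" finds "my_faq" with zero errors.

```python
def min_errors(pattern, text):
    """Fewest insertions, deletions or substitutions needed for `pattern`
    to match some substring of `text` (Sellers' dynamic programming)."""
    m = len(pattern)
    # col[i] = fewest errors matching pattern[:i] against a substring of text
    # ending at the last character read; before any text, that costs i deletions.
    col = list(range(m + 1))
    best = col[m]
    for ch in text:
        new = [0] * (m + 1)  # a match may start anywhere, so row 0 is free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(col[i] + 1,         # extra text character
                         new[i - 1] + 1,     # missing pattern character
                         col[i - 1] + cost)  # match or substitution
        col = new
        best = min(best, col[m])
    return best

def best_match(query, titles):
    """Return (errors, titles) for the smallest error count with at least one hit."""
    if not titles:
        return 0, []
    q = query.lower()
    errors = {t: min_errors(q, t.lower()) for t in titles}
    k = 0
    while True:  # terminates: min_errors never exceeds len(query)
        hits = [t for t in titles if errors[t] <= k]
        if hits:
            return k, hits
        k += 1  # nothing found: allow one more error and search again
```

With the titles `["faq", "my_faq", "fax", "Wiki"]`, the query "faq" returns both "faq" and "my_faq" at zero errors, which is the case asked about in comment #1. The misspelling "fqa" finds nothing exactly and is resolved at one error, returning "faq", "my_faq" and "fax" for the user to choose from.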
Are you a developer?
(added for documentation completeness only)
See also my other enhancement bug
Automatic wiki page name suggestion similar as "Google Suggest"
Changed component to "RecentChanges"
This bug is totally stale, but most of the features requested seem to have been developed in the intervening period. We have a much better search functionality through LuceneSearch, which includes "did you mean", fuzzy matching, etc. We have mwsuggest that does useful things in the search box. I don't think a fuzzy-matching algorithm being automatically triggered on all URLs is a good idea. Resolving FIXED.