Last modified: 2013-08-29 16:56:07 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator; Wikimedia Bugzilla is now read-only, and bug reports should be created and updated in Wikimedia Phabricator instead.
To access the Phabricator task corresponding to a Bugzilla report, remove "static-" from its URL.
Bug 4430 - Use Unicode Character Folding for accents, punctuation chars in search index
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: unspecified
Hardware: All / All
Importance: Normal major, with 5 votes
Assigned To: Nobody - You can work on this!
Keywords: i18n
Duplicates: 4379, 14180
Blocks: 24414
Reported: 2005-12-30 18:20 UTC by Ilya Konstantinov
Modified: 2013-08-29 16:56 UTC
CC: 9 users



Description Ilya Konstantinov 2005-12-30 18:20:11 UTC
It would be desirable for Search, and especially for the "Go" functionality
(resolving a page title to an actual page without an intermediate search), to
apply all sensible Unicode character foldings to the searched titles.

Unicode Character Foldings define[1] string transformations that make two
strings search-equivalent (unlike Unicode normalizations, which make strings
content-equivalent). The folded title should be stored in addition to the
original title, not instead of it; when searching, the folded search string
should be compared against the folded title.
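A minimal sketch of the scheme described above, assuming Python's `unicodedata` module as a stand-in for a real folding table: the folded key is stored next to the original title, and a "Go" lookup folds the query the same way before comparing. The names `fold` and `TitleIndex` are hypothetical, and the folding used here (NFKD, drop combining marks, `casefold()`) only approximates the foldings in the cited report; it is not MediaWiki's implementation.

```python
import unicodedata


def fold(text: str) -> str:
    """Hypothetical approximate search folding: decompose, drop accents, casefold."""
    # NFKD splits "é" into "e" + U+0301 and maps compatibility characters
    # such as the "ﬁ" ligature to their plain equivalents.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop nonspacing combining marks (general category Mn), i.e. the accents.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() is a more aggressive lower(), e.g. "ß" -> "ss".
    return unicodedata.normalize("NFC", stripped.casefold())


class TitleIndex:
    """Hypothetical title index keeping the folded key alongside the original."""

    def __init__(self) -> None:
        self._by_key: dict[str, list[str]] = {}

    def add(self, title: str) -> None:
        # The original title is kept as-is; the folded form is only a lookup key.
        self._by_key.setdefault(fold(title), []).append(title)

    def go(self, query: str) -> list[str]:
        # Compare the folded query against the folded titles.
        return self._by_key.get(fold(query), [])


index = TitleIndex()
index.add("Café Müller")
print(index.go("cafe muller"))  # ['Café Müller']
```

Because the original title is still what gets returned, the page is displayed under its real name even when the user typed an unaccented or differently-cased query.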

We already do some folding, such as case insensitivity, but we could benefit
from the full set of foldings, for example eliminating the difference between
the minus sign and the various dashes.
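The dash example could be handled as one extra folding step. The sketch below is an assumption about how such a step might look, not the report's exact table: every character in Unicode general category Pd (dash punctuation) is mapped to ASCII hyphen-minus, plus U+2212 MINUS SIGN, which is category Sm and would otherwise be missed. `fold_dashes` is a hypothetical name.

```python
import unicodedata

MINUS_SIGN = "\u2212"  # category Sm, not Pd, so it needs explicit handling


def fold_dashes(text: str) -> str:
    """Hypothetical step: map dash punctuation and the minus sign to ASCII '-'."""
    return "".join(
        "-" if ch == MINUS_SIGN or unicodedata.category(ch) == "Pd" else ch
        for ch in text
    )


# En dash (U+2013), em dash (U+2014) and minus sign (U+2212) all become "-".
print(fold_dashes("1914\u20131918"))  # 1914-1918
print(fold_dashes("x \u2212 y"))      # x - y
```

Keying on the general category rather than a hand-written list also folds compatibility forms such as U+FF0D FULLWIDTH HYPHEN-MINUS, which may or may not be wanted for every wiki.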

[1] http://www.unicode.org/unicode/reports/tr30/
Comment 1 Brion Vibber 2008-05-19 17:27:42 UTC
*** Bug 14180 has been marked as a duplicate of this bug. ***
Comment 2 Brion Vibber 2008-12-28 21:14:34 UTC
De-assigning, as no activity in 3 years. Still a good idea though! :)

K-form (compatibility) normalization would be easy to apply, since the UtfNormal class already implements it; other foldings may require more coding.
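To illustrate the distinction drawn in this comment, here is a sketch that uses Python's `unicodedata.normalize` as a stand-in for MediaWiki's UtfNormal class (whose PHP API is not reproduced here; `nfkc` is a hypothetical wrapper). NFKC folds compatibility characters such as ligatures and fullwidth letters, but leaves accents, case and the minus/dash distinction untouched, which is why the other foldings need extra code.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Hypothetical wrapper: K-form (compatibility) composition."""
    return unicodedata.normalize("NFKC", text)


# Compatibility characters are folded:
print(nfkc("\ufb01le"))   # "ﬁ" ligature -> file
print(nfkc("\uff37iki"))  # fullwidth "Ｗ" -> Wiki
# ...but accents, case and the minus sign survive NFKC unchanged:
print(nfkc("Caf\u00e9") == "Caf\u00e9")  # True
print(nfkc("WIKI") == "WIKI")            # True
print(nfkc("\u2212") == "-")             # False: U+2212 has no decomposition
```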
Comment 3 Chad H. 2009-09-03 01:53:15 UTC
*** Bug 4379 has been marked as a duplicate of this bug. ***
Comment 4 Chad H. 2009-09-07 15:42:00 UTC
*** Bug 20529 has been marked as a duplicate of this bug. ***
Comment 5 Nemo 2012-07-13 23:07:09 UTC
This is the same bug as in <https://translatewiki.net/wiki/Thread:Support/Search_index_should_ignore_punctuation>, isn't it?


