Last modified: 2007-05-27 22:30:23 UTC

Bug 7002 - Normalize accented/ligature characters for search terms and indices ("ignore" accents)
Status: RESOLVED DUPLICATE of bug 920
Product: MediaWiki
Classification: Unclassified
Component: Search
Hardware: All All
Importance: Normal enhancement with 6 votes
Target Milestone: ---
Assigned To: Nobody
Duplicates: 9606
Reported: 2006-08-13 21:05 UTC by Pietro Giorgianni
Modified: 2007-05-27 22:30 UTC (History)
Description Pietro Giorgianni 2006-08-13 21:05:55 UTC
Most European languages use letters which are not present in the standard ASCII set.

It would IMHO be very useful to allow a more flexible search, that is:
* ae, oe, ue for ä, ö, ü; for instance, Goedel for Gödel;
* ss for ß; for instance, Grossmann for Großmann;
* a, e, i etc. for à, á, â, è, é, ê, ì, í, î etc; for instance, geologie (and
gèologie) for géologie;
* n for ñ; for instance, Bunuel for Buñuel;
* o (don't know if it's the best letter) for ø; for instance, Kobenhavn for
* aa for å; for instance, Aarhus for Århus;
* C (or a better choice) for Č; for instance, Cesky for Česky;
* maybe others that I don't know.
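
The substitutions listed above could be sketched as a small folding function. This is only an illustration, not how MediaWiki search works: the `SPECIAL` table and the `fold` name are assumptions made up for this example, and the table covers just the letters mentioned in the list. Letters that are a base letter plus an accent are handled with Unicode decomposition from Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical substitution table, taken from the examples in the list above.
# These letters need explicit entries: either they have no Unicode
# decomposition (ß, ø) or the proposed replacement is a digraph (ä, ö, ü, å).
SPECIAL = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
    "ø": "o",
    "å": "aa",
}

def fold(text: str) -> str:
    """Return a lowercase, accent-free search key for text."""
    out = []
    # Compose first so that "o" + combining diaeresis is seen as "ö".
    for ch in unicodedata.normalize("NFC", text).lower():
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # Split e.g. "é" into "e" + combining acute, then drop the mark.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

print(fold("Gödel"), fold("Großmann"), fold("géologie"), fold("Århus"))
```

A search for "Goedel" and a page titled "Gödel" then share the key `goedel`. Note that the German-style rule ü to "ue" would also turn the French "emmaüs" into `emmaues`, so a real table would probably have to depend on the language of the wiki.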

this would be useful for a number of reasons:
# most keyboard layouts lack some letters;
# there is a long-standing tradition among internet users of avoiding non-ASCII
characters for compatibility and, therefore, a habit of using "simplified"
# some of the previous substitutions are officially accepted in printing
conventions: it's the case of the German ae, oe, ue, ss;
# often somebody doesn't know the exact spelling of a word in a foreign language;
# Google does it already! :P
Comment 1 Lorenzo Paulatto 2006-08-14 09:29:06 UTC
ñ is sometimes transcribed as "nh"
Comment 2 Pietro Giorgianni 2006-08-14 09:53:30 UTC
(In reply to comment #1)
> ñ is sometimes transcribed as "nh"

You're right, I didn't remember.
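
When a letter has more than one accepted ASCII spelling, as with "n" and "nh" for ñ, a single search key is not enough. One possible approach, sketched here with a hypothetical `VARIANTS` table and function name, is to generate every ASCII spelling of a title and index all of them.

```python
import unicodedata
from itertools import product

# Hypothetical table: each letter maps to all ASCII spellings to accept.
VARIANTS = {"ñ": ("n", "nh")}

def ascii_variants(word: str) -> set[str]:
    """Return every spelling obtained by choosing one variant per letter."""
    normalized = unicodedata.normalize("NFC", word).lower()
    # Letters without an entry keep exactly one choice: themselves.
    choices = [VARIANTS.get(ch, (ch,)) for ch in normalized]
    return {"".join(parts) for parts in product(*choices)}

print(sorted(ascii_variants("Buñuel")))
```

The number of variants grows multiplicatively with the number of ambiguous letters in a word, so this only stays cheap for short titles with few such letters.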
Comment 3 Benoît Rigaut 2007-01-16 19:32:29 UTC

2 examples:

1) a search for 'emmaus'
matches only for 13% 'emmaüs'

2) a search for 'circe'
matches only for 1.4% 'circé' but 100% circ

This seems very impractical, at least for French users, who have a strong habit of typing search strings without accents.
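
To illustrate why the two examples above would match fully if accents were ignored, here is a minimal sketch of a title index that strips accents from both the stored titles and the query. The `FoldedIndex` class and `strip_accents` helper are assumptions for this example and say nothing about how the actual MediaWiki search backend scores results.

```python
import unicodedata
from collections import defaultdict

def strip_accents(text: str) -> str:
    """Lowercase text and drop combining marks (accents, diaereses)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

class FoldedIndex:
    """Toy exact-match index keyed by the accent-free form of each title."""

    def __init__(self) -> None:
        self._titles: dict[str, set[str]] = defaultdict(set)

    def add(self, title: str) -> None:
        # The same folding is applied at index time ...
        self._titles[strip_accents(title)].add(title)

    def search(self, query: str) -> list[str]:
        # ... and at query time, so both sides meet on one key.
        return sorted(self._titles.get(strip_accents(query), set()))

index = FoldedIndex()
index.add("Emmaüs")
index.add("Circé")
print(index.search("emmaus"), index.search("circe"))
```

The key design point is that folding must happen on both sides: folding only the query would make accented titles unreachable, and folding only the index would break searches typed with the correct accents.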
Comment 4 cecile robin 2007-04-17 11:48:01 UTC
A simpler suggestion would be to ignore accents. This would be very useful for
languages such as French and Greek, and I suppose for many others. This is the
way the Google search engine works. OK, it does not help for special letters such
as the German β, for example (sorry, it's typed on a Greek keyboard...); you still
need to configure your keyboard to enter such letters. But it would work better as
a default and would be very helpful when you're not sure which accent goes on
which letter. That's my opinion anyway...
Comment 5 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-04-17 16:32:31 UTC
*** Bug 9606 has been marked as a duplicate of this bug. ***
Comment 6 Hendrik Lönngren 2007-05-27 22:30:23 UTC

*** This bug has been marked as a duplicate of bug 920 ***
