Last modified: 2007-05-27 22:30:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T9002, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 7002 - Normalize accented/ligature characters for search terms and indices ("ignore" accents)
Status: RESOLVED DUPLICATE of bug 920
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 6 votes
Target Milestone: ---
Assigned To: Nobody
Duplicates: 9606
Depends on:
Blocks:

Reported: 2006-08-13 21:05 UTC by Pietro Giorgianni
Modified: 2007-05-27 22:30 UTC
CC List: 2 users

Description Pietro Giorgianni 2006-08-13 21:05:55 UTC
Most European languages use letters which are not present in the standard ASCII set.

IMHO it would be very useful to allow a more flexible search, that is:
* ae, oe, ue for ä, ö, ü; for instance, Goedel for Gödel;
* ss for ß; for instance, Grossmann for Großmann;
* a, e, i etc. for à, á, â, è, é, ê, ì, í, î etc; for instance, geologie (and
gèologie) for géologie;
* n for ñ; for instance, Bunuel for Buñuel;
* o (don't know if it's the best letter) for ø; for instance, Kobenhavn for
København;
* aa for å; for instance, Aarhus for Århus;
* C (or a better choice) for Č; for instance, Cesky for Česky;
* maybe others that I don't know.
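
The substitutions listed above can be sketched as a simple lookup table. This is only an illustration, not how MediaWiki search works: `FOLD_TABLE`, `fold`, and `matches` are hypothetical names, and the sketch assumes the input uses precomposed characters (Unicode NFC).

```python
# Hypothetical folding table built from the substitutions listed above.
# Assumption: input text uses precomposed (NFC) characters.
FOLD_TABLE = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
    "à": "a", "á": "a", "â": "a",
    "è": "e", "é": "e", "ê": "e",
    "ì": "i", "í": "i", "î": "i",
    "ñ": "n",
    "ø": "o",
    "å": "aa",
    "č": "c",
}


def fold(text):
    """Lowercase the text and replace each listed letter with its ASCII spelling."""
    return "".join(FOLD_TABLE.get(ch, ch) for ch in text.lower())


def matches(query, title):
    """Compare a query and a title after folding both sides."""
    return fold(query) == fold(title)
```

With this table, `matches("Goedel", "Gödel")` and `matches("Grossmann", "Großmann")` hold, but `matches("Godel", "Gödel")` does not, because each letter has exactly one ASCII spelling here.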

This would be useful for a number of reasons:
# most keyboard layouts lack some letters;
# there is a long-standing tradition among internet users of avoiding non-ASCII
characters for compatibility, and therefore a habit of using "simplified"
versions;
# some of the previous substitutions are officially accepted in printing
conventions: this is the case for the German ae, oe, ue, ss;
# people often don't know the exact spelling of a word in a foreign language;
# Google does it already! :P
Comment 1 Lorenzo Paulatto 2006-08-14 09:29:06 UTC
ñ is sometimes transcribed as "nh"
Comment 2 Pietro Giorgianni 2006-08-14 09:53:30 UTC
(In reply to comment #1)
> ñ is sometimes transcribed as "nh"

You're right, I didn't remember.
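
When a letter has more than one accepted transcription, as with ñ here, one option is to generate every ASCII spelling of a word and accept any of them. The following is a minimal sketch under that assumption; `ALTERNATIVES` and `variants` are hypothetical names, and the table holds only the one letter discussed in this exchange.

```python
from itertools import product

# Hypothetical table: each letter maps to every ASCII spelling we accept.
# Only the letter discussed above is listed; others could be added.
ALTERNATIVES = {
    "ñ": ("n", "nh"),
}


def variants(word):
    """Return the set of all ASCII spellings of a word under ALTERNATIVES."""
    # Letters without an entry keep a single choice: themselves.
    choices = [ALTERNATIVES.get(ch, (ch,)) for ch in word.lower()]
    return {"".join(parts) for parts in product(*choices)}
```

The number of variants grows multiplicatively with the number of ambiguous letters in a word, so this approach is only reasonable for short terms such as names or titles.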
Comment 3 Benoît Rigaut 2007-01-16 19:32:29 UTC
Right!

Two examples:

1) a search for 'emmaus' http://fr.wikipedia.org/wiki/Special:Search?search=emmaus&go=Consulter
matches 'emmaüs' at only 13%

2) a search for 'circe' http://fr.wikipedia.org/wiki/Special:Search?search=circe&go=Consulter
matches 'circé' at only 1.4%, but 'circ' at 100%

This seems very impractical, at least for French users, who have a strong habit of typing search strings without accents.
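
The bug title asks for normalization of both search terms and indices. A minimal sketch of that idea follows, assuming an exact-match lookup keyed on folded titles; `FOLD`, `build_index`, and `search` are hypothetical names, the folding table covers only the two letters from the examples above, and the title list is made up from the strings in this comment.

```python
from collections import defaultdict

# Hypothetical folding table covering only the two letters in the examples above.
FOLD = str.maketrans({"ü": "u", "é": "e"})


def fold(text):
    """Lowercase the text and drop the accents listed in FOLD."""
    return text.lower().translate(FOLD)


def build_index(titles):
    """Map each folded title to the original titles that share it."""
    index = defaultdict(list)
    for title in titles:
        index[fold(title)].append(title)
    return index


def search(index, query):
    """Fold the query the same way as the index keys, then look it up."""
    return index.get(fold(query), [])


# Illustrative titles only, taken from the strings mentioned in this comment.
index = build_index(["Emmaüs", "Circé", "Circ"])
```

Because the same `fold` is applied on both sides, `search(index, "emmaus")` and `search(index, "emmaüs")` return the same result.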
Comment 4 cecile robin 2007-04-17 11:48:01 UTC
A simpler suggestion would be to ignore accents. This would be very useful for
languages such as French and Greek, and I suppose for many others. This is the
way the Google search engine works. Admittedly, it does not help with special
letters such as the German β (sorry, that was typed on a Greek keyboard; the
letter meant is ß): you would still need to configure your keyboard to enter
such letters. But it would work better as a standard and would be very helpful
when you're not sure which accent goes on which letter. That's my opinion anyway...
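
Ignoring accents as suggested here can be sketched with Unicode decomposition: split each character into a base letter plus combining marks, then drop the marks. This uses Python's standard `unicodedata` module; `strip_accents` is a hypothetical helper name, and nothing here is taken from MediaWiki's actual search code.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters and a
    # nonzero combining class for accents and other combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

This handles French and Greek accents alike, but letters that have no canonical decomposition, such as ß and ø, pass through unchanged, which matches the limitation noted in the comment.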
Comment 5 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-04-17 16:32:31 UTC
*** Bug 9606 has been marked as a duplicate of this bug. ***
Comment 6 Hendrik Lönngren 2007-05-27 22:30:23 UTC

*** This bug has been marked as a duplicate of bug 920 ***
