Last modified: 2007-07-13 18:49:45 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2920, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 920 - Transliterated umlauts in the search field won't resolve
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: unspecified
Hardware: PC All
Importance: Normal normal with 14 votes
Assigned To: Nobody
URL: http://de.wikipedia.org
Duplicates: 7002
Depends on:
Blocks:

Reported: 2004-11-20 17:28 UTC by Denis Grelich
Modified: 2007-07-13 18:49 UTC (History)
Description Denis Grelich 2004-11-20 17:28:57 UTC
If I enter a term containing umlauts in the search field on the left but transliterate the 
umlauts, the lookup fails and I am shown the search page instead, unless a redirect exists for 
the transliterated title. On the English Wikipedia such redirects exist most of the time. On the 
German Wikipedia, maintaining such redirects would not make much sense.

Examples:
Goedel (for Gödel) fails on de.wikipedia.org; on en.wikipedia.org it resolves correctly.
Godel resolves on the English Wikipedia too.
Comment 1 Denis Grelich 2004-11-20 17:31:40 UTC
Would it be possible to resolve transliterated umlauts automatically to the correct character? 
It surely wouldn't break anything.
Comment 2 Andreas Franke 2005-12-16 15:30:57 UTC
Automatically adding the reverse-transliterated umlauts to the search results
would be desirable in my opinion, in particular on de.wikipedia.org.
For example, entering "kuenstliche intelligenz" in the search box there
came up with the movie "A.I. – Künstliche Intelligenz", but not with the
main entry http://de.wikipedia.org/wiki/K%C3%BCnstliche_Intelligenz
which I was only able to find via the entry for the "AI" acronym.
Comment 3 Ibn Battuta@WP 2007-05-27 00:10:10 UTC
It would be nice to cover more than just the umlauts, and more than just the German Wikipedia: the same (or worse) problem occurs on any Wikipedia whose language uses the Latin alphabet with special characters: the Spanish, Portuguese, French, Scandinavian (...), Slavic (... ... ...), Turkish languages, to name just the largest groups (with obviously many subgroups).
Comment 4 Hendrik Lönngren 2007-05-27 22:10:47 UTC
I agree with #3, and would add to it. It would be desirable to treat both the transliterated form of a special character and the plain, unaccented Latin character from which it is derived as possible occurrences of that special character. For example oe (common in Germany) or o (common in Sweden) for ö, or aa / a for å. I would even extend this mechanism to treating some groups of punctuation characters as one character in search, for example different quotation marks " „ “ ” « », different dashes - – —, different apostrophes ' ’ (see the German article "Germany’s next topmodel"; there is a redirect from the simple version, though) etc.
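
The punctuation part of this suggestion could be sketched roughly as follows. This is only an illustration, not code from MediaWiki or its search backend: the table PUNCTUATION_FOLDS and the function fold_punctuation are hypothetical names, and the table contains only the characters listed in this comment.

```python
# Hypothetical folding table: each punctuation variant named in the comment
# is mapped to one plain ASCII representative of its group.
PUNCTUATION_FOLDS = {
    "„": '"', "“": '"', "”": '"', "«": '"', "»": '"',  # quotation marks
    "–": "-", "—": "-",                                 # dashes
    "’": "'",                                           # apostrophes
}

# str.maketrans accepts a dict of single characters and builds a table
# that str.translate can apply in one pass.
_FOLD_TABLE = str.maketrans(PUNCTUATION_FOLDS)


def fold_punctuation(text):
    """Replace each punctuation variant with its group's representative."""
    return text.translate(_FOLD_TABLE)


# Applied to both the indexed title and the query, the typographic and the
# plain apostrophe end up identical.
print(fold_punctuation("Germany’s next topmodel"))
```

Folding both sides the same way means a query typed with a plain apostrophe would match the title without needing a hand-made redirect.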
Comment 5 Hendrik Lönngren 2007-05-27 22:30:23 UTC
*** Bug 7002 has been marked as a duplicate of this bug. ***
Comment 6 longthinker 2007-05-28 08:36:36 UTC
This also applies to pinyin (the romanization of Chinese characters): for example "wuji" will not find "wújí" (as in the German Wikipedia's article "Taiji"). Both notations are common, the former especially in printed books.
Comment 7 Robert Stojnic 2007-07-13 18:49:45 UTC
Fixed in Lucene Search 2. Accents are always stripped, and common transliterations are added as aliases (see Bug 7002). 

So, searching for Goedel should find Kurt Gödel as the first hit on both en and de wiki.
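
The two ideas named in this comment, stripping accents and adding common transliterations as aliases, can be illustrated with a small sketch. It is not the Lucene Search 2 implementation; TRANSLITERATIONS, strip_accents, index_keys and normalize_query are hypothetical names, and the alias table is an assumed minimal one (the ä entry in particular is not taken from this report).

```python
import unicodedata

# Assumed alias table, not the real one: oe, ue and aa appear in this
# thread, ae is added by analogy.
TRANSLITERATIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "å": "aa"}


def strip_accents(text):
    """Decompose to NFD and drop combining marks, so 'ö' becomes 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_keys(title):
    """Return the keys a title would be indexed under in this sketch."""
    # NFC first, so that precomposed letters match the alias table.
    lowered = unicodedata.normalize("NFC", title).lower()
    transliterated = "".join(TRANSLITERATIONS.get(ch, ch) for ch in lowered)
    # One key with accents stripped, one with transliterations expanded.
    return {strip_accents(lowered), strip_accents(transliterated)}


def normalize_query(query):
    """Queries get the same lowercasing and accent stripping as titles."""
    return strip_accents(unicodedata.normalize("NFC", query).lower())


keys = index_keys("Kurt Gödel")
for query in ("Kurt Goedel", "Kurt Godel", "Kurt Gödel"):
    print(query, normalize_query(query) in keys)
```

In this sketch all three spellings of the query match the same title, and the pinyin case from comment 6 ("wuji" for "wújí") is covered by accent stripping alone.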
