Last modified: 2010-05-15 15:59:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T19146, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 17146 - Search results for utf-8 strings
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: 1.11.x
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2009-01-24 16:00 UTC by Hungerburg
Modified: 2010-05-15 15:59 UTC

Attachments
Patch against languages/Language.php to let it return results for utf-8 terms (667 bytes, patch)
2009-01-24 16:00 UTC, Hungerburg

Description Hungerburg 2009-01-24 16:00:45 UTC
Created attachment 5727
Patch against languages/Language.php to let it return results for utf-8 terms

Before being entered into the searchindex table, UTF-8 encoded strings are converted to a special notation: e.g. dämon becomes du8c3a4mon. The search form applies the same transform, but with an uppercase U8 escape (dU8c3a4mon), so the search fails in MySQL.

The attached patch lets UTF-8 search terms return results.
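
The transform described in this report can be sketched roughly as follows. This is a minimal illustration in Python rather than MediaWiki's PHP; the function names `strip_for_search`, `index_form`, and `query_form` are hypothetical, and the "U8 plus hex of the UTF-8 bytes" layout is inferred from the examples in this report, not copied from Language.php.

```python
import re

# Matches any single character outside the ASCII range.
_NON_ASCII = re.compile(r"[^\x00-\x7f]")


def strip_for_search(text):
    """Lowercase the text, then replace every non-ASCII character with
    'U8' followed by the hex of its UTF-8 bytes (assumed layout)."""
    return _NON_ASCII.sub(
        lambda m: "U8" + m.group(0).encode("utf-8").hex(),
        text.lower(),
    )


def index_form(text):
    """Token as stored in searchindex: the escaped text is lowercased
    a second time, so the 'U8' prefix becomes 'u8'."""
    return strip_for_search(text).lower()


def query_form(text):
    """Token as sent by the search form: no second lowercasing,
    so the 'U8' prefix stays uppercase."""
    return strip_for_search(text)


word = "d\u00e4mon"  # "dämon"
print(index_form(word))  # du8c3a4mon
print(query_form(word))  # dU8c3a4mon
```

The two outputs differ only in the case of the escape prefix, which is the mismatch reported here.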
Comment 1 Brion Vibber 2009-01-31 01:19:31 UTC
The form returns results just fine for me...

SearchUpdate::doUpdate() takes the output of Language::stripForSearch() and does further processing to strip markup etc. This includes running it through strtolower() to make it entirely lowercase.

This extra lowercasing is *not* done by Special:Search, which produces the discrepancy you noted -- only the input data is being lowercased.

Searching "ééé FUNKY" hits this query:

SQL: SELECT /*  WikiSysop */ page_id, page_namespace, page_title FROM `page`,`searchindex` WHERE page_id=si_page AND  MATCH(si_title) AGAINST('+U8c3a9U8c3a9U8c3a9 +funky' IN BOOLEAN MODE)  AND page_is_redirect=0 AND page_namespace IN ('0')   LIMIT 20 


However the backend search engine is case-insensitive so it shouldn't make a difference. :)

Worth going ahead and fixing though, just in case. Applied on trunk (for 1.15) in r46629.
Comment 2 Hungerburg 2009-01-31 10:49:54 UTC
Thank you Brion, I guess that should not break other people's installations. I moved a MediaWiki between servers and doctored the mysqldump so that the new location stores MediaWiki UTF-8 strings in "utf8-...-ci" columns (was iso...). On import, some rows would produce duplicate key errors, so I gave everything a "utf-bin" charset instead. Except for the search form, the wiki works fine so far.
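
One way to read the thread: the indexed token and the query token differ only in the case of the escape prefix, so whether the search succeeds depends on how the column compares strings. The sketch below is a toy Python model, not MySQL itself; `tokens_match` is a hypothetical helper, and treating a binary collation as an exact, case-sensitive comparison is an assumption made for illustration.

```python
def tokens_match(stored, query, case_sensitive):
    """Compare an indexed token with a query token.

    case_sensitive=False models a case-insensitive ("..._ci") comparison;
    case_sensitive=True models a binary comparison (assumption).
    """
    if case_sensitive:
        return stored == query
    return stored.casefold() == query.casefold()


stored = "du8c3a4mon"  # token as written to searchindex (fully lowercased)
query = "dU8c3a4mon"   # token as produced by the search form

print(tokens_match(stored, query, case_sensitive=False))  # True
print(tokens_match(stored, query, case_sensitive=True))   # False

# Lowercasing the query token as well makes the binary comparison succeed.
print(tokens_match(stored, query.lower(), case_sensitive=True))  # True
```

Under this model, lowercasing the query side changes nothing for case-insensitive columns and repairs the binary case, which would explain why the form worked for one setup and not the other.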
