Last modified: 2012-12-27 20:53:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T34207, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 32207 - Special:Listfiles doesn't find anything when search term contains umlauts
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: 1.16.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2011-11-04 16:41 UTC by Christian Boltz
Modified: 2012-12-27 20:53 UTC

Description Christian Boltz 2011-11-04 16:41:30 UTC
Special:Listfiles doesn't find anything if the search term contains German umlauts or ß (the letters äöüÄÖÜß). For example, searching for "Möhre" gives zero results, although I have files like "Möhre-Wilde-Blüte.jpg" and "Pfalz-Möhren-1.jpg".

The problem seems to be somewhere in $dbr->buildLike, which replaces the "ö" with "\xc3\xb6". This "\xc3\xb6" doesn't match anything in the database.

At the moment, I'm using the patch pasted below as a workaround, but it can return too many results: everything outside the listed characters is replaced with the single-character LIKE wildcard "_", so searching for "möhr" will also find "mähr" or even "mxyhr".
Additionally, because my patch replaces everything except the listed chars, you might get even more results if a user searches for a special character.
(I can live with that for now - better than finding nothing ;-)
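
The over-matching described above can be reproduced outside MediaWiki. The following Python sketch only imitates the workaround and is not MediaWiki code: the function names are hypothetical, the Title normalization and the escaping done by buildLike are left out, and LIKE is approximated with a regular expression in which "_" matches exactly one byte. It assumes, as PHP's preg_replace without the /u modifier does, that the replacement works byte by byte, so a two-byte UTF-8 "ö" becomes two wildcards.

```python
import re

PLACEHOLDER = b"@@uml@@"  # same marker string as in the patch


def workaround_pattern(search: str) -> bytes:
    """Imitate the patch: each byte outside [a-zA-Z0-9_ .-] ends up as '_'."""
    raw = search.encode("utf-8")
    marked = re.sub(rb"[^a-zA-Z0-9_ .-]", PLACEHOLDER, raw)
    lowered = marked.lower()  # ASCII-only lowercasing on bytes
    return lowered.replace(PLACEHOLDER, b"_")


def like_contains(pattern: bytes, value: bytes) -> bool:
    """Approximate `LOWER(value) LIKE '%pattern%'`; '_' matches one byte.

    Simplification: a literal underscore in the pattern is also treated
    as a wildcard here, whereas buildLike would escape it.
    """
    parts = []
    for byte in pattern:
        char = bytes([byte])
        parts.append(b"." if char == b"_" else re.escape(char))
    regex = b"".join(parts)
    return re.search(regex, value.lower(), re.DOTALL) is not None


if __name__ == "__main__":
    pattern = workaround_pattern("möhr")
    print(pattern)
    for name in ("Möhre-Wilde-Blüte.jpg", "Mähre.jpg", "mxyhr.jpg", "mohr.jpg"):
        print(name, like_contains(pattern, name.encode("utf-8")))
```

Because "ö" turns into two wildcards, any two bytes in that position match, which is why unrelated names are found as well.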

I'm also not sure where this should be fixed - I added my workaround to SpecialListfiles.php to avoid unintentional side effects, but it would probably be better to fix the code in $dbr->buildLike.


Patch with workaround (see limitations/known issues above)
===================================================================
--- SpecialListfiles.php        (Revision 86040)
+++ SpecialListfiles.php        (Arbeitskopie)
@@ -34,11 +34,13 @@ class ImageListPager / __construct()
                }
                $search = $wgRequest->getText( 'ilsearch' );
                if ( $search != '' && !$wgMiserMode ) {
+$search = preg_replace('/[^a-zA-Z0-9_ .-]/', '@@uml@@', $search);
                        $nt = Title::newFromURL( $search );
                        if( $nt ) {
                                $dbr = wfGetDB( DB_SLAVE );
-                               $this->mQueryConds = array( 'LOWER(img_name)' . $dbr->buildLike( $dbr->anyString(), 
-                                       strtolower( $nt->getDBkey() ), $dbr->anyString() ) );
+$cond = $dbr->buildLike( $dbr->anyString(), strtolower( $nt->getDBkey() ), $dbr->anyString() );
+$cond = str_replace('@@uml@@', '_', $cond);
+                               $this->mQueryConds = array( 'LOWER(img_name)' . $cond );
                        }
                }
Comment 1 Brion Vibber 2011-11-04 17:54:39 UTC
I can't reproduce this on 1.16 or trunk; the ö passed through buildLike just fine:

  brion@stormcloud:~/pages/rel1.16$ php maintenance/eval.php 
  > return wfGetDB(DB_MASTER)->buildLike("Pfalz-M\xc3\xb6hren-1.jpg");
   LIKE 'Pfalz-Möhren-1.jpg' 

(I used the \xc3\xb6 in the double-quoted string here to ensure it passes through the terminal ok; that is just a plain UTF-8 ö in the string.)


Is there anything special about your database configuration? Are you using MySQL or one of the less-well supported databases? Any special options?
Comment 2 Christian Boltz 2011-11-04 18:34:33 UTC
Interestingly, it works for me in the shell - the output is "LIKE 'Möhre'".

I'm using MySQL 5.0.67. My config isn't very special IMHO, maybe except this:

[client]
default-character-set=latin1
[mysqld]
default-character-set=latin1
default-collation=latin1_german1_ci

Note that this is just a default, and any client (including MediaWiki) can specify the charset to use when connecting to MySQL.

On the MediaWiki side, my configuration is quite boring and doesn't contain anything related to the charset.

In the meantime I noticed that PHP's error_log() escapes special characters (like umlauts) - if I just echo out the query, it contains "Möhre" in valid UTF-8.
In other words: something must go wrong on the way between MediaWiki and MySQL. Let me check...

# show create table page
[...]
  `page_title` varchar(255) character set latin1 collate latin1_bin NOT NULL,
[...]
) ENGINE=MyISAM AUTO_INCREMENT=6244 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci

In other words: page_title is latin1 (aka ISO-8859-1) in the database, which matches my MySQL defaults.

# select page_title from page where page_title like '%hre%';
+----------------------------------------------------+
| page_title                                         |
+----------------------------------------------------+
| Möhre                                             | 
[...]

Yes, the UTF-8 sequence for "ö" is really displayed as two bytes :-(
Looks like MediaWiki didn't tell MySQL that it would hand over UTF-8 strings, so MySQL treated them as ISO-8859-1...
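
To illustrate what "handled as ISO-8859-1" means, here is a small Python sketch. It is an illustration only: the function name is hypothetical, and Python's "latin-1" codec stands in for MySQL's latin1 charset (the two agree for the bytes involved here).

```python
def as_seen_by_latin1(text: str) -> str:
    """Return how UTF-8 encoded text looks when each byte is read as latin1."""
    utf8_bytes = text.encode("utf-8")   # what the wiki sends to the database
    return utf8_bytes.decode("latin-1")  # how a latin1 column interprets it


if __name__ == "__main__":
    seen = as_seen_by_latin1("Möhre")
    print(len("Möhre"), len(seen))  # 5 characters become 6
    print(seen)
```

The single character "ö" is sent as the two bytes 0xC3 0xB6, and a latin1 column counts and compares them as two separate characters.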

In case it matters: The wiki was started in 2009 (IIRC MediaWiki 1.14) and updated since then.
Comment 3 Christian Boltz 2012-12-27 20:53:31 UTC
Let me post an update on this:

The problem is that the column contains UTF-8 data but is declared as latin1.

I could fix this by changing the charset in the database. The trick is to change the field to varbinary first and then to varchar with charset utf8. The varbinary step is needed to keep MySQL from doing an automatic charset conversion, which would result in double-encoded UTF-8.

In SQL, this means:
    alter table t modify column f varbinary(255);
    alter table t modify column f varchar(255) charset utf8;
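
Why the varbinary step matters can be sketched in Python. This models only the byte handling, not MySQL itself; the function names are hypothetical, and "latin-1" again stands in for MySQL's latin1.

```python
def direct_latin1_to_utf8(stored: bytes) -> bytes:
    """Model a direct latin1 -> utf8 change: MySQL re-encodes each character."""
    return stored.decode("latin-1").encode("utf-8")


def via_varbinary(stored: bytes) -> bytes:
    """Model the two-step change: bytes stay untouched, only the label changes."""
    return stored


if __name__ == "__main__":
    stored = "Möhre".encode("utf-8")  # UTF-8 bytes sitting in a latin1 column
    print(direct_latin1_to_utf8(stored))                  # double-encoded bytes
    print(direct_latin1_to_utf8(stored).decode("utf-8"))  # mojibake
    print(via_varbinary(stored).decode("utf-8"))          # intact text
```

In the direct case each of the two bytes of "ö" is treated as its own latin1 character and encoded again, giving four bytes; going through varbinary leaves the original two bytes in place.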

That's the easy part. I then found out that I had this problem in various tables and columns, so it took me some hours to do this fix on all of them.
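
Repeating the two statements for many columns could be scripted. The Python sketch below only generates the SQL text from a hand-written list; the helper name and the example table/column list are hypothetical. It assumes plain varchar columns and, like the statements above, does not carry over attributes such as NOT NULL or defaults, so the output should be reviewed before running it.

```python
from typing import List, Tuple


def conversion_statements(table: str, column: str, length: int = 255) -> List[str]:
    """Build the varbinary -> varchar utf8 statement pair for one column."""
    return [
        f"alter table {table} modify column {column} varbinary({length});",
        f"alter table {table} modify column {column} varchar({length}) charset utf8;",
    ]


if __name__ == "__main__":
    # Hypothetical list of affected (table, column, length) entries.
    affected: List[Tuple[str, str, int]] = [
        ("page", "page_title", 255),
        ("image", "img_name", 255),
    ]
    for table, column, length in affected:
        for statement in conversion_statements(table, column, length):
            print(statement)
```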

To sum it up: This looks like an upgrade problem from a (very?) old version. AFAIK the openSUSE wiki was hit by a similar problem (sorry, I don't know exactly in which MediaWiki version).
