Last modified: 2010-05-15 16:03:39 UTC
I'm using MediaWiki on Windows Server 2003. When I upload a file and tell MediaWiki to store it under a name containing special characters (e.g. accented characters such as à, é, or ñ), the file name is stored incorrectly: à becomes Ã followed by a second character (ï, I think). It looks like a UTF-8 to ISO-8859-1 mix-up (or the reverse). I think this happens because NTFS stores file names in the ISO-8859-1 charset, so when MediaWiki passes the file name as UTF-8, NTFS interprets it as an ISO-8859-1 string. Experimenting on my own with uploading and saving a file from a PHP page, I found a workaround: if the upload ends with the instruction copy ( $tempfile , $filename ) ; change it to copy ( $tempfile , utf8_decode ( $filename ) ) ; That seems to eliminate the problem on Windows Server 2003.
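The symptom reported above is consistent with UTF-8 bytes being read back under a single-byte code page. A minimal Python sketch of that effect follows. It is illustrative only: it does not touch MediaWiki, PHP, or the filesystem, the helper name misread_utf8 is hypothetical, and cp1252 as the Windows code page is an assumption.

```python
def misread_utf8(name: str, codepage: str = "cp1252") -> str:
    """Encode `name` as UTF-8, then decode those bytes as a single-byte code page.

    This mimics what a reader sees when UTF-8 file-name bytes are interpreted
    as if they were in a legacy 8-bit charset.
    """
    return name.encode("utf-8").decode(codepage, errors="replace")


# Each accented letter is two bytes in UTF-8, so it shows up as two characters.
for ch in ("à", "é", "ñ"):
    print(ch, "->", repr(misread_utf8(ch)))
```

Each accented character comes back as Ã plus one more character, because its two UTF-8 bytes are decoded one at a time.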
Confirmed in trunk. Also, where did you find this: > copy ( $tempfile , $filename ); I can't find it in trunk :)
Created attachment 5393: Paolo's patch. It's in FileStore.php; trunk doesn't have the spaces. I'm attaching it as a patch, but I'm sure utf8_decode would need to be added in other places as well: filerepo/FSRepo also performs several operations directly on the filesystem, as does thumb.php...
The utf8_decode solution wouldn't work: this function only converts to ISO-8859-1, so all filenames in non-Latin scripts would be completely corrupted. NTFS actually uses Unicode internally. The problem lies in PHP, which naïvely assumes that all filenames are eight-bit strings. That holds on Unix, but Windows uses wide-character strings, and a separate call, _wfopen(), is needed to access Unicode filenames on Win32. Until PHP gains proper Unicode support (currently scheduled for right after porcine flight is achieved), the only solution I can think of is for MediaWiki to mangle non-ASCII characters in the filename in a predictable, round-trippable way.
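To make both points concrete, here is a small Python sketch. The first helper shows why a conversion to ISO-8859-1 loses non-Latin names. The other two show one possible round-trippable mangling, using percent-encoding of the UTF-8 bytes. The scheme and the function names (latin1_lossy, mangle_filename, unmangle_filename) are assumptions for illustration, not what MediaWiki does or later adopted.

```python
from urllib.parse import quote, unquote


def latin1_lossy(name: str) -> bytes:
    """Convert to ISO-8859-1; characters outside that charset become '?'."""
    return name.encode("latin-1", errors="replace")


def mangle_filename(name: str) -> str:
    """Percent-encode the UTF-8 bytes so the stored name is pure ASCII."""
    # safe="" also escapes "/" so the result can never contain a path separator.
    return quote(name, safe="")


def unmangle_filename(mangled: str) -> str:
    """Invert mangle_filename and recover the original Unicode name."""
    return unquote(mangled)


original = "файл.png"  # a Cyrillic file name
print(latin1_lossy(original))        # the Cyrillic letters are lost
mangled = mangle_filename(original)
print(mangled)                       # ASCII-only name for the filesystem
print(unmangle_filename(mangled) == original)
```

Because the mangled name contains only ASCII characters, it should be stored the same way whether the filesystem API treats names as bytes or as wide characters, and the original title can be recovered from it exactly.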
(In reply to comment #1) > Confirmed in trunk. Also, where did you find: > > > copy ( $tempfile , $filename ); > > Can't find this in trunk :) No, I only assumed that the file is saved with some instruction similar to that; it could be a rename or something else. I didn't submit a patch because I don't know MediaWiki's code well.
Duping to bug 1780 *** This bug has been marked as a duplicate of bug 1780 ***