Last modified: 2010-05-15 16:03:39 UTC

Bug 15863 - Upload: can't save file names with special characters to ntfs filesystem
Status: RESOLVED DUPLICATE of bug 1780
Product: MediaWiki
Classification: Unclassified
Component: File management
Hardware: PC Windows Server 2003
Importance: Normal normal
Target Milestone: ---
Assigned To: Chad H.
Reported: 2008-10-06 12:56 UTC by Paolo Benvenuto
Modified: 2010-05-15 16:03 UTC (History)

Paolo's patch (450 bytes, patch)
2008-10-06 16:07 UTC, Platonides

Description Paolo Benvenuto 2008-10-06 12:56:48 UTC
I'm running MediaWiki on Windows Server 2003.

When I upload a file and tell MediaWiki to store it under a file name containing special characters (e.g. accented characters such as à, é, or ñ), the name is stored incorrectly: à -> A with a ~ + ï (I think).

It looks like a UTF-8 to ISO-8859-1 conversion problem (or the reverse).

I think it's because NTFS stores file names in the ISO-8859-1 charset, so when MediaWiki passes the file name in UTF-8, NTFS interprets it as an ISO-8859-1 string.
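
Wherever the misreading actually happens, the symptom can be reproduced in isolation. The following is a minimal sketch, assuming PHP 7+ and the mbstring extension; misreadAsLatin1() is a hypothetical helper written for illustration and is not part of MediaWiki.

```php
<?php
// Sketch only: assumes PHP 7+ and the mbstring extension.
// misreadAsLatin1() is a hypothetical helper, not MediaWiki code.

// Treat each byte of a UTF-8 string as one ISO-8859-1 character and
// return the result re-encoded as UTF-8 so that it can be displayed.
function misreadAsLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

$name = "\u{00E0}";                 // "à" encoded as UTF-8
echo bin2hex($name), "\n";          // c3a0: two bytes for one character
echo misreadAsLatin1($name), "\n";  // "Ã" followed by a no-break space
```

The single character à occupies two bytes in UTF-8, so a reader that expects one byte per character shows two characters instead of one.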

While experimenting on my own with uploading and saving a file from a PHP page, I found a solution:

If the upload ends with the instruction

copy ( $tempfile , $filename ) ;

you should change it to

copy ( $tempfile , utf8_decode ( $filename ) ) ;

That seems to fix the problem on Windows Server 2003.
Comment 1 Chad H. 2008-10-06 14:30:03 UTC
Confirmed in trunk. Also, where did you find:

> copy ( $tempfile , $filename );

Can't find this in trunk :)
Comment 2 Platonides 2008-10-06 16:07:42 UTC
Created attachment 5393 [details]
Paolo's patch

It's in FileStore.php; the spaces are not there in trunk.
I'm attaching it as a patch, but I'm sure utf8_decode would need to be added in other places as well: filerepo/FSRepo also performs several actions directly on the filesystem, as does thumb.php...
Comment 3 Fran Rogers 2008-10-06 16:46:04 UTC
The utf8_decode solution wouldn't work - this function only converts to ISO-8859-1, and all filenames in non-Latin scripts would be completely corrupted.
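
To make the objection concrete, here is a minimal sketch, assuming PHP 7+ and the mbstring extension. mb_convert_encoding() is used as a stand-in for utf8_decode(), since both convert UTF-8 to ISO-8859-1; survivesLatin1() is a hypothetical helper and the file names are made-up examples.

```php
<?php
// Sketch only: assumes PHP 7+ and the mbstring extension.
// survivesLatin1() is a hypothetical helper, not MediaWiki code.

// Convert a UTF-8 name to ISO-8859-1 and back, and report whether the
// original name is recovered unchanged.
function survivesLatin1(string $utf8Name): bool
{
    $latin1 = mb_convert_encoding($utf8Name, 'ISO-8859-1', 'UTF-8');
    $back   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
    return $back === $utf8Name;
}

$names = [
    "caf\u{00E9}.png",                       // "café.png", Latin script
    "\u{0444}\u{0430}\u{0439}\u{043B}.png",  // "файл.png", Cyrillic script
];

foreach ($names as $name) {
    echo $name, ': ', survivesLatin1($name) ? 'intact' : 'lost', "\n";
}
```

The accented Latin name has an ISO-8859-1 representation and comes back intact; the Cyrillic name has none, so its characters are replaced during conversion and cannot be recovered.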

NTFS actually uses Unicode internally. The problem lies in PHP, which naïvely assumes all filenames are eight-bit strings... which they are on Unix, but Windows uses wide character strings, and a separate call, _wfopen(), is used to access Unicode filenames on Win32. Until PHP gains proper Unicode support (currently scheduled for right after porcine flight is achieved), the only solution I can think of is for MediaWiki to mangle non-ASCII characters in the filename in a predictable, round-trippable way.
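
One possible shape for such a mangling, sketched under assumptions: percent-encoding the UTF-8 bytes with PHP's built-in rawurlencode() gives an ASCII-only name that rawurldecode() maps back exactly. The function names mangleFileName() and unmangleFileName() are hypothetical, and nothing here is meant to suggest this is the scheme MediaWiki uses or should use.

```php
<?php
// Sketch only: assumes PHP 7+. mangleFileName() and unmangleFileName()
// are hypothetical names, not MediaWiki functions.

// Percent-encode every byte outside A-Z, a-z, 0-9, "-", "_", "." and "~",
// so the name handed to the filesystem is pure ASCII.
function mangleFileName(string $utf8Name): string
{
    return rawurlencode($utf8Name);
}

// Reverse the mangling to recover the original UTF-8 name.
function unmangleFileName(string $storedName): string
{
    return rawurldecode($storedName);
}

$original = "\u{0444}\u{0430}\u{0439}\u{043B}.png";  // "файл.png"
$stored   = mangleFileName($original);

echo $stored, "\n";  // %D1%84%D0%B0%D0%B9%D0%BB.png
var_dump(unmangleFileName($stored) === $original);  // bool(true)
```

A literal "%" in the original name is itself encoded as "%25", so the mapping stays unambiguous. The cost is longer names on disk, since each non-ASCII byte becomes three characters.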
Comment 4 Paolo Benvenuto 2008-10-06 17:50:59 UTC
(In reply to comment #1)
> Confirmed in trunk. Also, where did you find:
> > copy ( $tempfile , $filename );
> Can't find this in trunk :)

No, I just assumed that the file is saved with an instruction similar to that; it could equally be a rename or something else.

I didn't submit a patch because I don't know MediaWiki's code well.

Comment 5 Brion Vibber 2008-10-06 17:58:31 UTC
Duping to bug 1780

*** This bug has been marked as a duplicate of bug 1780 ***
