Last modified: 2014-10-07 17:48:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T3780, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 1780 - Can't upload file with non-ASCII name (eg cyrillic) on Windows host
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: 1.20.x
Hardware: PC Windows XP
Importance: Low normal with 4 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://meta.wikimedia.org/wiki/Image:...
Keywords: i18n, patch, patch-reviewed
Duplicates: 3724 11758 14924 15863 68268
Depends on: 3829
Blocks:

Reported: 2005-03-30 08:36 UTC by Ivan
Modified: 2014-10-07 17:48 UTC (History)
CC List: 20 users

Attachments
A basic configurable workaround for this bug (2.83 KB, patch)
2008-03-20 16:31 UTC, Mormegil
How to upload file with non-ASCII name on Windows host (345.00 KB, application/pdf)
2013-04-24 11:34 UTC, orbartal
enable upload file with non-ASCII name on Windows host (345.00 KB, application/pdf)
2013-04-24 11:36 UTC, orbartal

Description Ivan 2005-03-30 08:36:06 UTC
I'm running MediaWiki under Apache2 on Windows 2000, and I can't upload a file with
a Russian name. The file name is mangled when MediaWiki saves it to disk, so the link
to the file is wrong and returns a 404. I think the solution is to convert the Cyrillic
file name into a transliteration (http://en.wikipedia.org/wiki/Cyr), but I'm not a very
good PHP programmer.

Sorry for my English.
Comment 1 Brion Vibber 2005-03-30 08:39:13 UTC
May be a similar issue to bug 362; the OS and filesystem expects certain formatting different from what it's getting (in this case, UTF-8).
Comment 2 Ivan 2005-03-30 14:21:35 UTC
Similar, but not the same. The file was created, but with the wrong name.
Should be: Вера.jpg
But it is: ??????????.jpg (I can't paste the real name because it contains invalid
characters)
Comment 3 JeLuF 2005-03-30 16:35:39 UTC
Can you provide a link to your wiki that we could use for testing?
Comment 4 Ivan 2005-03-31 04:47:30 UTC
limp.iceberg-m.ru:81/wiki/
Comment 5 Brion Vibber 2005-10-17 07:14:54 UTC
*** Bug 3724 has been marked as a duplicate of this bug. ***
Comment 6 Gunter Schmidt 2006-05-26 23:38:11 UTC
I have the same bug with V.1.6.5.

Try to upload any image with the name: Bug_1780_non_ascii_äöüß.png (hope you can read this on your system)

I tried to show you on mediawiki, but the bug is not there!
http://meta.wikimedia.org/wiki/Image:Bug_1780_non_ascii_%C3%A4%C3%B6%C3%BC%C3%9F.png

Maybe 1.7 works differently?
Comment 7 Brion Vibber 2006-05-26 23:49:46 UTC
That's because our site doesn't run on Windows servers.
Comment 8 Iaroslav Vassiliev 2007-01-27 17:15:50 UTC
The problem persists in MediaWiki 1.8 on Windows XP. Generally everything works
fine on Windows except for this bug, which is very disruptive. Is there a way to
work around it, or is this a permanent incompatibility?
Comment 9 Iaroslav Vassiliev 2007-05-29 00:49:16 UTC
I've got a temporary solution (at least, for my MediaWiki 1.8.2 on Windows XP), though it is far from perfect and relies on the iconv function.

Firstly,
In SpecialUpload.php file, in processUpload() function, right before closing the last "if( $this->saveUploadedFile(..." block, update the source code as follows:

...
  } else {
    $wgOut->showFileNotFoundError( $this->mUploadSaveName );
  }
  rename( $this->mSavedFile, iconv( 'UTF-8', 'CP1251', $this->mSavedFile ) ); # NEW
}
...

Secondly,
In Image.php file, in reallyRenderThumb() function, in the middle of "elseif ( $wgUseImageMagick ) {..." block, update the source code as follows:

...
wfDebug("reallyRenderThumb: running ImageMagick: $cmd\n");
if (file_exists(iconv('UTF-8', 'CP1251', $thumbPath)) == false)	# NEW
  rmdir( substr_replace($thumbPath, '', strrpos($thumbPath, "/")));	# NEW
mkdir( substr_replace( iconv('UTF-8', 'CP1251', $thumbPath), '',	# NEW
  strrpos(iconv('UTF-8', 'CP1251', $thumbPath), "/")));	# NEW
$cmd = iconv ('UTF-8', 'CP1251', $cmd);	# NEW
wfProfileIn( 'convert' );
...

If you use something other than ImageMagick for image processing, you should transfer the second code fragment to appropriate block and adapt it to that program, if required.

IMPORTANT: If your Windows installation uses a code page other than Windows-1251, then in the code above you should change 'CP1251' to your code page identifier. And DO NOT use this code on non-Windows machines.
Comment 10 Brion Vibber 2007-10-27 21:06:47 UTC
*** Bug 11758 has been marked as a duplicate of this bug. ***
Comment 11 Mormegil 2008-03-20 16:31:08 UTC
Created attachment 4734 [details]
A basic configurable workaround for this bug

The patch adds a global configuration variable $wgLocalFilesystemCharsetOverride that can be set to the charset of the local file system (e.g. 'CP1250'), and all names of the uploaded files are converted to this charset (using iconv) when talking with the filesystem. However, this works correctly only when the destination filename contains only characters from this charset, so this is not a perfect solution.

But the support for file uploads on Windows (and other OSes) is limited in many other ways (there is no filename syntax checking other than stripping path components, which is far from being sufficient on Windows), anyway.

The correct solution to this might depend on the mysterious image backend rewrite. ;-)
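
A rough sketch of the conversion step described in this comment, for illustration only. The function name toFilesystemCharset() is hypothetical and is not taken from the attached patch; only iconv() is a real PHP function. The round-trip comparison is an added assumption, meant to catch iconv builds that substitute characters instead of failing.

```php
<?php
/**
 * Hypothetical helper (not part of the attached patch): convert a UTF-8
 * file name to a legacy filesystem charset, or return null when the name
 * cannot be represented in that charset.
 */
function toFilesystemCharset( string $name, string $charset ): ?string {
    // iconv() returns false (and raises a notice) on unconvertible input.
    $converted = @iconv( 'UTF-8', $charset, $name );
    if ( $converted === false ) {
        return null;
    }
    // Assumption: converting back must reproduce the original name;
    // otherwise a character was silently replaced along the way.
    $roundTrip = @iconv( $charset, 'UTF-8', $converted );
    return $roundTrip === $name ? $converted : null;
}
```

With 'CP1250' as the charset, a name that uses only Central European characters converts, while a name containing a character outside the code page yields null. That is the limitation acknowledged above.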
Comment 12 Brion Vibber 2008-03-20 21:00:39 UTC
Yeah, this would still break with other chars, or if iconv() isn't present... the generated URLs might be wrong, too; depends what charset the web server is going to be expecting!
Comment 13 Max Semenik 2008-07-25 18:30:07 UTC
*** Bug 14924 has been marked as a duplicate of this bug. ***
Comment 14 D J Bauch 2008-07-25 20:24:10 UTC
(In reply to comment #13)
> *** Bug 14924 has been marked as a duplicate of this bug. ***
> 
Thanks for redirecting me from bug 14924. The patch attachment for this bug, with code page set to CP1250 in LocalSettings.php, seems to fix most of the problems I've been seeing with images on IIS6/SQL Server/Windows 2003/MediaWiki 1.13 -- including the one I identified in my bug submission and several others, such as the recent POTD Image:CT of brain of Mikael Häggström large.png and Image:Bandeira do Município do Rio de Janeiro.png. It does not, however, fix all of them. For example:
Image:Ostredok, Veľká Fatra (SVK) - NW slope.jpg (http:.../index.php?title=Image:Ostredok%2C_Ve%C4%BEk%C3%A1_Fatra_%28SVK%29_-_NW_slope.jpg) image still does not show up.
Image:Hors d'œuvre (Bosnian).jpg (Image:Hors_d%27%C5%93uvre_%28Bosnian%29.jpg) causes iconv to complain [function.iconv]: Detected an illegal character in input string in W:\Inetpub\wwwroot\mediawiki\includes\filerepo\File.php on line 68
Comment 15 Brion Vibber 2008-10-06 17:58:31 UTC
*** Bug 15863 has been marked as a duplicate of this bug. ***
Comment 16 Brion Vibber 2008-10-06 18:12:08 UTC
DJ, CP1250 is for Central Europe and doesn't include the "œ" character, hence the failure.

"Ostredok, Veľká Fatra (SVK) - NW slope.jpg" presumably ought to work, but it's hard to debug without an instance to check... However...

My suspicions:

1) It's possibly safest to just create UTF-8 URLs -- that is, don't try to encode the generated URLs to the locale charset. IIS is probably smart enough to detect UTF-8 and load the files correctly (the filesystem stores filenames as UTF-16 Unicode.)

2) Suddenly I'm not sure whether you actually want the "ANSI" codepage or the "OEM" codepage for filesystem storage. *shudder*

Ugh.

The best thing would probably just be to have a switch to encode filenames in some nice ASCII-safe hex encoding, rather than mess around with charsets.
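
A minimal sketch of what such an ASCII-safe hex encoding could look like. The function names and the choice of "=" as the escape character are assumptions made for illustration; nothing here is taken from MediaWiki itself.

```php
<?php
/**
 * Hypothetical encoder: keep ASCII letters, digits, ".", "_" and "-" as
 * they are, and replace every other byte with "=" followed by two
 * uppercase hex digits. "=" itself is escaped, so the mapping is reversible.
 */
function encodeFilenameAscii( string $name ): string {
    // No /u flag: the pattern works on single bytes, not code points.
    return preg_replace_callback(
        '/[^A-Za-z0-9._-]/',
        static function ( array $m ): string {
            return '=' . strtoupper( bin2hex( $m[0] ) );
        },
        $name
    );
}

/** Reverse of encodeFilenameAscii(). */
function decodeFilenameAscii( string $encoded ): string {
    return preg_replace_callback(
        '/=([0-9A-F]{2})/',
        static function ( array $m ): string {
            return hex2bin( $m[1] );
        },
        $encoded
    );
}
```

Because the result is plain ASCII, it does not depend on the server's code page. One caveat of this sketch: Windows file systems are normally case-insensitive, so names that differ only in letter case would still collide.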
Comment 17 Fran Rogers 2008-10-06 21:25:46 UTC
The problem is in PHP's handling, or lack thereof, of Unicode. NTFS uses UTF-16 internally, as Brion pointed out; the problem is that the Win32 API provides separate wchar_t oriented versions of stdio functions (like _wfopen()) for working with Unicode filenames, while the traditional char versions (like fopen()) translate the current legacy 8-bit code page into the corresponding Unicode representation for backwards compatibility. Unfortunately, PHP's innards are completely eight-bit and have no knowledge of wchar_t stdio, so PHP is limited to characters in the current code page. :/ Using setlocale() to change the code page to UTF-8 might work, but setlocale() looks very brittle and ugly.

Indeed, mangling Unicode characters to ASCII in a predictable way is probably the best/only way to work around it.
Comment 18 Brion Vibber 2008-10-06 21:37:53 UTC
dumpHtml uses a fun hack that shells out to a VBScript to rename files to a Unicode destination... That's probably not the nicest way to do it in active use. ;)

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/DumpHTML/rename-hack.vbs?view=markup

Even if we used such a hack to *create* files, we couldn't *manipulate* them again without doing really weird crap like looking up the 8.3 version of the file path. So ASCII mangling is definitely going to be the safest thing.
Comment 19 D J Bauch 2008-10-06 22:03:38 UTC
Brion et al.,
Thanks for your attention. I'm hoping that the official mechanism does change to one that's more compatible with Windows. In the meantime, I've switched from CP1250 to 'ISO-8859-1//TRANSLIT' as the character set that gives me the best results. Most images work now, but not all. This also doesn't fix problems with filenames that have '%' in the name. Sometimes that appears to be used to indicate the degree of transparency of some icons on Wikipedia, and I've had no luck getting those to display.
Comment 20 Chad H. 2010-04-02 14:58:46 UTC
*** Bug 23028 has been marked as a duplicate of this bug. ***
Comment 21 Derk-Jan Hartman 2010-11-09 20:23:17 UTC
Apparently PHP 6 will have full unicode support:

http://bugs.php.net/bug.php?id=46990

I can't believe that something like PHP still has bugs like this. I ran into it today trying to help a user understand why his images were not working, and first we suspected it was just instantcommons, but eventually tracked it down to this issue.
Comment 22 Chad H. 2010-12-14 13:53:34 UTC
PHP6 is dead, so who knows when this will be fixed.

In the meantime, I'd suggest adding a warning to Special:Upload when wfIsWindows() and you try to upload a file with unicode in the name.
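
A small sketch of the kind of check such a warning would need. The function names are hypothetical, and the PHP_OS_FAMILY constant (PHP 7.2+) stands in for MediaWiki's wfIsWindows() so that the example is self-contained.

```php
<?php
/** True when the name contains at least one byte outside 7-bit ASCII. */
function hasNonAsciiBytes( string $name ): bool {
    return preg_match( '/[^\x00-\x7F]/', $name ) === 1;
}

/**
 * Hypothetical check: warn only on Windows hosts and only for names that
 * are not pure ASCII. The OS family is a parameter so that the logic can
 * be exercised on any platform.
 */
function shouldWarnAboutFilename( string $name, string $osFamily = PHP_OS_FAMILY ): bool {
    return $osFamily === 'Windows' && hasNonAsciiBytes( $name );
}
```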
Comment 23 Bryan Tong Minh 2011-05-15 12:38:07 UTC
I forbid uploading non-ascii files on Windows in r88165.
Comment 24 Paolo Benvenuto 2011-05-15 17:07:05 UTC
Well, this isn't a fix, it's a limitation...
Comment 25 Bryan Tong Minh 2011-05-15 17:56:41 UTC
It's a fix in the sense that it is no longer possible to upload a file which then can't be viewed anymore. A proper fix would be to make PHP use wide character functions.
Comment 26 Brion Vibber 2011-08-15 20:13:43 UTC
Reopening -- doesn't seem to fix it, just makes some of your pages platform-dependent.
Comment 27 Bryan Tong Minh 2011-08-16 07:32:05 UTC
(In reply to comment #26)
> Reopening -- doesn't seem to fix it, just makes some of your pages
> platform-dependent.

A way to fix this would be to make filenames on disk no longer map to titles. We have a bug open for that somewhere.
Comment 28 Sumana Harihareswara 2011-11-10 03:06:35 UTC
Thank you for the patch, Mormegil.
(In reply to comment #12)
Adding the "reviewed" keyword.  Also adding the internationalization keyword so the internationalisation/localisation team knows to look at this bug.
Comment 29 Chad H. 2011-11-10 03:18:39 UTC
Not really an i18n bug, it's an issue with filerepo.
Comment 30 Bryan Tong Minh 2013-01-01 21:34:56 UTC
This is now finally fixable with the filebackend!

I'm thinking about writing a custom backend which implements [[quoted-printable]] encoding. Any opinions on the encoding to use? It's a pity that the filebackend implements a listFiles method, otherwise we could have simply used a one-way hashing function.
Comment 31 Aaron Schulz 2013-02-02 00:24:11 UTC
(In reply to comment #30)
> This is now finally fixable with the filebackend!
> 
> I'm thinking about writing a custom backend which implements
> [[quoted-printable]] encoding. Any opinions on the encoding to use? It's a
> pity
> that the filebackend implements a listFiles method, otherwise we could have
> simply used a one-way hashing function.

Why not add that to FSFileBackend in the form of configurable escape/unescape functions? The default ones could just pass through the raw input. One issue with any encoding scheme is handling URLs correctly, so users get file/thumbnail urls that actually are mapped to the encoded file names. I suppose a redirection module could be used. img_auth and thumb_handler would cover some of the obvious cases, though they don't handle RANGE requests. Another option would be a redirector module which would redirect requests to the encoded URL. CDN caching would be slightly trickier in any case.

It's hard to resist saying "just use Linux" though...

That said, it would be nice if FileRepo stored files based on hash and used a redirection or service layer to make readable URLs to files anyway. It would solve a lot of problems like weird race conditions, the poor performance and lack of atomicity for file moves/deletes/undeletes and re-uploads (especially for large files or if there are many versions), and issues like this bug as well (what characters a system allows). That's another story though...
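
To illustrate the hash-based idea, here is a minimal sketch of how a storage path could be derived. The function name, the directory layout, and the choice to hash the title (rather than the file contents) are all assumptions made for this example, not a description of FileRepo.

```php
<?php
/**
 * Hypothetical mapping from a file title to an ASCII-only storage path:
 * two directory levels taken from the SHA-1 hex digest, then the digest
 * itself, plus the original extension when it is plain ASCII.
 */
function hashedStoragePath( string $title ): string {
    $hash = sha1( $title );
    // Take the text after the last dot as the extension, if any.
    $dot = strrpos( $title, '.' );
    $ext = $dot === false ? '' : strtolower( substr( $title, $dot + 1 ) );
    // Keep only short alphanumeric extensions; drop anything else.
    $suffix = preg_match( '/^[a-z0-9]{1,10}$/', $ext ) === 1 ? '.' . $ext : '';
    return $hash[0] . '/' . substr( $hash, 0, 2 ) . '/' . $hash . $suffix;
}
```

The resulting path never contains characters outside ASCII, whatever the title. Since a hash is one-way, mapping a stored file back to its title would need a separate lookup, for example in the database; that is the listFiles concern raised in comment 30.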
Comment 32 orbartal 2013-04-24 11:27:28 UTC
How to fix the bug for Hebrew (and for any other language that Windows supports)

1.	In Windows, change the language for non-Unicode programs to the language of your local MediaWiki, that is, the language of the file names you wish to upload. Usually this is the same language as $wgLanguageCode. See how on this link.
2.	File names reach the Windows file system through the legacy (non-Unicode) code page, not as ASCII or UTF-8. Check the appropriate code page for your language. For Hebrew I used windows-1255.
3.	Edit the MediaWiki core code and make these 4 changes. Be sure to use the code page for your language instead of windows-1255. I used windows-1255 for Hebrew, but you might need something else.
a.	Remove (or comment out) the check added by Bryan Tong Minh that prevents uploading files with non-ASCII names on Windows. The changes below fix the bug, so that filter is no longer required.
See details: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/88165
MediaWiki/includes/upload/UploadBase.php line 756.
b.	Open the source file MediaWiki/includes/filebackend/FSFileBackend.php. In class FSFileBackend, in function FSFileBackend::doStoreInternal at line 206, add the following lines:

if ( strtoupper( substr( PHP_OS, 0, 3 ) ) == 'WIN' ) {
  // Candidate encodings for detection; convert only when the path is UTF-8.
  $charSetArr = array( "ASCII", "JIS", "EUC-JP", "UTF-8", "UTF-16", "windows-1251",
    "ISO-8859-1", "GBK" );
  if ( mb_detect_encoding( $dest, $charSetArr ) == "UTF-8" ) {
    $dest = iconv( "UTF-8", "windows-1255", $dest );
  }
}

Add them just before the command that copies the file to the path:
$ok = copy( $params['src'], $dest );

Now you can upload files and images with Hebrew names, but you can't view them as thumbnails yet. Two more similar code changes are required to complete the task.

c.	Open the source file MediaWiki\includes\filerepo\file\File.php. In class File, in function File::transform at line 623, add the following lines:

if ( strtoupper( substr( PHP_OS, 0, 3 ) ) == 'WIN' ) {
  $charSetArr = array( "ASCII", "JIS", "EUC-JP", "UTF-8", "UTF-16", "windows-1251",
    "ISO-8859-1", "GBK" );
  if ( mb_detect_encoding( $thumbPath, $charSetArr ) == "UTF-8" ) {
    $thumbPath = iconv( "UTF-8", "windows-1255", $thumbPath );
  }
}

Add them right after the command that returns the full path of the thumbnail file:
$thumbPath = $this->getThumbPath( $thumbName ); // final thumb path
d.	Open the source file MediaWiki\includes\media\Bitmap.php. In class BitmapHandler, in function BitmapHandler::transformGd at line 548, add the following lines:

if ( strtoupper( substr( PHP_OS, 0, 3 ) ) == 'WIN' ) {
  $charSetArr = array( "ASCII", "JIS", "EUC-JP", "UTF-8", "UTF-16", "windows-1251",
    "ISO-8859-1", "GBK" );
  if ( mb_detect_encoding( $params['srcPath'], $charSetArr ) == "UTF-8" ) {
    $params['srcPath'] = iconv( "UTF-8", "windows-1255", $params['srcPath'] );
  }
}

Add them right before the command that tests whether the file exists in that location:
if ( !file_exists( $params['srcPath'] ) )
Comment 33 orbartal 2013-04-24 11:34:09 UTC
Created attachment 12167 [details]
How to upload file with non-ASCII name on Windows host

How to enable upload file with non-ASCII name on Windows host with just 3 simple changes to the wiki server.
Comment 34 orbartal 2013-04-24 11:36:13 UTC
Created attachment 12168 [details]
enable upload file with non-ASCII name on Windows host

How to enable upload file with non-ASCII name on Windows host with just 3 simple changes to the wiki server.
Comment 35 Bryan Tong Minh 2013-09-09 21:38:42 UTC
(In reply to comment #31)
> (In reply to comment #30)
> > This is now finally fixable with the filebackend!
> > 
> > I'm thinking about writing a custom backend which implements
> > [[quoted-printable]] encoding. Any opinions on the encoding to use? It's a
> > pity
> > that the filebackend implements a listFiles method, otherwise we could have
> > simply used a one-way hashing function.
> 
> Why not add that to FSFileBackend in the form of configurable escape/unescape
> functions? The default ones could just pass through the raw input. One issue
> with
> any encoding scheme is handling URLs correctly, so users get file/thumbnail
> urls that actually are mapped to the encoded file names. I suppose a
> redirection module could be used. img_auth and thumb_handler would cover some
> of the obvious cases, though they don't handle RANGE requests. Another option
> would be a redirector module which would redirect requests to the encoded
> URL.
> CDN caching would be slightly trickier in any case.
> 
> It's hard to resist saying "just use Linux" though...
> 
> That said, it would be nice if FileRepo stored files based on hash and used a
> redirection or service layer to make readable URLs to files anyway. It would
> solve a lot of problems like weird race conditions, the poor performance and
> lack of atomicity for file moves/deletes/undeletes and re-uploads (especially
> for large files or if there are many versions), and issues like this bug as
> well (what characters a system allows). That's another story though...

I would not add a complicated redirector, but just modify File::getUrl() to apply the encoding. I can't really find out though if there currently is any interaction between filerepo and filebackend regarding the file url.
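
A hedged sketch of that idea: build the public URL from the same escaped name that is used on disk. buildFileUrl() and its parameters are hypothetical and are not the actual File::getUrl() code; the escape callback stands in for whatever encoding the backend would apply.

```php
<?php
/**
 * Hypothetical URL builder: apply the storage escape function to the file
 * name, then percent-encode the result for use as a URL path segment.
 */
function buildFileUrl( string $baseUrl, string $name, callable $escape ): string {
    $storedName = $escape( $name );
    // rawurlencode() percent-encodes every byte outside A-Z a-z 0-9 - _ . ~
    return rtrim( $baseUrl, '/' ) . '/' . rawurlencode( $storedName );
}
```

With a pass-through escape function this produces plain UTF-8 percent-encoded URLs like the one in comment 6. With an ASCII-only escape function, the URL and the on-disk name stay in step without a redirector.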
Comment 36 Brion Vibber 2013-09-12 17:53:35 UTC
So I found an old upstream bug from 2005 on the low-level API problem here:
https://bugs.php.net/bug.php?id=33350

Added a comment that this is still a live issue. :)
Comment 37 Bryan Tong Minh 2013-09-12 18:05:28 UTC
As an alternative to hacking filebackend, we could wrap the FileSystemObject using PHP's COM extension, if somebody really wants to put effort into this ;)
Comment 38 Gerrit Notification Bot 2014-04-12 22:28:07 UTC
Change 125573 had a related patch set uploaded by Aaron Schulz:
[WIP] Added path encoding to FileBackendStore for Windows support

https://gerrit.wikimedia.org/r/125573
Comment 39 Gerrit Notification Bot 2014-05-08 20:59:53 UTC
Change 132298 had a related patch set uploaded by Aaron Schulz:
Added better path encoding to FileBackend for Windows

https://gerrit.wikimedia.org/r/132298
Comment 40 Gerrit Notification Bot 2014-05-08 21:01:07 UTC
Change 125573 abandoned by Aaron Schulz:
Added path encoding to FileBackendStore for Windows support

Reason:
Mostly not needed since given the SHA1 storage name patch, which also handles the same problem and more

https://gerrit.wikimedia.org/r/125573
Comment 41 Bawolff (Brian Wolff) 2014-07-19 15:18:08 UTC
*** Bug 68268 has been marked as a duplicate of this bug. ***
Comment 42 dgiim 2014-10-05 12:26:11 UTC
I am using MediaWiki in a Korean environment.

When will this be completely fixed?

I have worked around the upload problem with a hack.

But I cannot see the thumbnails.

Please help me.
Comment 43 orbartal 2014-10-05 16:39:15 UTC
Try following the instructions in the attached PDF file, "How to fix the bug in Hebrew". They work for all languages, not just Hebrew, and they fix the thumbnail bug as well. Tell me if it works; if it doesn't, I will try to help you solve it.
Comment 44 dgiim 2014-10-06 05:11:25 UTC
First of all, thank you for the quick response, orbartal.

To display thumbnails, I changed File.php as follows:

...
$thumbPath = $this->getThumbPath( $thumbName ); // final thumb path
// CP949 is the Windows code page for Hangul (Korean characters).
$thumbPath = iconv( "UTF-8", "CP949", $thumbPath );
...

I also changed Bitmap.php as follows:

...
$params['srcPath'] = iconv( "UTF-8", "CP949", $params['srcPath'] );
if ( !file_exists( $params['srcPath'] ) ) {
...

Currently, uploading files with Hangul names works. However, no thumbnail is displayed; instead, the error 'filemissing' appears where the thumbnail should be.

Please help me!

Additional information:

- MediaWiki version: 1.23.3
- System: Windows 7 (Hangul)

Thank you.
Comment 45 John Mark Vandenberg 2014-10-07 03:29:14 UTC
(In reply to Gerrit Notification Bot from comment #40)
> Change 125573 abandoned by Aaron Schulz:
> Added path encoding to FileBackendStore for Windows support
> 
> Reason:
> Mostly not needed since given the SHA1 storage name patch, which also
> handles the same problem and more
> 
> https://gerrit.wikimedia.org/r/125573

That patch has been abandoned, but I have asked on the changeset whether the patch might still be useful for older versions of MediaWiki which have this bug.
Comment 46 Gerrit Notification Bot 2014-10-07 17:12:32 UTC
Change 125573 restored by Aaron Schulz:
Added path encoding to FileBackendStore for Windows support

Reason:
Rebasing (then closing again)

https://gerrit.wikimedia.org/r/125573
Comment 47 Gerrit Notification Bot 2014-10-07 17:48:49 UTC
Change 125573 abandoned by Aaron Schulz:
Added path encoding to FileBackendStore for Windows support

https://gerrit.wikimedia.org/r/125573
