Last modified: 2011-04-30 01:16:45 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T22924, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 20924 - binary files are incorrectly detected as application/zip
Status: RESOLVED INVALID
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: 1.14.x
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Nobody
Depends on:
Blocks:
Reported: 2009-10-01 15:17 UTC by Nick B
Modified: 2011-04-30 01:16 UTC (History)

Attachments
test file for bug reproduction (21.50 KB, application/msword)
2009-10-01 15:17 UTC, Nick B

Description Nick B 2009-10-01 15:17:09 UTC
I have a downstream bug that I've been working on for an MW 1.14 installation. Certain users can't upload most of their MS Word documents. These are NOT the .docx XML-based formats but the older type. It turns out that users would receive the error even with a BLANK document saved. I've attached an example of this for people to reproduce the bug. The error received is the message:

"The file is corrupt or has an incorrect extension. Please check the file and upload again."



Debugging locally has shown that these files are being identified as:

mime: <application/zip> extension: <doc>

Clearly, with MIME-type/extension verification turned on, this will fail and give the error they see. The question is: why is this being detected as "application/zip"?

(Furthermore, the workaround I read about and had planned, using $wgMimeDetectorCommand to check the MIME type externally, is no good: MimeMagic::guessMimeType() calls doGuessMimeType() FIRST, and any (false) positive there means detectMimeType(), which seems to work correctly, is then NOT called.)
Comment 1 Nick B 2009-10-01 15:17:38 UTC
Created attachment 6610 [details]
test file for bug reproduction
Comment 2 Nick B 2009-10-01 16:05:51 UTC
What I didn't add is that I've identified what I believe is the problem, stemming from revision 39203:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/MimeMagic.php?r1=39203&r2=39202&pathrev=39203

Because the check reads the last ~64k of a file, it seems that the magic word for a valid (empty) ZIP file can also be found in certain Mac Word documents. I've verified this on the test document with bgrep (http://blog.thorx.net/2009/07/binary-grep/).


(Also, this might not be the first time this bug has been noticed:
http://www.mwusers.com/forums/showthread.php?t=4903 )
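
For readers who want to reproduce the check without bgrep, here is a minimal sketch of the kind of tail scan described above. It is not the MimeMagic.php code: the function name, the exact tail size, and the use of a plain substring search are assumptions made for illustration. The only fact taken as given is that an empty ZIP archive consists of a single end-of-central-directory record beginning with the bytes PK\x05\x06.

```python
import io

# Signature that starts a ZIP end-of-central-directory record (EOCDR).
EOCDR_SIGNATURE = b"PK\x05\x06"

# Assumed tail size: the maximum ZIP comment length (65535 bytes) plus the
# 22-byte minimum record. The thread only says "the last ~64k".
TAIL_SIZE = 65535 + 22


def tail_contains_eocdr(stream, tail_size=TAIL_SIZE):
    """Return True if the last tail_size bytes of stream contain the signature."""
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    stream.seek(max(0, size - tail_size))
    return EOCDR_SIGNATURE in stream.read()


if __name__ == "__main__":
    import sys

    # Usage: python scan_tail.py somefile.doc
    with open(sys.argv[1], "rb") as handle:
        print(tail_contains_eocdr(handle))
```

A file that yields True here would be a candidate for the misdetection described in this bug, assuming the real check behaves similarly.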
Comment 3 Carl Johnstone 2009-11-19 14:59:06 UTC
I've had this with a Windows Word document. Unfortunately, it also happened to be the first file I tried to upload to a newly installed wiki.

In theory there's approximately a 1 in 65,000 chance that the check will match any given 64k of binary data, and this would apply equally to any file type, including images.

Ideally the detection needs to be improved; failing that, a clearer error message to the user would help.
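
The 1 in 65,000 figure can be checked with a short calculation. This is a rough sketch under stated assumptions: the scanned window is 64 KiB, the marker is the 4-byte signature PK\x05\x06, the bytes are uniformly random, and each starting position is treated as independent. Real binary formats are not uniformly random, so this is an order-of-magnitude estimate only.

```python
import math

SIGNATURE = b"PK\x05\x06"  # 4-byte ZIP end-of-central-directory marker
WINDOW_BYTES = 64 * 1024   # assumed size of the scanned tail


def false_positive_probability(window_bytes=WINDOW_BYTES,
                               signature_len=len(SIGNATURE)):
    """Chance that uniformly random bytes contain the signature at least once."""
    positions = window_bytes - signature_len + 1
    p_single = 2.0 ** (-8 * signature_len)
    # 1 - (1 - p)^n, computed in a numerically stable way.
    # Treating positions as independent is a simplifying assumption.
    return -math.expm1(positions * math.log1p(-p_single))


if __name__ == "__main__":
    p = false_positive_probability()
    print(f"about 1 in {1 / p:,.0f}")
```

With roughly 2^16 starting positions and a 2^-32 chance per position, the result lands close to 2^-16, which agrees with the estimate above.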
Comment 4 Ilmari Karonen 2009-12-06 19:07:39 UTC
As I noted at bug 16583, this issue has security implications: the code in MimeMagic.php that triggers these occasional false positives is also what's protecting MediaWiki from things like the GIFAR exploit (a file which is simultaneously a valid GIF image and an executable Java archive).

That said, the error reporting could be cleaner: in particular, rather than detecting these files as application/zip, we should ideally first run them through normal MIME type detection and only then check for any unexpected ZIP EOCDR markers and, if any are found, fail with a message something like: "This file, apparently of type foo/bar, contains a marker suggesting it might also be a valid ZIP archive.  For security reasons, uploading such files has been disabled."

Also, it might be possible to reduce the false positive rate for the ZIP file detection, but doing so safely would have to involve checking how existing ZIP decoders (in particular, the Info-ZIP decoder and Java's java.util.zip classes) do it, lest we accidentally allow through files which, though not necessarily valid according to the ZIP format spec, might still be accepted by these decoders.
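
The ordering proposed in this comment (normal type detection first, then a separate check for ZIP markers with its own error message) could look roughly like the sketch below. Everything here is hypothetical: detect_mime_by_magic, check_upload, and UploadRejected are illustrative names, the magic-number table is deliberately tiny, and mapping the OLE2 header straight to application/msword is a simplification. It is not how MimeMagic.php is structured.

```python
EOCDR_SIGNATURE = b"PK\x05\x06"
TAIL_SIZE = 65535 + 22  # assumed scan window, see the ZIP comment-length limit


class UploadRejected(Exception):
    """Raised when an upload fails the (hypothetical) security check."""


def detect_mime_by_magic(data):
    """Hypothetical stand-in for normal MIME detection by leading bytes."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        # OLE2 compound file; treating it as Word is a simplification.
        return "application/msword"
    if data.startswith(b"PK\x03\x04"):
        return "application/zip"
    return "application/octet-stream"


def check_upload(data, tail_size=TAIL_SIZE):
    """Detect the type first, then reject non-ZIP files carrying a ZIP marker."""
    mime = detect_mime_by_magic(data)
    if mime != "application/zip" and EOCDR_SIGNATURE in data[-tail_size:]:
        raise UploadRejected(
            f"This file, apparently of type {mime}, contains a marker "
            "suggesting it might also be a valid ZIP archive. For security "
            "reasons, uploading such files has been disabled."
        )
    return mime
```

The security behaviour is unchanged in this sketch (the file is still rejected); only the reported type and the message differ, which is the point of the suggestion.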
Comment 5 Platonides 2009-12-06 19:13:23 UTC
It's not matching it as a zip by pure chance. That file contains an embedded zip-like structure. For instance, 7-zip is able to "open" it, showing inside the files ObjectPool/, [5]SummaryInformation, WordDocument, [1]CompObj, 1Table and [5]DocumentSummaryInformation.
Comment 6 Platonides 2009-12-06 19:18:29 UTC
The list given by unzip -l seems more reliable:
warning:  6060 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  Length     EAs   ACLs    Date   Time    Name
 --------    ---   ----    ----   ----    ----
      540      0      0  01/01/80 00:00   [Content_Types].xml
      310      0      0  01/01/80 00:00   _rels/.rels
      138      0      0  01/01/80 00:00   theme/theme/themeManager.xml
     7559      0      0  01/01/80 00:00   theme/theme/theme1.xml
      283      0      0  01/01/80 00:00   theme/theme/_rels/themeManager.xml.rels
 --------  -----  -----                   -------
     8830      0      0                   5 files

Those XML files contain OpenXML content.
It looks like Microsoft Office is lying when told to save in the old format, since it still includes data in OpenXML format.
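
The same kind of listing can be obtained with Python's standard zipfile module, which, like unzip, tolerates extra bytes in front of the archive. The sketch below does not use the attached document: it builds a stand-in (an OLE2-style header and padding followed by a small ZIP) purely to illustrate the behaviour, and list_embedded_zip is a hypothetical helper name.

```python
import io
import zipfile


def list_embedded_zip(data):
    """Return the member names of a ZIP archive found inside data."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


# Build a stand-in: a small ZIP archive with two OpenXML-style members.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("_rels/.rels", "<Relationships/>")

# Prepend non-ZIP bytes (an OLE2 magic number plus padding) to mimic
# "extra bytes at beginning"; the length is arbitrary for this sketch.
prefix = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 6052
sample = prefix + payload.getvalue()

if __name__ == "__main__":
    print(zipfile.is_zipfile(io.BytesIO(sample)))
    print(list_embedded_zip(sample))
```

To inspect a real upload, read the file's bytes and pass them to list_embedded_zip; a zipfile.BadZipFile exception means no usable archive was found.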
Comment 7 Bryan Tong Minh 2010-04-11 22:07:15 UTC
Per comment #6, this really is a zip file, so closing as invalid, since rejecting such files is the actual purpose of the check.

(As a side note, perhaps we could have something like a zip stripper?)
Comment 8 Platonides 2010-04-11 22:55:05 UTC
I don't like the idea of changing the user file, but it could perhaps be integrated into new-upload: "Your file seems rotated. Fix?", "Currently I cannot accept this file with zip data included. Strip?"
This case would (should?) be easy to strip. Another Office app also likes to embed its own format into PNGs, which we could remove, and PNGs with garbage appended are common, too.
