Last modified: 2011-04-30 01:16:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T25642, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 23642 - Support proper mime type detection of Office Open XML
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: 1.14.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2010-05-24 09:10 UTC by Anon Sricharoenchai
Modified: 2011-04-30 01:16 UTC
CC List: 9 users



Attachments
Detect mime types for openxml files (4.46 KB, patch)
2010-06-19 01:47 UTC, Derk-Jan Hartman

Description Anon Sricharoenchai 2010-05-24 09:10:28 UTC
According to includes/MimeMagic.php:

                // Check for ZIP (before getimagesize)
                if ( strpos( $tail, "PK\x05\x06" ) !== false ) {
                        wfDebug( __METHOD__.": ZIP header present at end of $file\n" );
                        return $this->detectZipType( $head );
                } 

Some xls (MS Excel) files contain "PK\x05\x06", so they are incorrectly detected as ZIP files and fail the file type check.


Here is the debugging message shown with $wgDebugComments=true:

   mime: <application/zip> extension: <xls>
   
   UploadForm::verifyExtension: mime type application/zip mismatches file extension xls, rejecting file
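
For illustration, the check quoted above can be sketched in Python. This is not MediaWiki's PHP code: the helper names and the `tail_size` default are assumptions made for the example. It shows why any file whose last bytes contain the end-of-central-directory signature is treated as a ZIP archive, even when non-ZIP data precedes the archive.

```python
import io
import zipfile

ZIP_EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record marker


def tail_has_zip_signature(data: bytes, tail_size: int = 65558) -> bool:
    """Return True if the last `tail_size` bytes contain the EOCD signature.

    `tail_size` is an assumed value for this sketch, not MediaWiki's.
    """
    return ZIP_EOCD_SIGNATURE in data[-tail_size:]


def make_zip(entries: dict) -> bytes:
    """Build a small in-memory ZIP archive from {name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    return buf.getvalue()


plain_zip = make_zip({"a.txt": "hello"})
# Arbitrary non-ZIP bytes followed by ZIP data still end with the EOCD record.
prefixed = b"\x00" * 1000 + plain_zip

print(tail_has_zip_signature(plain_zip))
print(tail_has_zip_signature(prefixed))
print(tail_has_zip_signature(b"not a zip at all"))
```

A signature match alone therefore says nothing about what kind of document the archive holds; that decision is left to the later type detection step.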
Comment 1 Anon Sricharoenchai 2010-05-24 09:15:07 UTC
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/php4korsl
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart|chart-template|formula|formula-template|graphics|graphics-template|image|image-template|presentation|presentation-template|spreadsheet|spreadsheet-template|text|text-template|text-master|text-web))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/php4korsl: application/zip
MediaHandler::getHandler: no handler found for application/zip.
File::getPropsFromPath: /tmp/php4korsl loaded, 43008 bytes, application/zip.
MacBinary::loadHeader: header bytes 0 and 74 not null
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/php4korsl
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart|chart-template|formula|formula-template|graphics|graphics-template|image|image-template|presentation|presentation-template|spreadsheet|spreadsheet-template|text|text-template|text-master|text-web))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/php4korsl: application/zip


mime: <application/zip> extension: <xls>

UploadForm::verifyExtension: mime type application/zip mismatches file extension xls, rejecting file
Comment 2 Bawolff (Brian Wolff) 2010-05-24 09:20:20 UTC
Are you sure it isn't zip? From what I understand, excel 2007's default format (.xlsx, but I don't think excel cares what the extension is) is actually a zip archive.
Comment 3 Sam Reed (reedy) 2010-05-24 23:37:11 UTC
Indeed, they are

And Version 1.14?
Comment 4 Anon Sricharoenchai 2010-05-25 03:05:10 UTC
I'm not sure what format it is, but I can unzip it.

1. The xml file inside the zip contains xmlns="http://schemas.openxmlformats.org/package/2006/content-types"
2. The file can be opened by openoffice.org 2.4
Comment 5 Anon Sricharoenchai 2010-05-25 03:10:34 UTC
3. When unzipping it (on GNU/Linux), unzip says:

   warning [filename.xls]:  22287 extra bytes at beginning or within zipfile
Comment 6 Anon Sricharoenchai 2010-05-25 03:19:11 UTC
4. The file can be opened by openoffice.org 2.0 (ubuntu 6.06)
Comment 7 Sam Reed (reedy) 2010-05-25 10:38:01 UTC
Can this be duplicated on 1.15/1.16?

If not, it can be closed
Comment 8 Anon Sricharoenchai 2010-05-27 03:30:21 UTC
I think it can be duplicated on the latest version, since the logic in MimeMagic::doGuessMimeType() looks the same (when comparing 1.14 to svn).
Comment 9 Bawolff (Brian Wolff) 2010-05-27 07:25:33 UTC
According to comments in the code, MimeMagic::detectZipType only supports OpenDocument files, so it's not surprising that Office Open XML can't be detected. (As a side note: why must all these formats be named so similarly?)
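
A rough Python sketch of this kind of OpenDocument-only detection, for illustration (not MediaWiki's PHP implementation). The subtype list is copied from the pattern in the debug log above. The function names are hypothetical, and the sketch assumes the first local file header carries no extra field, so the first entry's name starts at byte offset 30.

```python
import io
import re
import zipfile
from typing import Optional

# Subtype names copied from the pattern in the debug log.
ODF_SUBTYPES = [
    "chart", "chart-template", "formula", "formula-template",
    "graphics", "graphics-template", "image", "image-template",
    "presentation", "presentation-template", "spreadsheet",
    "spreadsheet-template", "text", "text-template", "text-master",
    "text-web",
]
# Longest names first so "text-template" is not cut short at "text".
_ALTERNATIVES = "|".join(sorted(ODF_SUBTYPES, key=len, reverse=True))
ODF_PATTERN = re.compile(
    (r"^mimetype(application/vnd\.oasis\.opendocument\.(?:%s))"
     % _ALTERNATIVES).encode("ascii")
)

LOCAL_HEADER_SIZE = 30  # fixed part of a ZIP local file header


def detect_odf_type(head: bytes) -> Optional[str]:
    """Return the ODF MIME type if the first entry is an ODF 'mimetype' file."""
    match = ODF_PATTERN.match(head[LOCAL_HEADER_SIZE:])
    return match.group(1).decode("ascii") if match else None


def make_zip(entries) -> bytes:
    """Build an in-memory ZIP from (name, content) pairs, in order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:  # default: stored, uncompressed
        for name, content in entries:
            zf.writestr(name, content)
    return buf.getvalue()


odf_like = make_zip([
    ("mimetype", "application/vnd.oasis.opendocument.text-template"),
    ("content.xml", "<doc/>"),
])
ooxml_like = make_zip([("[Content_Types].xml", "<Types/>")])

print(detect_odf_type(odf_like))
print(detect_odf_type(ooxml_like))
```

An Office Open XML package has no leading `mimetype` entry, so a check of this shape falls through and the file stays classified as plain application/zip.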
Comment 10 Derk-Jan Hartman 2010-06-01 20:45:24 UTC
This is a difficult one to work around. I guess we could scan for [Content_Types].xml in the file, which would identify it as an Open Packaging Conventions ZIP file. The only way to then set the mime type correctly is by relying on the filename extension as far as I can see, because it is just a zip archive and can contain anything basically.

I'm not sure if [Content_Types].xml is always the first file in the zip, however, so we may have to read the entire directory listing of the zip archive...
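
The directory-listing idea can be sketched as follows in Python, using the standard `zipfile` module (an illustration only; `is_opc_package` is a hypothetical name, and this is not the code from the eventual patch). It reads the whole central directory, so the position of [Content_Types].xml within the archive does not matter.

```python
import io
import zipfile


def is_opc_package(data: bytes) -> bool:
    """Return True if the ZIP directory lists a [Content_Types].xml entry."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "[Content_Types].xml" in zf.namelist()
    except zipfile.BadZipFile:
        # Not a readable ZIP archive at all.
        return False


def make_zip(entries) -> bytes:
    """Build an in-memory ZIP from (name, content) pairs, in order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries:
            zf.writestr(name, content)
    return buf.getvalue()


# [Content_Types].xml deliberately placed second, not first.
opc_like = make_zip([
    ("word/document.xml", "<w/>"),
    ("[Content_Types].xml", "<Types/>"),
])
plain_zip = make_zip([("readme.txt", "hello")])

print(is_opc_package(opc_like))
print(is_opc_package(plain_zip))
print(is_opc_package(b"not a zip"))
```

As the comment notes, this only establishes that the file is an Open Packaging Conventions container; it does not say which Office format it holds.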
Comment 11 Eggertsen 2010-06-15 18:23:15 UTC
Added myself to CC list.
Comment 12 Bryan Tong Minh 2010-06-18 15:09:41 UTC
Tweaking summary accordingly
Comment 13 Derk-Jan Hartman 2010-06-19 01:47:49 UTC
Created attachment 7484
Detect mime types for openxml files

This is my idea for fixing this problem.

I introduce a new mime type. This is application/x-opc+zip
This mimetype basically means "Open Packaging Conventions" archive and is a private mimetype that I came up with.

When initially checking for mimetype, we detect that this is an OPC file, and we use the extension to guess what type of OPC file. Then on the verify pass (where guessing based on extension is not allowed), we detect that the file is an OPC archive. We then check if the file extension (docx for instance) is an allowed file extension for this filetype, and we check if opc files are on the mime blacklist.

File entries are stored into the database with their 'proper' MS Office mimetype. Normally the OPC filetype should not ever be served, unless people disable mimeverification.
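
The two passes described above might look roughly like this in Python. This is a sketch of the idea, not the attached patch: the function names and the extension table are assumptions for illustration, and only the OPC branch is modelled. The `application/x-opc+zip` type is the private type named in this comment; the three Office types are the registered Office Open XML MIME types.

```python
OPC_MIME = "application/x-opc+zip"  # private type introduced in this comment

# Hypothetical extension table for this sketch.
OPC_EXTENSION_TYPES = {
    "docx": "application/vnd.openxmlformats-officedocument"
            ".wordprocessingml.document",
    "xlsx": "application/vnd.openxmlformats-officedocument"
            ".spreadsheetml.sheet",
    "pptx": "application/vnd.openxmlformats-officedocument"
            ".presentationml.presentation",
}


def guess_mime_type(is_opc: bool, extension: str) -> str:
    """Initial pass: the extension may refine an OPC archive's type."""
    if not is_opc:
        return "application/zip"
    return OPC_EXTENSION_TYPES.get(extension.lower(), OPC_MIME)


def verify_upload(is_opc: bool, extension: str, mime_blacklist: set) -> bool:
    """Verify pass: content alone yields OPC_MIME, then policy is checked.

    Only the OPC branch is modelled; non-OPC files are rejected here.
    """
    if not is_opc:
        return False
    if OPC_MIME in mime_blacklist:
        return False
    return extension.lower() in OPC_EXTENSION_TYPES


print(guess_mime_type(True, "docx"))
print(verify_upload(True, "docx", set()))
print(verify_upload(True, "docx", {OPC_MIME}))
```

Under this split, the stored type comes from the first pass, while the second pass never trusts the extension to decide what the content is, only whether it is an allowed name for an OPC archive.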
Comment 14 Derk-Jan Hartman 2010-06-19 18:49:47 UTC
Done in r68279

Note that, as with any ZIP file, if you allow these files on your server, you potentially allow GIFAR-like attacks on clients that do not have up-to-date JVMs.
