Last modified: 2012-03-01 10:50:34 UTC
Trying to upload a .doc file generated with Microsoft Word 2007 results in: "The file is corrupt or has an incorrect extension"
The log file:
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phpq31oe6
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phpq31oe6: application/zip
mime: <application/zip> extension: <doc>
UploadForm::verifyExtension: mime type application/zip mismatches file extension doc, rejecting file
This seems to be known, as http://www.mediawiki.org/wiki/Manual:$wgMimeDetectorCommand states "For example, 1.15.3 may misdetect .doc-files from MS Word 2007 as ZIP files", but I cannot find a corresponding bug. Bugs 23688, 23642, and 18684 do not solve the problem.
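For readers unfamiliar with the pattern in the log, here is a minimal Python sketch of the same kind of check. It assumes the pattern is applied just past the fixed 30-byte ZIP local file header at the start of the file, which is where an OpenDocument package stores its uncompressed "mimetype" entry. The helper names (detect_odf_type, build_zip) are hypothetical and this is not MediaWiki's actual code.

```python
import io
import re
import zipfile

# Size of the fixed part of a ZIP local file header; the entry name follows it.
LOCAL_HEADER_SIZE = 30

# Same alternation as the pattern shown in the debug log.
ODF_TYPES = (
    b"chart-template|chart|formula-template|formula|graphics-template|graphics|"
    b"image-template|image|presentation-template|presentation|"
    b"spreadsheet-template|spreadsheet|text-template|text-master|text-web|text"
)
ODF_PATTERN = re.compile(
    rb"mimetype(application/vnd\.oasis\.opendocument\.(?:" + ODF_TYPES + rb"))"
)


def detect_odf_type(data: bytes):
    """Return the ODF MIME type if the first ZIP entry announces one, else None."""
    match = ODF_PATTERN.match(data[LOCAL_HEADER_SIZE:])
    return match.group(1).decode("ascii") if match else None


def build_zip(entries):
    """Build an in-memory ZIP from (name, content) pairs, stored uncompressed."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name, content in entries:
            archive.writestr(name, content)
    return buffer.getvalue()


odf_like = build_zip([
    ("mimetype", "application/vnd.oasis.opendocument.text"),
    ("content.xml", "<doc/>"),
])
plain_zip = build_zip([("hello.txt", "hello")])
# A stand-in for the Word 2007 case: non-ZIP bytes first, ZIP data at the end.
hybrid = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 504 + plain_zip
```

Under these assumptions, a check like this can only recognise OpenDocument packages. A file whose ZIP data sits at the end, as in the log above, falls through to the generic application/zip result.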
Can you try the current 1.17alpha SVN version? cc. TheDJ
Behaviour does not change with 1.17alpha:
FileCache negative MISS for Testbericht_V02.doc
File::getPropsFromPath: Getting file info for /tmp/phpW9YCXV
MimeMagic::__construct: loading mime types from /magwien/var/gondor-phpserver/html/wiki-ma48/includes/mime.types
MimeMagic::__construct: loading mime info from /magwien/var/gondor-phpserver/html/wiki-ma48/includes/mime.info
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phpW9YCXV
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phpW9YCXV: application/zip
MediaHandler::getHandler: no handler found for application/zip.
File::getPropsFromPath: /tmp/phpW9YCXV loaded, 453632 bytes, application/zip.
MacBinary::loadHeader: header bytes 0 and 74 not null
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phpW9YCXV
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phpW9YCXV: application/zip
mime: <application/zip> extension: <doc>
UploadForm::verifyExtension: mime type application/zip mismatches file extension doc, rejecting file
The extension for MS Office 2007 OpenXML documents is .docx, not .doc. For this to work:
* rename the file to its proper file extension
* you have to have a 1.17 checkout
* override $wgMimeTypeBlacklist so that application/x-opc+zip is not in the list
* add .docx to the list of allowed file extensions, $wgFileExtensions
That said, I would expect to see "detected an Open Packaging Conventions archive:" for these types of files in the debug log.
Created attachment 7497 [details] word 2007 testdocument
I did it exactly as you described. Here is the debug log:
File::getPropsFromPath: Getting file info for /tmp/phplRPqef
MimeMagic::__construct: loading mime types from /magwien/var/gondor-phpserver/html/mwiki/includes/mime.types
MimeMagic::__construct: loading mime info from /magwien/var/gondor-phpserver/html/mwiki/includes/mime.info
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phplRPqef
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phplRPqef: application/zip
MediaHandler::getHandler: no handler found for application/zip.
File::getPropsFromPath: /tmp/phplRPqef loaded, 453632 bytes, application/zip.
MacBinary::loadHeader: header bytes 0 and 74 not null
MimeMagic::doGuessMimeType: ZIP header present at end of /tmp/phplRPqef
MimeMagic::detectZipType: /^mimetype(application\/vnd\.oasis\.opendocument\.(?:chart-template|chart|formula-template|formula|graphics-template|graphics|image-template|image|presentation-template|presentation|spreadsheet-template|spreadsheet|text-template|text-master|text-web|text))/
MimeMagic::detectZipType: unable to identify type of ZIP archive
MimeMagic::guessMimeType: final mime type of /tmp/phplRPqef: application/zip
mime: <application/zip> extension: <docx>
UploadBase::verifyExtension: mime type application/zip mismatches file extension docx, rejecting file
Perhaps you can take a look at the attached test document; maybe it is not in the format you expect.
(In reply to comment #3)
> Although I have to say, that i'm expecting to see "detected an Open Packaging
> Conventions archive:" for these types of files in debug.
That'd be kinda hard to do since it's just a zip file; it'd have to look inside the file to determine if it's just a zip or if it's a 'special' zip. That just opens a whole 'nother can of worms.
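To make "looking inside the file" concrete, here is a rough Python sketch. It relies on two conventions that are not stated in this thread: an Open Packaging Conventions package carries a part named "[Content_Types].xml", and an OpenDocument package carries an entry named "mimetype". The helper names (classify_zip, make_zip) are hypothetical, and this is not how MediaWiki does it.

```python
import io
import zipfile


def classify_zip(data: bytes) -> str:
    """Give a coarse label to a ZIP-based container by listing its entries."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return "not-zip"
    if "[Content_Types].xml" in names:
        return "opc"  # OpenXML family (docx, xlsx, pptx), by convention
    if "mimetype" in names:
        return "odf"  # OpenDocument family, by convention
    return "plain-zip"


def make_zip(entries: dict) -> bytes:
    """Build a small in-memory ZIP archive for demonstration purposes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in entries.items():
            archive.writestr(name, content)
    return buffer.getvalue()


opc_like = make_zip({"[Content_Types].xml": "<Types/>", "word/document.xml": "<w/>"})
odf_like = make_zip({"mimetype": "application/vnd.oasis.opendocument.text"})
other = make_zip({"readme.txt": "hello"})
```

Listing entries means parsing the central directory, which is noticeably more work than matching a few header bytes, and it says nothing about whether the content is safe to serve.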
I'm looking at this file, but it doesn't seem to be an OpenXML file to me. It will take some time to figure out what is going on. (A zipped .doc, perhaps?)
Created attachment 7500 [details]
Actual docx file
Testbericht_V02.docx: Microsoft Office Document
If you rename it to .doc it opens fine in Word, so I'm thinking it's a normal Word document, resaved as a Word document in Word 2007, and now it identifies as:
Testbericht_V02.docx: Zip archive data, at least v2.0 to extract
I think that when saving in the old format, Word 2007 creates a kind of mixed format by appending a ZIP structure to the .doc format:
warning [Testbericht_V02.docx]: 430308 extra bytes at beginning or within zipfile
Also see bug 23642 comment 5.
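The "extra bytes at beginning" figure can be approximated by hand. The sketch below, in Python, finds the ZIP end-of-central-directory record and compares the central directory position it records with where that directory would have to sit in the file. It is a simplification that ignores ZIP64 and archive comments containing the signature, and the helper names (leading_non_zip_bytes, build_zip) are hypothetical.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, 4 x uint16 (disk and entry counts), 2 x uint32 (central directory
# size and offset), uint16 (comment length)
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes


def leading_non_zip_bytes(data: bytes) -> int:
    """Return the number of bytes preceding the ZIP data, or -1 if no ZIP record."""
    eocd_pos = data.rfind(EOCD_SIGNATURE)
    if eocd_pos < 0 or eocd_pos + EOCD_SIZE > len(data):
        return -1
    fields = struct.unpack(EOCD_FORMAT, data[eocd_pos:eocd_pos + EOCD_SIZE])
    cd_size, cd_offset = fields[5], fields[6]
    # In a clean archive the central directory ends exactly where the EOCD
    # record starts, so any surplus is data placed in front of the archive.
    return eocd_pos - cd_size - cd_offset


def build_zip() -> bytes:
    """Build a tiny in-memory archive to stand in for the appended ZIP trailer."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
    return buffer.getvalue()


trailer = build_zip()
fake_doc = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 504  # 512 stand-in bytes
hybrid = fake_doc + trailer
```

For the stand-in hybrid above, the function reports the 512 bytes placed before the archive, which is the same kind of number as the 430308 in the warning.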
Platonides is right. Basically, Word 2007 saves a .doc file but appends a ZIP with an OPC index to it. I'll add a check for this by scanning for the magic bytes of older MS Office documents in some way.
http://www.garykessler.net/library/file_sigs.html
MSOffice header: D0 CF 11 E0 A1 B1 1A E1
Office subheaders at byte position 512:
* EC A5 C1 00 [512 byte offset]: DOC, Word document subheader (MS Office)
* FD FF FF FF nn 00 00 00 [512 byte offset]: PPT, PowerPoint presentation subheader (MS Office), where nn has been seen with values 0x0E, 0x1C, and 0x43
* FD FF FF FF nn 00 [512 byte offset] or FD FF FF FF nn 02 [512 byte offset]: XLS, Excel spreadsheet subheader (MS Office), where nn = 0x10, 0x1F, 0x22, 0x23, 0x28, or 0x29
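As an illustration of such a scan, here is a small Python sketch built only from the signatures quoted above. It is a heuristic under the assumption that the subheader really sits at byte 512, which may not hold for every file, and the function name legacy_office_kind is hypothetical.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
SUBHEADER_OFFSET = 512

WORD_SUBHEADER = bytes.fromhex("ECA5C100")
FD_PREFIX = bytes.fromhex("FDFFFFFF")
PPT_NN = {0x0E, 0x1C, 0x43}
XLS_NN = {0x10, 0x1F, 0x22, 0x23, 0x28, 0x29}


def legacy_office_kind(data: bytes) -> str:
    """Guess the legacy Office type from the header and the subheader at 512."""
    if not data.startswith(OLE2_MAGIC):
        return "not-ole2"
    sub = data[SUBHEADER_OFFSET:SUBHEADER_OFFSET + 8]
    if sub.startswith(WORD_SUBHEADER):
        return "doc"
    if sub.startswith(FD_PREFIX) and len(sub) >= 6:
        nn, after = sub[4], sub[5]
        # PPT: FD FF FF FF nn 00 00 00 with a known nn value
        if nn in PPT_NN and sub[5:8] == b"\x00\x00\x00":
            return "ppt"
        # XLS: FD FF FF FF nn 00 or FD FF FF FF nn 02 with a known nn value
        if nn in XLS_NN and after in (0x00, 0x02):
            return "xls"
    return "ole2-unknown"


header = OLE2_MAGIC + b"\x00" * (SUBHEADER_OFFSET - len(OLE2_MAGIC))
doc_like = header + WORD_SUBHEADER
ppt_like = header + bytes.fromhex("FDFFFFFF0E000000")
xls_like = header + bytes.fromhex("FDFFFFFF1002")
```

Because the check only looks at the start of the file, it would classify a Word 2007 ".doc plus ZIP trailer" file by its leading Office header rather than by the ZIP data at the end.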
Should we really be doing this? We don't allow OpenOffice files, which are also zips, because of security vulnerabilities, so it would be a bit weird if we preferred Word over OO.
(In reply to comment #11)
> Should we really be doing this? we don't allow openoffice files which are also
> zips because of security vulnerabilities which would be a bit weird if we
> preferred Word over OO.
Users who wish to enable OpenXML files should be able to do so, just like with OpenOffice now.
I got this working, but it is starting to become a bit of a mess. I'm considering introducing a new configuration variable to allow/disallow all zip types, because I already have ODF, OpenXML, and MS Office + OPC zip trailer, and setting all that up will become more difficult with each and every zip type. With a separate option, we could just remove the zip and the fake OPC MIME type from the MIME blacklist, and a separate config option would make documenting and explaining the risks of zip-based file formats on open websites a lot easier, I think.
$wgAllowZipFilesWhichCouldCompromiseMyUsers ? I'd like to have Special:Upload ask to remove the (apparently useless) zip trailer.
(In reply to comment #14)
> $wgAllowZipFilesWhichCouldCompromiseMyUsers ?
>
> I'd like to have Special:Upload ask to remove the (apparently useless) zip
> trailer.
Would that not damage the files if people wanted to download and reopen them? Some systems are very pedantic about the formatting of their files.
Microsoft seems to create different .doc formats (2003, and 2003 saved from 2007). Shouldn't this simply be seen as a Microsoft bug, and no longer as a MediaWiki issue?
(In reply to comment #15)
> (In reply to comment #14)
> > $wgAllowZipFilesWhichCouldCompromiseMyUsers ?
> >
> > I'd like to have Special:Upload ask to remove the (apparently useless) zip
> > trailer.
> Would that not damage the files if people wanted to download and reopen them,
> some systems are very pedantic about the formatting of their files?
If I understand correctly, the OPC trailer stores information that cannot be saved in the 2003 format. So it is a method of creating a 2003-compatible file that still has all the 2007 and later features of the original file when opened in 2007 or later. Actually kinda handy, I have to say. But yes, the idea would be $wgAllowUploadsOfZipFilesBecauseItrustMyUploaders or something.
> (In reply to comment #14)
> Would that not damage the files if people wanted to download and reopen them,
> some systems are very pedantic about the formatting of their files?
The newer Word still needs to open pre-2007 files, which don't have the trailer, so there are no backwards compatibility issues there. The provided trailer contains a "font Theme". That won't be a fundamental feature in most cases, but some users might need it. Note that while I support file stripping in certain cases, it should always happen with the uploader's consent.
(In reply to comment #18)
> Note that while I support file stripping in certain cases, it should always
> happen with the uploader consent.
And the user should have the possibility to upload the unstripped file (if allowed by the site administrator). A generic upload post-processing API would be nice; other things like image rotation from EXIF info fall in that category as well.
Created attachment 7534 [details]
gifar cleanup
A patch of what I am proposing:
1: Move zip and virus checks before mime checks
2: ZIP gifar check is now separate from mime checks
3: Added $wgAllowGIFARVulnerableFiles global variable
4: Add zip mime detection support for openxml trailers on 2003 Office files.
This will allow people to basically allow zip file uploads when they want. They would still need to whitelist file types, and in the case of actual zip files, they have to change the mime blacklist. But when setting $wgAllowGIFARVulnerableFiles=true and adding .doc .docx .odt to their whitelist, they will be able to upload such files nonetheless (and actual GIFAR files). We could consider expanding on this to add a "best-effort" mode to detectGIFAR(), where it will only allow opendocument/openxml files and disallow the rest, though that is somewhat of a fake security model in my opinion.
Went with the original solution after all. Fixed in r68873
forgot to close the ticket
Can people there have a look at Bug 34797 - Cannot upload Office 97-2003 DOC and XLS files? It seems to be a related issue :-) Thanks!