Last modified: 2014-01-13 09:09:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T17538, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15538 - Problem uploading compressed Dia file
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: 1.13.x
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2008-09-09 18:24 UTC by Filipe Brandenburger
Modified: 2014-01-13 09:09 UTC (History)
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
I can not upload this file. (for example) (25.50 KB, application/docapplication_formul_ru.doc)
2014-01-10 08:44 UTC, Pavel (pastakhov)

Description Filipe Brandenburger 2008-09-09 18:24:27 UTC
Hello,

I've been working with the Dia extension and uploading Dia files. When saving a file in Dia, there is a checkbox that asks whether you want to save a compressed file or not. If you do not save it compressed, it saves an XML file with xmlns:dia="http://www.lysator.liu.se/~alla/dia/". If you save it compressed, it saves the same XML file compressed with gzip. Both files receive the .dia extension, and Dia itself detects whether the file is compressed.

This is the output of "file" for an uncompressed and a compressed file:

$ file *.dia
test-unc.dia:        XML document text
test-cmp.dia:        gzip compressed data, from Unix

Recently MediaWiki added support for recognizing Dia files. As I understand, if the file is XML, it will parse the file and look for the namespace(?), and it will recognize it as a Dia file if it finds this URL: http://www.lysator.liu.se/~alla/dia/.

The problem is that it does not recognize compressed Dia files. First, it will (expectedly) assign the file the MIME type application/x-gzip, and then in "verifyExtension()" it will not match ".dia" to application/x-gzip, therefore stating that the file is corrupted.

I've been thinking about how to solve this problem. One way would be to recognize that the file is gzipped, then look inside and, if the contents look like XML, apply the same logic to guess the type from the namespace of the XML. However, that seems too complex and too much overhead for this task.
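The "look inside the gzip" approach described above could be sketched roughly as follows. This is a minimal illustration in Python, not MediaWiki code: the function names and the 64 KiB sniffing cap are hypothetical, and it assumes that the root element of a Dia file is in the Dia namespace quoted in this report.

```python
import gzip
import io
import zlib
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream
DIA_NAMESPACE = "http://www.lysator.liu.se/~alla/dia/"  # URL quoted in this report
MAX_SNIFF_BYTES = 64 * 1024  # hypothetical cap to bound decompression work


def read_head(data: bytes, limit: int = MAX_SNIFF_BYTES) -> bytes:
    """Return up to `limit` bytes of payload, decompressing gzip if needed."""
    if data[:2] == GZIP_MAGIC:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
            return fh.read(limit)
    return data[:limit]


def root_namespace(xml_head: bytes):
    """Return the namespace URI of the root element, or None if not parseable."""
    try:
        # Only the first "start" event is needed, so a truncated document is fine.
        for _event, elem in ET.iterparse(io.BytesIO(xml_head), events=("start",)):
            tag = elem.tag
            return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    except ET.ParseError:
        return None
    return None


def looks_like_dia(data: bytes) -> bool:
    """True if the (optionally gzipped) data is XML rooted in the Dia namespace."""
    try:
        head = read_head(data)
    except (OSError, EOFError, zlib.error):
        return False  # claimed to be gzip but could not be decompressed
    return root_namespace(head) == DIA_NAMESPACE
```

Capping the number of decompressed bytes keeps the extra cost small, which speaks to the overhead concern, but it does add a second detection path that has to be maintained.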

I still think that, in that particular case, the extension is the easiest way to reliably recognize a Dia file.

So I was thinking about patching MediaWiki to include a new table (like MM_WELL_KNOWN_MIME_TYPES or MM_WELL_KNOWN_MIME_INFO) with information on how to override a MIME type based on the extension of the file. The entry for Dia in this table would be something like this (not exact PHP syntax here, I'm not good at PHP):

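    // Proposed override entry: trust the ".dia" extension when the detected type is plausible.
    array(
        'extension'     => '.dia',
        'detected_mime' => array( 'application/x-dia-diagram', 'application/xml', 'application/x-gzip' ),
        'override_mime' => 'application/x-dia-diagram',
    )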

What this means is: if, on a file upload, MediaWiki detects that the extension is ".dia" (or more generally, that the extension is in this table), it will check that the detected MIME type of the contents matches one of the items of the array (in the case of Dia, either an XML or a gzip-compressed file), and if so, it will override the MIME type with application/x-dia-diagram.
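As a rough illustration of the lookup just described, here is a minimal sketch in Python (not MediaWiki code). The table name MIME_OVERRIDES and the function resolve_mime are hypothetical; only the Dia entry comes from the proposal.

```python
# Hypothetical override table keyed by file extension (without the dot).
MIME_OVERRIDES = {
    "dia": {
        # Detected content types for which the override may be applied.
        "detected_mime": {
            "application/x-dia-diagram",
            "application/xml",
            "application/x-gzip",
        },
        "override_mime": "application/x-dia-diagram",
    },
}


def resolve_mime(filename: str, detected_mime: str) -> str:
    """Return the overriding MIME type if the extension rule applies,
    otherwise the detected type unchanged."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    rule = MIME_OVERRIDES.get(ext)
    if rule and detected_mime in rule["detected_mime"]:
        return rule["override_mime"]
    return detected_mime
```

For example, resolve_mime("test-cmp.dia", "application/x-gzip") yields the Dia type, while a ".dia" file whose content is detected as some unrelated type keeps its detected type and would still fail the extension check.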

Now, I know that MediaWiki has tried to move away from detecting the type of the content based on the extension, but I really do not know what else to do with Dia files. Of course I blame the problem on the Dia developers; after all, they should probably not use a bare gzipped file and should use something with a specific header instead. But we already have a legacy of many Dia files, and we will have to handle them one way or another...

So, do you think my idea for a way to solve this is OK? If you do think so, I will work on a patch to do it and submit it to this bug.

Thanks!
Filipe
Comment 1 Brion Vibber 2008-09-09 18:30:24 UTC
Hmm, a good question. Support for gzipped SVG would be useful too, with similar pressures (though there a separate extension, .svgz, is used; see bug 4947).

It probably wouldn't be that hard to detect that the file looks like gzip and dive in for content checks with gzopen() etc. But we may then have to distinguish between xml types we know we can take gzipped and those we don't, so it could complicate things a little.
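The distinction between XML types that may arrive gzipped and those that may not could be expressed as a small allow-list. The following Python sketch is only an illustration; the set name GZIP_OK_TYPES and the function accept_upload are hypothetical, and the single Dia entry is an assumption for the example.

```python
# Hypothetical allow-list: inner content types that are accepted when gzipped.
# A type such as image/svg+xml could be added once gzipped SVG is supported.
GZIP_OK_TYPES = {"application/x-dia-diagram"}


def accept_upload(inner_mime: str, was_gzipped: bool) -> bool:
    """Accept uncompressed files as before; accept gzipped files only
    if their inner type is on the allow-list."""
    if not was_gzipped:
        return True
    return inner_mime in GZIP_OK_TYPES
```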
Comment 2 Filipe Brandenburger 2008-09-11 02:37:28 UTC
The problem of .svgz as I see it is that it requires browser and tool support for it to work, and probably needs Apache configuration in order to work properly.

The idea of using gzopen and parsing the XML inside is really good. I will try to come up with some code to do it for the specific case of Dia files.

I will try to make it configurable and extensible, so that someone may later build support for .svgz on top of it.

I should have a patch in a few days; I'll post it here for your evaluation.
Comment 3 Pavel (pastakhov) 2014-01-10 05:02:12 UTC
This bug applies to all compressed files, including MS Office files.
Comment 4 Gerrit Notification Bot 2014-01-10 05:20:29 UTC
Change 106657 had a related patch set uploaded by Pastakhov:
fix bug 15538

https://gerrit.wikimedia.org/r/106657
Comment 5 Pavel (pastakhov) 2014-01-10 08:42:34 UTC
I cannot upload .doc and .xls files from MS Office to my wiki.
I get the error 'The file is a corrupt or otherwise unreadable ZIP file. It cannot be properly checked for security.'
Perhaps this is caused by the same bug.
Comment 6 Pavel (pastakhov) 2014-01-10 08:44:14 UTC
Created attachment 14278 [details]
I can not upload this file. (for example)
Comment 7 Andre Klapper 2014-01-10 13:25:12 UTC
Pavel: This bug is about uploading Dia files only. 
Please don't extend the scope of this bug report.

Please see https://www.mediawiki.org/wiki/Thread:Project:Support_desk/Some_Doc,_Xls_and_other_files_detected_as_zip_files , http://ryandlane.com/blog/2009/04/21/allowing-docpptxls-uploads-to-mediawiki-and-getting-proper-mime-types-back/ and feel free to ask on https://www.mediawiki.org/wiki/Project:Support_desk instead, if there are further questions. Thanks.
Comment 8 Gerrit Notification Bot 2014-01-13 09:09:13 UTC
Change 106657 abandoned by Pastakhov:
fix bug 15538

Reason:
Excuse me, it is a very old patch and it is not suitable here, although the problem remains. I will describe it in Bugzilla.
Thanks for links.

https://gerrit.wikimedia.org/r/106657
