Last modified: 2008-10-20 18:01:16 UTC

Bug 10823 - We do not reject mismatched file types on upload for many extensions. Ogg, pdf, mid, sx*, svg, xcf.
Product: Wikimedia
Classification: Unclassified
wikibugs IRC bot (Other open bugs)
All All
Importance: Low enhancement with 5 votes
Assigned To: Nobody - You can work on this!
Reported: 2007-08-06 17:27 UTC by Gregory Maxwell
Modified: 2008-10-20 18:01 UTC (History)

Description Gregory Maxwell 2007-08-06 17:27:09 UTC
I'm opening this for tracking and comment collection purposes; it is not yet actionable.

A recent foundation-l thread linked to some scaremongering about viruses/trojans on websites open to outside submissions.  In this thread it was pointed out that the Wikimedia wikis are not sufficiently strict with uploaded content for some extensions.  The risk created by this is probably fairly low, but it should be addressed.

For example, a win32 exe uploaded under a number of names:

As far as I can tell, our current detection of Ogg and MIDI files appears reliable and accurate. I don't see why we can't enforce it for those types. Can anyone provide any counterexamples or suggested test cases?

We do not appear to correctly detect valid XCFs.

I am not sure where we stand on the other formats.
Comment 1 Brion Vibber 2007-08-08 21:57:37 UTC
It _should_ be enforcing for recognized types, but that might only be for internally-recognized ones at the moment. We need to take a peek at that...

My strong recommendation is to rip out the whole MimeDetector stuff with its unreliable external dependencies and just do our own magic detection for all types. That should be enough for many of the basics -- keep out DOS/Windows executables, detect Ogg, PDF, SVG, etc nicely.
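
A minimal sketch of what such built-in magic detection could look like, in Python for illustration only. The table and function names are hypothetical and this is not MediaWiki code; the signatures themselves are the well-known leading bytes of each format ("MZ" for DOS/Windows executables, "OggS" for Ogg, "%PDF-" for PDF, "MThd" for MIDI, "gimp xcf " for XCF). Accepting extensions with no known signature is an assumption of the sketch, not a recommendation.

```python
# Hypothetical signature table: extension -> acceptable leading bytes.
SIGNATURES = {
    "ogg": [b"OggS"],
    "pdf": [b"%PDF-"],
    "mid": [b"MThd"],
    "xcf": [b"gimp xcf "],
}

# Leading bytes that are always refused (DOS/Windows executables).
EXECUTABLE_PREFIXES = [b"MZ"]


def check_upload(extension: str, head: bytes) -> bool:
    """Return True if the first bytes of the file are acceptable for the extension."""
    if any(head.startswith(prefix) for prefix in EXECUTABLE_PREFIXES):
        return False
    expected = SIGNATURES.get(extension.lower().lstrip("."))
    if expected is None:
        # Assumption: extensions without a known signature are not judged here.
        return True
    return any(head.startswith(signature) for signature in expected)
```

For example, `check_upload("ogg", head)` would refuse a renamed win32 exe because its first two bytes are "MZ" rather than "OggS".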

I'm not sure how easy it is to cleanly detect the various office formats. The StarOffice/OpenOffice/OpenDoc ones are ZIP-based, which may make it hard to tell legit docs from other ZIP files.
Comment 2 Platonides 2007-09-09 20:48:25 UTC
ZIP-based documents are unusual enough that accepting ZIP files as such documents is not a great problem. We get many more articles uploaded as PDFs.

Note that we could easily differentiate them by reading the end of the ZIP file instead of the first KB and searching the central directory for content.xml and mimetype. We can even determine the actual document type by reading the content of the mimetype entry (which is not supposed to be compressed).
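
A rough sketch of that central-directory check, assuming Python's standard zipfile module (which locates the central directory from the end of the file). The function name is hypothetical; the entry names content.xml and mimetype are the ones mentioned above.

```python
import io
import zipfile


def opendocument_mimetype(data: bytes):
    """Return the MIME type declared inside an ODF-like ZIP, or None otherwise."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()  # taken from the central directory
            if "mimetype" not in names or "content.xml" not in names:
                return None
            info = archive.getinfo("mimetype")
            # The mimetype entry is expected to be stored, not compressed.
            if info.compress_type != zipfile.ZIP_STORED:
                return None
            return archive.read("mimetype").decode("ascii", "replace").strip()
    except zipfile.BadZipFile:
        return None
```

An ordinary ZIP archive without those two entries returns None, so it would not be mistaken for an office document.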

As MIME detection is quite good, e.g. [[en:Bitmap file]]s with a jpg extension get the image/x-ms-bitmap MIME type, could an extension-MIME association be enough?
Comment 3 Tim Starling 2007-09-09 20:52:20 UTC
I would suggest guessing the MIME type from the extension, then using the corresponding media handler to validate it. We could fall back to some generic magic number extraction method for types with no media handler. OggHandler for instance could do some fairly complex validation on uploaded Ogg files, but it needs an appropriate entry point.
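
One possible shape for that flow, sketched in Python with hypothetical names throughout (none of this is the actual media handler interface): guess the MIME type from the extension, hand the data to a format-specific validator if one is registered, and otherwise fall back to a generic magic-number check. The Ogg validator here only inspects the first page header (capture pattern, version 0, beginning-of-stream flag), which is far less than a real OggHandler could do.

```python
# Hypothetical tables; a real deployment would be configured differently.
EXTENSION_TO_MIME = {
    "ogg": "application/ogg",
    "pdf": "application/pdf",
    "mid": "audio/midi",
}


def validate_ogg(head: bytes) -> bool:
    """Check the first Ogg page header: 'OggS', version 0, BOS flag set."""
    return (
        len(head) >= 6
        and head[:4] == b"OggS"
        and head[4] == 0
        and bool(head[5] & 0x02)
    )


# Format-specific validators, keyed by MIME type.
HANDLERS = {"application/ogg": validate_ogg}

# Generic fallback for types that have no dedicated handler.
GENERIC_MAGIC = {"application/pdf": b"%PDF-", "audio/midi": b"MThd"}


def validate_upload(filename: str, head: bytes) -> bool:
    extension = filename.rpartition(".")[2].lower()
    mime = EXTENSION_TO_MIME.get(extension)
    if mime is None:
        return False  # assumption: unknown extensions are rejected
    handler = HANDLERS.get(mime)
    if handler is not None:
        return handler(head)
    magic = GENERIC_MAGIC.get(mime)
    return magic is not None and head.startswith(magic)
```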
Comment 4 Gregory Maxwell 2007-09-14 03:11:16 UTC
We also allow encrypted/DRMed PDFs right now.  These should be denied. People are using PDF protection to bind advertisements into documents, e.g.:

[gmaxwell@cherenkov ~]$ pdfinfo /home/syncin/wikipedia/commons/a/a1/Latin_for_Beginners.pdf
Title:          Latin For Beginners
Subject:        Latin Grammar
Author:         Benjamin L. D'Ooge
Creator:        Acrobat 5.0 Image Conversion Plug-in for Windows
Producer:       Acrobat 5.0 Image Conversion Plug-in for Windows
CreationDate:   Tue Oct  1 10:32:17 2002
ModDate:        Mon May 24 10:26:33 2004
Tagged:         no
Pages:          358
Encrypted:      yes (print:yes copy:no change:no addNotes:no)
Page size:      612 x 792 pts (letter)
File size:      5839223 bytes
Optimized:      yes
PDF version:    1.5
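
A sketch of how the "Encrypted:" line in output like the above could be checked automatically, in Python. It assumes the pdfinfo command-line tool is installed and prints "Key: value" lines as shown; the function names are hypothetical.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict:
    """Turn pdfinfo's 'Key:   value' lines into a dictionary."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def is_encrypted(fields: dict) -> bool:
    """True if the Encrypted field starts with 'yes'."""
    return fields.get("Encrypted", "").lower().startswith("yes")


def pdf_is_encrypted(path: str) -> bool:
    """Run pdfinfo on a file (assumes the tool is on PATH) and check the result."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return is_encrypted(parse_pdfinfo(result.stdout))
```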

Comment 5 Ruud Steltenpool 2007-10-03 14:03:04 UTC
Will you only allow plain valid SVG then?
There's lots of Inkscape(annotated) SVG on the web ...
Comment 6 Daniel Kinzler 2007-10-03 14:10:21 UTC
I very much agree with Tim, although I would suggest decoupling the handling of file formats from the handling of media types (the same player may be used for different formats, for example). I wrote about that a while back, see and

This has been bugging me for quite some time... if only I wasn't committed to studying right now. I kind of feel the urge to write this :)
Comment 7 Gregory Maxwell 2007-10-03 14:12:03 UTC
> Will you only allow plain valid SVG then?
> There's lots of Inkscape(annotated) SVG on the web ...

Inkscape/Sodipodi SVG should certainly continue to be allowed, but that wouldn't stop us from validating that uploaded ".svg" files are valid XML that meets a number of basic tests of SVGness.

There is a big difference between an extended dialect of SVG and a Windows .exe file renamed to .svg. :)
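
As a sketch of one such basic test, the following Python checks that the data is well-formed XML whose root element is svg in the SVG namespace, while leaving attributes and elements from other namespaces (such as Inkscape's) alone. The function name is hypothetical, and a production parser for untrusted uploads would also need to guard against entity-expansion attacks, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def looks_like_svg(data: bytes) -> bool:
    """True if data parses as XML and its root element is <svg> in the SVG namespace."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # ElementTree spells namespaced tags as "{namespace}localname".
    return root.tag == "{%s}svg" % SVG_NAMESPACE
```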
Comment 8 Platonides 2007-10-03 14:56:12 UTC
Just check that SVGs start with <?xml or <svg, with an optional UTF-8 BOM at the beginning. This handles 98% of uploaded SVGs. It doesn't take into account UTF-16 SVGs, which we don't render and which would likely be the result of broken editing, so the encoding parameter of the text declaration would also be wrong.
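
That prefix check is small enough to sketch directly (Python, hypothetical function name). As described, it deliberately returns False for UTF-16 files.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def has_svg_prefix(head: bytes) -> bool:
    """True if the data starts with <?xml or <svg, after an optional UTF-8 BOM."""
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    return head.startswith((b"<?xml", b"<svg"))
```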
Comment 9 Ruud Steltenpool 2007-10-03 15:01:04 UTC
It's great to filter out .exe files renamed as .svg.
It would be even better to make sure SVGs are valid. The W3C validator (which you can install locally) is very useful, though you might need to filter out foreign-namespace content first to allow Inkscape/Sodipodi/other annotations.
Comment 10 Brion Vibber 2008-10-20 18:01:16 UTC
Ogg, PDF, MID, ODF, SVG, and XCF all have signature checks at present, as well as signature blacklists for EXE.

Old StarOffice/OpenOffice 1.x formats could be added, but should be considered deprecated at this point.
