Last modified: 2008-10-20 18:01:16 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T12823, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 10823 - We do not reject mismatched file types on upload for many extensions. Ogg, pdf, mid, sx*, svg, xcf.


Status:	RESOLVED FIXED

Product:	Wikimedia
Classification:	Unclassified
Component:	wikibugs IRC bot
Version:	unspecified
Hardware:	All All

Importance:	Low enhancement with 5 votes
Target Milestone:	---
Assigned To:	Nobody - You can work on this!


Reported:	2007-08-06 17:27 UTC by Gregory Maxwell
Modified:	2008-10-20 18:01 UTC (History)
CC List:	4 users (show)

Description Gregory Maxwell 2007-08-06 17:27:09 UTC

I'm opening this for tracking and comment collection purposes; it is not yet actionable.

A recent foundation-l thread linked to some scaremongering about viruses/trojans on websites open to outside submissions. In this thread it was pointed out that the Wikimedia wikis are not sufficiently strict with uploaded content for some extensions. The risk created by this is probably fairly low, but it should be addressed.


For example, a win32 exe uploaded under a number of names:
http://commons.wikimedia.org/wiki/Image:Winecmdexe.ogg
http://commons.wikimedia.org/wiki/Image:Winecmdexe.pdf
http://commons.wikimedia.org/wiki/Image:Winecmdexe.sxw
http://commons.wikimedia.org/wiki/Image:Winecmdexe.mid
http://commons.wikimedia.org/wiki/Image:Winecmdexe.xcf
http://commons.wikimedia.org/wiki/Image:Winecmdexe.svg
http://commons.wikimedia.org/wiki/Image:Winecmdexe.sxd

As far as I can tell, our current detection of Ogg and MIDI files appears reliable and accurate. I don't see why we can't enforce it for those types. Can anyone provide any counterexamples or suggested test cases?

We do not appear to correctly detect valid XCFs.

I am not sure where we stand on the other formats.

Comment 1 Brion Vibber 2007-08-08 21:57:37 UTC

It _should_ be enforcing for recognized types, but that might only be for internally-recognized ones at the moment. We need to take a peek at that...


My strong recommendation is to rip out the whole MimeDetector stuff with its unreliable external dependencies and just do our own magic detection for all types. That should be enough for many of the basics -- keep out DOS/Windows executables, detect Ogg, PDF, SVG, etc nicely.

I'm not sure how easy it is to cleanly detect the various office formats. The StarOffice/OpenOffice/OpenDoc ones are ZIP-based, which may make it hard to tell legit docs from other ZIP files.
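
A minimal sketch of the "own magic detection" idea from this comment, written in Python for illustration. The signature table, the MIME names, and the functions `sniff` and `upload_allowed` are assumptions made for this example and are not MediaWiki's actual code; the byte signatures themselves are the well-known leading bytes of each format.

```python
# Leading-byte signatures: an illustrative subset, not a complete list.
SIGNATURES = [
    (b"OggS", "application/ogg"),
    (b"%PDF-", "application/pdf"),
    (b"MThd", "audio/midi"),
    (b"gimp xcf ", "image/x-xcf"),
    (b"MZ", "application/x-msdownload"),  # DOS/Windows executable header
]

# Types that are rejected regardless of the claimed extension.
BLOCKED = {"application/x-msdownload"}


def sniff(head: bytes):
    """Return the MIME type implied by the first bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None


def upload_allowed(head: bytes, expected_mime: str) -> bool:
    """Accept only if the sniffed type is not blocked and matches the claim."""
    detected = sniff(head)
    if detected in BLOCKED:
        return False
    return detected == expected_mime
```

With this table, an executable renamed to `.ogg` is rejected because its first bytes are `MZ` rather than `OggS`. ZIP-based office formats are deliberately left out, since a leading-byte check cannot tell them apart from other ZIP files.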

Comment 2 Platonides 2007-09-09 20:48:25 UTC

ZIP-based documents are unusual enough that accepting arbitrary ZIP files as such documents is not a great problem. We get many more articles uploaded as PDFs.

Note that we could easily differentiate them by reading the end of the ZIP file instead of the first KB and searching the central directory for content.xml and mimetype. We can even identify the actual document type by reading the content of the mimetype entry (which is not supposed to be compressed).

As MIME detection is quite good, e.g. [[en:Bitmap file]]s with a jpg extension get the image/x-ms-bitmap MIME type, could an extension-to-MIME association be enough?
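
The central-directory check described above could look roughly like the following Python sketch. It relies on the standard `zipfile` module, which locates the central directory from the end of the archive. The function names `opendocument_mimetype` and `build_zip` are hypothetical, and requiring both entries plus an uncompressed `mimetype` is this example's reading of the comment, not an established rule.

```python
import io
import zipfile


def opendocument_mimetype(data: bytes):
    """Return the declared MIME type of a ZIP-based document, or None."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
            # Both entries must be listed in the central directory.
            if "mimetype" not in names or "content.xml" not in names:
                return None
            # The mimetype entry is expected to be stored uncompressed.
            if archive.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
                return None
            return archive.read("mimetype").decode("ascii", "replace").strip()
    except zipfile.BadZipFile:
        return None


def build_zip(members: dict) -> bytes:
    """Demo helper (hypothetical): build an uncompressed ZIP in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

For example, `opendocument_mimetype(build_zip({"mimetype": "application/vnd.oasis.opendocument.text", "content.xml": "<x/>"}))` returns the declared type, while an ordinary ZIP without those entries, or a file that is not a ZIP at all, yields `None`.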

Comment 3 Tim Starling 2007-09-09 20:52:20 UTC

I would suggest guessing the MIME type from the extension, then using the corresponding media handler to validate it. We could fall back to some generic magic number extraction method for types with no media handler. OggHandler for instance could do some fairly complex validation on uploaded Ogg files, but it needs an appropriate entry point.
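
A rough Python sketch of the flow suggested in this comment: map the extension to a MIME type, hand the data to a per-type handler if one exists, and otherwise fall back to a generic magic-number check. Every name here (`EXTENSION_MIME`, `HANDLERS`, `GENERIC_MAGIC`, `validate_ogg`, `validate_upload`) is hypothetical, and `validate_ogg` is only a stand-in for the deeper validation a real OggHandler entry point could perform.

```python
import os

# Extension -> expected MIME type (illustrative subset).
EXTENSION_MIME = {
    ".ogg": "application/ogg",
    ".pdf": "application/pdf",
    ".mid": "audio/midi",
}


def validate_ogg(data: bytes) -> bool:
    """Stand-in for a media handler; a real one could parse Ogg pages."""
    return data.startswith(b"OggS")


# MIME type -> dedicated validator, for types that have a media handler.
HANDLERS = {"application/ogg": validate_ogg}

# Fallback magic numbers for types without a handler.
GENERIC_MAGIC = {"application/pdf": b"%PDF-", "audio/midi": b"MThd"}


def validate_upload(filename: str, data: bytes) -> bool:
    ext = os.path.splitext(filename)[1].lower()
    mime = EXTENSION_MIME.get(ext)
    if mime is None:
        return False  # unknown extension: reject in this sketch
    handler = HANDLERS.get(mime)
    if handler is not None:
        return handler(data)
    magic = GENERIC_MAGIC.get(mime)
    return magic is not None and data.startswith(magic)
```

Under these assumptions, `validate_upload("Winecmdexe.ogg", b"MZ\x90\x00")` is rejected by the Ogg handler, while a PDF has no handler and is checked against its magic number only.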

Comment 4 Gregory Maxwell 2007-09-14 03:11:16 UTC

We also allow encrypted/DRMed PDFs right now. These should be denied. People are using the PDF protection to bind advertisements into documents, e.g.:


[gmaxwell@cherenkov ~]$ pdfinfo /home/syncin/wikipedia/commons/a/a1/Latin_for_Beginners.pdf
Title:          Latin For Beginners
Subject:        Latin Grammar
Keywords:       www.textkit.com
Author:         Benjamin L. D'Ooge
Creator:        Acrobat 5.0 Image Conversion Plug-in for Windows
Producer:       Acrobat 5.0 Image Conversion Plug-in for Windows
CreationDate:   Tue Oct  1 10:32:17 2002
ModDate:        Mon May 24 10:26:33 2004
Tagged:         no
Pages:          358
Encrypted:      yes (print:yes copy:no change:no addNotes:no)
Page size:      612 x 792 pts (letter)
File size:      5839223 bytes
Optimized:      yes
PDF version:    1.5
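
One way to act on a report like the one above is to look at its `Encrypted:` line. The Python sketch below only parses text in the format shown in this comment; the function name `pdfinfo_is_encrypted` is hypothetical, and how the report is obtained (for example by running `pdfinfo` on the uploaded file) is left out.

```python
def pdfinfo_is_encrypted(report: str) -> bool:
    """Return True if a pdfinfo-style report says the PDF is encrypted."""
    for line in report.splitlines():
        # Split "Key:   value" on the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Encrypted":
            return value.strip().lower().startswith("yes")
    return False  # no Encrypted line found: treated as not encrypted here
```

Applied to the output above, the line `Encrypted:      yes (print:yes copy:no change:no addNotes:no)` makes the function return `True`, so such an upload could be refused.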

Comment 5 Ruud Steltenpool 2007-10-03 14:03:04 UTC

Will you only allow plain valid SVG then?
There's lots of Inkscape(annotated) SVG on the web ...

Comment 6 Daniel Kinzler 2007-10-03 14:10:21 UTC

I very much agree with Tim, although I would suggest decoupling the handling of file formats from the handling of media types (the same player may be used for different formats, for example). I wrote about that a while back, see http://brightbyte.de/page/Media_handlers and http://brightbyte.de/page/MediaWiki_media_dreams

This has been bugging me for quite some time... if only I wasn't committed to studying right now. I kind of feel the urge to write this :)

Comment 7 Gregory Maxwell 2007-10-03 14:12:03 UTC

> Will you only allow plain valid SVG then?
> There's lots of Inkscape(annotated) SVG on the web ...

Inkscape/Sodipodi SVG should certainly continue to be allowed, but that wouldn't stop us from validating that uploaded ".svg" files are valid XML that meets a number of basic tests of SVGness.

There is a big difference between an extended dialect of SVG and a Windows .exe file renamed to .svg. :)
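
A minimal Python sketch of one such "basic test of SVGness": the file must parse as well-formed XML and its root element must be `svg` in the SVG namespace. Attributes and elements from other namespaces, such as Inkscape's, do not affect the result. The function name `is_svg_document` is hypothetical, and the standard-library parser used here is not hardened against maliciously constructed XML, so a real upload check would need additional safeguards.

```python
import xml.etree.ElementTree as ET

# Qualified name of the root element of an SVG document.
SVG_ROOT = "{http://www.w3.org/2000/svg}svg"


def is_svg_document(data: bytes) -> bool:
    """True if data is well-formed XML whose root is an SVG <svg> element."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False  # not well-formed XML, e.g. a renamed executable
    return root.tag == SVG_ROOT
```

This sketch insists on the namespace declaration, which is a stricter choice than the discussion requires; relaxing it would be a one-line change.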

Comment 8 Platonides 2007-10-03 14:56:12 UTC

Just check that SVGs start with <?xml or <svg, with an optional UTF-8 BOM at the beginning. This handles 98% of uploaded SVGs. It doesn't take into account UTF-16 SVGs (which we don't render); those would likely be the result of broken editing, so the encoding parameter of the text declaration would also be wrong.
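
The prefix check described in this comment is small enough to sketch directly in Python. The function name `looks_like_svg` is hypothetical. Note that `<?xml` matches any XML document, so this keeps out renamed executables but does not by itself prove the file is SVG.

```python
import codecs


def looks_like_svg(head: bytes) -> bool:
    """Check for '<?xml' or '<svg' at the start, after an optional UTF-8 BOM."""
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    return head.startswith((b"<?xml", b"<svg"))
```

UTF-16 files fail this check, since their bytes do not begin with the ASCII sequences tested here, which matches the behaviour the comment describes.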

Comment 9 Ruud Steltenpool 2007-10-03 15:01:04 UTC

It's great to filter out .exe files renamed as .svg.
It would be even better to make sure SVGs are valid (the W3C validator, which you can install locally, is very useful, though you might need to filter out foreign-namespace content first to allow Inkscape/Sodipodi/other annotations).

Comment 10 Brion Vibber 2008-10-20 18:01:16 UTC

Ogg, PDF, MID, ODF, SVG, and XCF all have signature checks at present, as well as signature blacklists for EXE.

Old StarOffice/OpenOffice 1.x formats could be added, but should be considered deprecated at this point.
