Last modified: 2014-10-02 08:41:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T27707, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 25707 - Allow "html" in exif tags
Allow "html" in exif tags
Status: NEW
Product: MediaWiki
Classification: Unclassified
Uploading (Other open bugs)
unspecified
All All
: Normal enhancement with 1 vote (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks: exif
  Show dependency treegraph
 
Reported: 2010-10-29 16:26 UTC by DieBuche
Modified: 2014-10-02 08:41 UTC (History)
9 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description DieBuche 2010-10-29 16:26:26 UTC
Every once in a while, photographer put something like this into their exif fields "<a href=example.com>Photo by me</a>"
This won't work for obvious reasons, but still prevents users to upload the file "for security reasons".
Comment 1 Bryan Tong Minh 2010-10-29 16:27:57 UTC
The security reason is that IE may get fooled into thinking that this is actually an HTML file and try to display it, executing any embedded JS in the process.
Comment 2 Bawolff (Brian Wolff) 2010-10-29 23:53:20 UTC
Theoretically we could perhaps strip the html tags in exif fields. That would require a general means of editing exif tags. However we'll probably eventually need that anyways if we plan to fix Bug 20326.
Comment 3 DieBuche 2010-10-30 00:09:52 UTC
Oh, sorry, maybe I wasn't clear enough. I'm aware of the script issue, but would it still be a concern if we only disallowed <script> and <iframe> tags, and let <a> or <img> pass?
Comment 4 Derk-Jan Hartman 2010-10-30 00:22:25 UTC
That would mean tying an html parser into the filetype detection system.

I guess in theory it could be done with a whitelisting of several html tags, and stripping other tags as well as dangerous css from style=""... That's not a straightforward thing to implement though. Also, the solution would of course be far from HTML, and people would probably be asking for every single HTML tag they think might be useful to them :D
Comment 5 Bryan Tong Minh 2010-10-30 09:28:09 UTC
Well, only <script> and style= should be blacklisted right?
Comment 6 Derk-Jan Hartman 2010-10-30 10:22:41 UTC
how about IEs filter in style= ? And <link> elements of course, inline images, applet, iframe. There are many things in HTML that can potentially be dangerous.
Comment 7 Bawolff (Brian Wolff) 2010-10-30 17:10:19 UTC
Also if we were filtering html from the file, it'd be kind of weird to filter some html, then of the html we let in, not allow it to be used on the metadata box on the image page (With our current super-weird mix of first doing specialhtmlchars() on (most, not all of) the exif values, and then feeding the result of that into the parser.)
Comment 8 Roan Kattouw 2010-10-31 17:24:06 UTC
We don't arbitrarily filter some HTML. We have code to predict whether IE will think a file is HTML or not (based on Tim's reengineering of the IE MIME type detection code) and filter based on that.
Comment 9 Bawolff (Brian Wolff) 2010-11-08 05:44:34 UTC
Ok, so (From my understanding):
*IE only looks at the first 255 bytes of a file
*The EXIF standard allows arbitrary whitespace at the beginning of the exif application segment (right after the tiff header).

Proposed solution:
If we get a jpeg that fails the check, add about 255 bytes of whitespace, change the offsets for all the exif pointers, and see if it still fails the check. This of course would need to be tested to see if image viewers accept the arbitrary white space in practise and so on.
Comment 10 Bryan Tong Minh 2010-11-08 07:48:08 UTC
(In reply to comment #9)
> Proposed solution:
> If we get a jpeg that fails the check, add about 255 bytes of whitespace,
> change the offsets for all the exif pointers, and see if it still fails the
> check. This of course would need to be tested to see if image viewers accept
> the arbitrary white space in practise and so on.

Sounds reasonable to me, but this is a major change from how we have previously handled: previously your file after upload was more or less guaranteed to be exactly equal to that before upload. Now we are essentially losing the original file. I don't think we should care about this, but it is something to take in mind.

cc Tim Starling for security review of the proposed solution
Comment 11 Tisza Gergő 2014-10-02 08:41:58 UTC
Alternatively just disable logins in IE6 (and 7?). As long as the user can't log in, allowing arbitrary script execution on upload.wikimedia.org should be harmless. Disabling logins for old and insecure browsers was discussed in bug 56575.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links