Last modified: 2014-10-02 08:41:58 UTC
Every once in a while, photographers put something like this into their exif fields: "<a href=example.com>Photo by me</a>". This won't work as a link for obvious reasons, but it still prevents users from uploading the file "for security reasons".
The security reason is that IE may get fooled into thinking that this is actually an HTML file and try to display it, executing any embedded JS in the process.
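A rough sketch of that kind of check, for illustration only. The tag list and the `window` size are assumptions, and `looks_like_html` is a hypothetical helper, not the actual detection code:

```python
# Assumption: a short, illustrative list of byte patterns that might make a
# browser's content sniffer treat a file as HTML. The real list may differ.
SUSPICIOUS_TAGS = (
    b"<a href", b"<body", b"<head", b"<html",
    b"<img", b"<script", b"<table", b"<title",
)


def looks_like_html(data: bytes, window: int = 255) -> bool:
    """Return True if the first `window` bytes contain an HTML-like tag.

    `window` is an assumed sniffing window size, not a verified value.
    """
    head = data[:window].lower()
    return any(tag in head for tag in SUSPICIOUS_TAGS)


# A JPEG start-of-image marker followed directly by an exif-style string.
sample = b"\xff\xd8" + b"<a href=example.com>Photo by me</a>"
print(looks_like_html(sample))
```

With this sketch, the same string placed beyond the first `window` bytes would not trigger the check.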
Theoretically we could perhaps strip the html tags in exif fields. That would require a general means of editing exif tags. However, we'll probably need that eventually anyway if we plan to fix Bug 20326.
Oh, sorry, maybe I wasn't clear enough. I'm aware of the script issue, but would it still be a concern if we only disallowed <script> and <iframe> tags, and let <a> or <img> pass?
That would mean tying an html parser into the filetype detection system. I guess in theory it could be done with a whitelist of several html tags, and stripping other tags as well as dangerous css from style=""... That's not a straightforward thing to implement though. Also, the solution would of course be far from full HTML, and people would probably be asking for every single HTML tag they think might be useful to them :D
Well, only <script> and style= should be blacklisted, right?
How about IE's filter in style=? And <link> elements of course, inline images, applet, iframe. There are many things in HTML that can potentially be dangerous.
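To make the whitelist-versus-blacklist point concrete, here is a minimal sketch using Python's standard `html.parser`. The allowed tag and attribute sets are purely hypothetical, and the sketch only reports what a whitelist would reject; it does not check URL schemes (e.g. javascript: in href) or CSS, so it is nowhere near a complete filter:

```python
from html.parser import HTMLParser

# Hypothetical whitelist; anything not listed here is reported as rejected.
ALLOWED_TAGS = {"a", "b", "i"}
ALLOWED_ATTRS = {"href"}


class TagCollector(HTMLParser):
    """Collect tags and attributes that fall outside the whitelist."""

    def __init__(self):
        super().__init__()
        self.rejected = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in ALLOWED_TAGS:
            self.rejected.append(tag)
        for name, _value in attrs:
            if name not in ALLOWED_ATTRS:
                self.rejected.append(f"{tag}@{name}")


def rejected_markup(value: str) -> list:
    """Return the tags and tag@attribute pairs a whitelist would reject."""
    parser = TagCollector()
    parser.feed(value)
    parser.close()
    return parser.rejected


print(rejected_markup("<a href=example.com>Photo by me</a>"))
print(rejected_markup('<iframe src="http://example.org/"></iframe>'))
```

A blacklist of only <script> and style= would let the second example through, while the whitelist rejects it without needing to know about iframe in advance.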
Also, if we were filtering html from the file, it'd be kind of weird to filter some html and then not allow the html we do let in to be used in the metadata box on the image page (given our current super-weird mix of first doing htmlspecialchars() on most, but not all, of the exif values, and then feeding the result of that into the parser).
We don't arbitrarily filter some HTML. We have code to predict whether IE will think a file is HTML or not (based on Tim's reengineering of the IE MIME type detection code) and filter based on that.
Ok, so (from my understanding):
* IE only looks at the first 255 bytes of a file
* The EXIF standard allows arbitrary whitespace at the beginning of the exif application segment (right after the tiff header).
Proposed solution: If we get a jpeg that fails the check, add about 255 bytes of whitespace, change the offsets for all the exif pointers, and see if it still fails the check. This of course would need to be tested to see if image viewers accept the arbitrary white space in practise and so on.
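As a first step toward testing this idea, the sketch below only locates the TIFF header of the Exif segment in a JPEG and estimates how much padding would be needed to push everything after that header past the window. It does not rewrite any offsets. `find_exif_tiff_offset` and `padding_needed` are hypothetical names, the 255-byte window is the figure from the understanding stated above, and the parser assumes a simple marker layout with no fill bytes between segments:

```python
import struct

SNIFF_WINDOW = 255  # assumption: the window size quoted in this thread


def find_exif_tiff_offset(jpeg: bytes):
    """Return the file offset of the TIFF header in the Exif APP1 segment.

    Returns None if the data is not a JPEG or has no Exif segment before
    the start of the compressed image data.
    """
    if jpeg[:2] != b"\xff\xd8":  # SOI marker
        return None
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            return None  # not positioned at a marker; give up
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: image data follows, no more header segments
            return None
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = pos + 4
        if marker == 0xE1 and jpeg[payload:payload + 6] == b"Exif\x00\x00":
            return payload + 6
        pos += 2 + length
    return None


def padding_needed(jpeg: bytes, window: int = SNIFF_WINDOW) -> int:
    """Estimate the padding needed after the 8-byte TIFF header."""
    tiff = find_exif_tiff_offset(jpeg)
    if tiff is None:
        return 0
    first_free = tiff + 8  # the TIFF header itself is 8 bytes long
    return max(0, window - first_free)
```

An actual fix would still have to insert the padding, shift every exif pointer by the same amount, update the segment length, and then re-run the check, as described in the proposal.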
(In reply to comment #9)
> Proposed solution:
> If we get a jpeg that fails the check, add about 255 bytes of whitespace,
> change the offsets for all the exif pointers, and see if it still fails the
> check. This of course would need to be tested to see if image viewers accept
> the arbitrary white space in practise and so on.

Sounds reasonable to me, but this is a major change from how we have handled uploads previously: until now, your file after upload was more or less guaranteed to be exactly equal to the file before upload. Now we are essentially losing the original file. I don't think we should care about this, but it is something to keep in mind.

cc Tim Starling for security review of the proposed solution
Alternatively just disable logins in IE6 (and 7?). As long as the user can't log in, allowing arbitrary script execution on upload.wikimedia.org should be harmless. Disabling logins for old and insecure browsers was discussed in bug 56575.