Last modified: 2014-10-04 11:25:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T27163, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 25163 - Improve user feedback in html detected upload error
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: File management
Version: 1.17.x
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n
Depends on:
Blocks: messages
Reported: 2010-09-13 15:59 UTC by Derk-Jan Hartman
Modified: 2014-10-04 11:25 UTC (History)
CC List: 8 users

Description Derk-Jan Hartman 2010-09-13 15:59:22 UTC
Currently, when you upload a jpg that has HTML in an exif tag, you get a message along the lines of: "This file contains HTML or script code which could be executed by a browser".

This is not really helpful to the "normal" user. It doesn't say what the user has to do to rectify the problem if they want to upload the file. I was thinking that this might be a place where we could add a link to a MediaWiki help page or something similar. That page could then detail how to remove HTML code from an exif tag or from other file formats, and give further hints on how to deal with the problem.
Comment 1 Sumana Harihareswara 2013-02-21 23:43:07 UTC
This is reasonably easy to do on a technical level.
Comment 2 Anu 2013-07-08 21:10:22 UTC
I want to work on this simple bug. I am slightly slow because I am working on this bug. Please bear with me.
Comment 3 Andre Klapper 2013-07-09 09:40:10 UTC
Does some page exist that explains how to remove HTML code from an exif tag? If not that's probably the first step, and then finding the exact location of the string by grep'ing the codebase.
Comment 4 Bawolff (Brian Wolff) 2013-07-09 14:41:34 UTC
(In reply to comment #3)
> Does some page exist that explains how to remove HTML code from an exif tag?

No. At best [[commons:Commons:Exif]] details how to edit the exif fields. Someone would have to make such a page on mediawiki.org

-----

Note: HTML in exif tags is just one cause of this issue. In theory it could be HTML anywhere in the file. It could also be a non-jpeg file (for example, people sometimes get this error if they enable uploading of certain xml-based formats), etc.
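
For illustration only, here is a minimal Python sketch of the kind of check that produces this error. It is not MediaWiki's actual code (which is PHP); the function name, the 1024-byte window, and the tag list are all assumptions chosen to show why the cause is not limited to exif tags or to jpeg files.

```python
# Hypothetical sketch, not MediaWiki's real upload check.
SNIFF_BYTES = 1024  # assumed size of the prefix a browser might sniff

# Assumed list of tag openers treated as suspicious.
SUSPICIOUS_TAGS = (
    b"<a href", b"<body", b"<head", b"<html",
    b"<img", b"<script", b"<table", b"<title",
)


def looks_like_html(data: bytes) -> bool:
    """Return True if the sniffed prefix contains an HTML-like tag opener."""
    # Only the start of the file is inspected, case-insensitively.
    prefix = data[:SNIFF_BYTES].lower()
    return any(tag in prefix for tag in SUSPICIOUS_TAGS)
```

Under these assumptions, the check looks at raw bytes near the start of the file, so it fires on any format whose early bytes happen to contain such a tag, whether that is an exif field or an xml-based file.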



(In reply to comment #2)
> I want to work on this simple bug. I am slightly slow because I am working on
> this bug. Please bear with me.

If you run into any trouble or need any advice, don't hesitate to ask.
Comment 5 Sumana Harihareswara 2013-07-15 22:40:32 UTC
Anu, how is it going? Are you having any trouble?
Comment 6 Anu 2013-07-16 16:35:54 UTC
(In reply to comment #5)
> Anu, how is it going? Are you having any trouble?

@Sumana, thanks for asking. I have a problem with the settings on my computer, which is not directly related to the bug. Once I can set that right, I can start working on this bug.
Comment 7 Sumana Harihareswara 2013-08-15 03:37:03 UTC
Anu, are you still having trouble with your computer settings?
Comment 8 RituS 2013-12-12 12:38:12 UTC
Hey! Could I work on this bug?? I am a newbie! Could somebody assign me the bug and tell me how to get started?
Comment 9 Andre Klapper 2013-12-12 13:24:25 UTC
RituS: See comment 3 & comment 4 for specific things to consider for this issue; for general info/help see https://www.mediawiki.org/wiki/Developer_access . If something with MediaWiki development is unclear *in general*, please check out https://www.mediawiki.org/wiki/MediaWiki_on_IRC or https://lists.wikimedia.org/mailman/listinfo .
Comment 10 Jackmcbarn 2014-09-30 15:18:04 UTC
A few questions: Is IE's autodetection really so bad that we need to reject valid binary images because they have something that looks like HTML in them? Also, how did HTML (or something that looks sort of like HTML) end up in an EXIF tag in the first place? If there's no non-contrived way, then is this worth worrying about? Finally, is the right way to "fix" this really to link users to a page telling them how to remove EXIF data?
Comment 11 Bawolff (Brian Wolff) 2014-09-30 15:43:28 UTC
(In reply to Jackmcbarn from comment #10)
> A few questions: Is IE's autodetection really so bad that we need to reject
> valid binary images because they have something that looks like HTML in
> them? Also, how did HTML (or something that looks sort of like HTML) end up
> in an EXIF tag in the first place? If there's no non-contrived way, then is
> this worth worrying about? Finally, is the right way to "fix" this really to
> link users to a page telling them how to remove EXIF data?

IE6 has shockingly bad content detection.

HTML ends up in exif tags mostly from people adding things like <a href=... to exif tags.

One possible alternative fix (the safety of this would need review), I think, would be to have MW modify the file to add 255 bytes of padding. (jpg allows padding markers in the file immediately after the first marker. The other case where this issue happens is certain xml formats, which allow whitespace padding.)
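
As a rough illustration of the padding idea, here is a Python sketch that inserts 0xFF fill bytes between the jpeg start-of-image marker and the marker that follows it. The function name and the default of 255 are assumptions taken from the description above; whether this actually defeats IE's content detection is unverified and would need exactly the safety review mentioned.

```python
# Hypothetical sketch of the padding idea; not reviewed for safety.
SOI = b"\xff\xd8"  # JPEG start-of-image marker


def pad_after_soi(jpeg: bytes, fill: int = 255) -> bytes:
    """Insert `fill` 0xFF fill bytes between SOI and the next marker."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    if len(jpeg) < 4 or jpeg[2] != 0xFF:
        raise ValueError("expected a marker right after SOI")
    # The JPEG format lets a marker be preceded by 0xFF fill bytes,
    # so the rest of the file is appended unchanged.
    return SOI + b"\xff" * fill + jpeg[2:]
```

This only covers the jpeg case. The xml-based formats would need whitespace padding instead, and the sketch assumes that decoders accept the fill bytes.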
Comment 12 Roxana Valentina 2014-09-30 17:42:33 UTC
I started working on this bug.

My current fix consists of adding a link to the error message shown when uploading a file. The link points the user to a page explaining how to delete the 'bad' tag. (I am still working on that page; I somehow added bad advice and inaccuracies to it.)

I was wondering if I should continue like this or change the fix with the padding solution (even though the user can also upload non-jpeg files).
Comment 13 Bawolff (Brian Wolff) 2014-09-30 21:17:08 UTC
> 
> I was wondering if I should continue like this or change the fix with the
> padding solution (even though the user can also upload non-jpeg files).

The padding thing is much, much more complicated (bug 25707), and also requires review of its soundness by somebody who knows the ins and outs of the IE content detection. I'd recommend just working on changing the message, at least for now. (If you're interested, you can of course work on the padding thing, but it's a much bigger job than what one would normally want to take on for a first bug to fix.)
Comment 14 Gerrit Notification Bot 2014-10-04 11:18:22 UTC
Change 164726 had a related patch set uploaded by Tuxilina:
Improved user feedback in html detected upload error

https://gerrit.wikimedia.org/r/164726
Comment 15 Roxana Valentina 2014-10-04 11:25:20 UTC
I changed the message and I added a link to https://www.mediawiki.org/wiki/Remove_Exif_tags . Now I just need some help to make this page accurate, and not full of bad advice. Should I leave this help page blank?
