Last modified: 2014-10-26 17:12:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T44813, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 42813 - Descriptions and titles from Flickr should be unescaped
Descriptions and titles from Flickr should be unescaped
Status: REOPENED
Product: MediaWiki extensions
Classification: Unclassified
UploadWizard (Other open bugs)
master
All All
: Normal normal (vote)
: ---
Assigned To: Ryan Kaldari
:
Depends on:
Blocks: 43450
  Show dependency treegraph
 
Reported: 2012-12-07 04:10 UTC by Ryan Kaldari
Modified: 2014-10-26 17:12 UTC (History)
5 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Ryan Kaldari 2012-12-07 04:10:13 UTC
Right now special characters in Flickr descriptions are imported in their escaped format. In other words, you see things like " in the description field in UploadWizard. This isn't too bad since those get rendered as the correct characters on the File page, but it would be better to have the raw characters used.
Comment 1 Ryan Kaldari 2012-12-07 19:34:16 UTC
Unfortunately unescaping HTML entities in Javascript isn't easy without introducing an XSS vulnerability. We may just want to do some regexes for the most common HTML entities in the meantime.
Comment 2 Ryan Kaldari 2012-12-07 20:08:40 UTC
Looks like the only HTML special chars we actually have to worry about are " and &. Flickr leaves apostrophes unescaped and strips out tag characters.
Comment 3 Ryan Kaldari 2012-12-07 22:55:11 UTC
Fixed in https://gerrit.wikimedia.org/r/#/c/37492/
Comment 4 Lupo 2012-12-08 16:47:44 UTC
I don't think this is a good idea. Input

  "

I supposed to end up being displayed as

  "

With your change (or even my suggested inversal of the replacements, see my comment on the gerrit change), it'll end up being displayed as

  "

In Flinfo I've never found it necessary to do this. Flinfo does, however, replace <strong></strong> and <b></b> by ''' and <em> or <i> by '', and escapes the special characters {, }, | to &#x7B;, &#x7C;, &#x7D; to avoid that Flickr descriptions containing these can break the enclosing {{information}} template. Flinfo also transforms <a>-links to wikilinks.
Comment 5 Ryan Kaldari 2012-12-10 18:18:34 UTC
Why would we want &quot;? The idea of the patch is to convert &quot; to ", as explained in the initial bug comment. This bug is "Descriptions from Flickr should be UNescaped", not "Descriptions from Flickr should be escaped".

You should file a separate bug for the things that need to be escaped, such as {, }, |, etc.
Comment 6 Ryan Kaldari 2012-12-10 18:22:02 UTC
This bug may be a bit confusing since escaping is normally a good thing. The issue in this case, however, is we are putting the text in a textarea that the user can edit (not in HTML), so we actually need to UNescape certain characters (specifically &quot; and &amp;).
Comment 7 Ryan Kaldari 2012-12-10 18:26:22 UTC
Looking at your comment in the patch, I think I understand the confusion. You were thinking that this bug was about converting double-escaped strings into single-escaped strings, but it's actually about converting single-escaped strings into non-escaped strings. If we happen to convert double-escaped strings into non-escaped strings as well, that's OK.
Comment 8 Lupo 2012-12-10 21:20:57 UTC
I see, the double-decoding handles a Flickr bug.

BTW, you probably want to do that then for the title, too. See e.g. the output of this:

http://api.flickr.com/services/rest/?method=flickr.photos.getInfo&api_key=2beda6083bcdc84da6f8a5071abcda18&photo_id=423565224&format=json&nojsoncallback=1

(From their API explorer; non-free image, though.)

The photo should be entitled

  4" and counting

but Flickr returns

  4&amp;quot; and counting

as title.
Comment 9 Ryan Kaldari 2012-12-10 21:31:43 UTC
Ah, good catch. I forgot about the title!
Comment 10 Nischay Nahata 2012-12-12 14:36:37 UTC
If this is done specifically for flickr shouldn't this be done in mw.FlickrChecker.js ?
Comment 11 Ryan Kaldari 2012-12-12 18:26:38 UTC
I think it makes more sense to unescape right before the data is inserted into the text fields. Ideally, we don't want any escaped text in the description or title fields. Escaped text (and html) could theoretically come from any source. It just happens that Flickr is the only one we have to worry about right now.
Comment 12 Quim Gil 2013-11-19 15:30:39 UTC
Is this bug a good candidate for Google Code-in?
 
https://www.mediawiki.org/wiki/Google_Code-In
Comment 13 Andre Klapper 2014-10-26 17:12:19 UTC
Ryan: This issue is assigned to you. Are still working (or still plan to work) on this issue? Only in case you do not plan to work on this issue anymore, should the assignee be set back to default and the bug status changed from ASSIGNED to NEW/UNCONFIRMED? Thanks.

Also removing the "gci2013" whiteboard entry. If this is a good self-contained task for a new contributor, please add "easy" to the "Keywords" field. Thanks.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links