Last modified: 2013-08-26 12:12:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T46525, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 44525 - Flickr disclaimer message escaped, showing raw HTML
Flickr disclaimer message escaped, showing raw HTML
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Internationalization (Other open bugs)
1.21.x
All All
: Normal normal (vote)
: ---
Assigned To: Matthew Flaschen
: patch-need-review
: 44885 (view as bug list)
Depends on:
Blocks: 43747
  Show dependency treegraph
 
Reported: 2013-01-30 23:08 UTC by Ryan Kaldari
Modified: 2013-08-26 12:12 UTC (History)
12 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Ryan Kaldari 2013-01-30 23:08:11 UTC
Some time in the last couple weeks something changed that caused the Flickr disclaimer message in UploadWizard to become escaped. In other words, it is now displaying the HTML tags as text rather than parsing them as HTML.

I haven't had a chance to debug yet, but the message is 'mwe-upwiz-flickr-disclaimer' and it is output around line 329 of /resources/mw.UploadWizard.js.

Right now, this is only happening locally, not on Commons yet.
Comment 1 Nischay Nahata 2013-01-31 17:33:25 UTC
I tried mw.message('mwe-upwiz-flickr-disclaimer').plain(); in the console and the output had <br /> escaped. "<a>" isn't escaped though. Rewinding core to few days back gives the right message.

Moving to core-i18n
Comment 2 Nemo 2013-01-31 19:20:51 UTC
Is this a dupe of bug 44459?
Comment 3 Nischay Nahata 2013-01-31 19:25:31 UTC
(In reply to comment #2)
> Is this a dupe of bug 44459?

I think they are different.
Comment 4 Ryan Kaldari 2013-01-31 19:41:03 UTC
Yes, I think they are different as well.
Comment 5 Ryan Kaldari 2013-01-31 19:46:28 UTC
Looks like this bug was trigger by the change to the concat function in https://gerrit.wikimedia.org/r/#/c/44843/

CCing Timo.
Comment 6 Ryan Kaldari 2013-01-31 20:02:10 UTC
Looks like that change is going to get picked up by the deployment train on Monday.
Comment 7 Krinkle 2013-01-31 20:15:01 UTC
Actually, I think those bugs might be related. The one you refer looks like it was implicitly fixed in clean up I did in Ifbeae7e9.

The jqueryMsg parser only supports very basic wikitext (external links, local links and some other basic syntax) and i18n parser functions (plural, gender, grammer). Everything else is not understood and as such considered text nodes and must NOT be used in interface messages parsed in jqueryMsg.

Before Ifbeae7e9, if the text node contained "<" it was parsed as arbitrary HTML. Which broke some special characters and hyperlinks that contained &amp;. Hence I fixed that.

However, It was actually much worse than the above. jqueryMsg didn't check for "<" (that would've been a sloppy implementation with shady intentions), it was just passing it to jQuery's .append() with comment "// strings and numbers". jqueryMsg was intending to appending as text nodes, no html parsing of any kind.

However in jQuery append> domManip> buildFragment> test> rhtml
> rhtml = /<|&#?\w+;/

There, buildFragment decides to parse html or append text node based on some regex...


Messages are not allowed to contain arbitrary html. Everything is plain text unless wikitext supports it (and in this case, the subset of that supported in jqueryMsg). Right that subset is i18n parser functions and wikilinks.

Ifbeae7e9 removed the insecure and unreliable coincidence factor that would sometimes parse any arbitrary html – unfiltered.

If you need certain html tags such as <br> (i.e. the ones allowed in wikitext), they'll have to be properly supported and implemented in jqueryMsg. Otherwise, avoid those kind of tags in those messages.

Suggesting to close as wontfix, invalid or worksforme.
Comment 8 Matthew Flaschen 2013-01-31 20:29:41 UTC
"jqueryMsg was intending to appending as text nodes, no html parsing of any
kind."

On what basis do you say that?  In https://bugzilla.wikimedia.org/show_bug.cgi?id=29542 says, Neil (one of the main devs on it) says:

"(HTML which is fully outside of or fully inside of another element is parsed
correctly.)"

In the actual prior code:

// strings, integers, anything else
$span.append( node );

I take "anything else" at face value.

"Messages are not allowed to contain arbitrary html."

Not true.  Some are.  https://en.wikipedia.org/wiki/MediaWiki:Stub-threshold has a HTML link, which is used as HTML in preferences.

If you try the same thing as wikitext, it will not render.  https://en.wikipedia.org/wiki/User:Superm401/stub_html

Messages are special, and one of the distinguishing factors is that they can use HTML in some cases.

I believe we should allow HTML as HTML, like before, as a temporary solution.

After that, we should consider an HTML mode, since I think that (HTML vs. wikitext, by context), reflects how it is on the server.
Comment 9 Ryan Kaldari 2013-01-31 23:34:23 UTC
It sounds like the html detection was breaking other things so we may need to come up with a more specific solution. The idea of an explicit HTML mode, perhaps even as a separate function, sounds workable. Specifically detecting HTML tags that are supported in Wikitext would also be nice, but it wouldn't solve all the cases (and would add some bloat).

In the meantime, we have to decide what to do with Ifbeae7e9 before Monday. @Krinkle: How widespread was the breakage caused by the HTML detection? Would it be less breakage than will be caused by removing the detection? If we're not going to revert Ifbeae7e9, I would favor at least adding explicit support for <br> tags (and perhaps anchor tags) before Monday.
Comment 10 Ryan Kaldari 2013-02-04 22:09:45 UTC
It looks like change Ifbeae7e9 also breaks some messages in PageTraige.
Comment 11 Matthew Flaschen 2013-02-04 22:51:13 UTC
Ryan, which message(s)?  We should be careful, since it could be bug 44459 (which I know affects PageTriage).  One of my changes introduced bug 44459, but I also have a fix pending (https://gerrit.wikimedia.org/r/#/c/47044/)

They're both jqueryMsg, but otherwise basically unrelated.
Comment 12 Ryan Kaldari 2013-02-04 23:08:20 UTC
It's where HTML entities are being escaped instead of displayed as characters. Pretty sure it's the same as this bug rather than bug 44459.
Comment 13 Matthew Flaschen 2013-02-13 01:36:56 UTC
*** Bug 44885 has been marked as a duplicate of this bug. ***
Comment 14 Matthew Flaschen 2013-02-13 01:49:17 UTC
Note, the elements that are allowed in wikitext are listed here (https://meta.wikimedia.org/wiki/Help:HTML_in_wikitext#Permitted_HTML).  The implementation is in Sanitizer.php.
Comment 15 Ryan Kaldari 2013-02-13 01:51:32 UTC
Bug 44885 fixed in https://gerrit.wikimedia.org/r/#/c/48772/
Comment 16 Matthew Flaschen 2013-02-13 02:03:01 UTC
As it says, that's just a workaround.  This isn't really fixed until jqueryMsg at least supports the elements that are permitted and commonly used, like b, i, etc.

Reopening.
Comment 17 Ryan Kaldari 2013-02-13 02:06:59 UTC
@Matt: Agreed. Do you want to take it, or should I?
Comment 18 Matthew Flaschen 2013-02-13 02:12:18 UTC
If you have cycles, I'll let you take it.

Otherwise, I'll get around to it at some point.
Comment 19 Nischay Nahata 2013-02-21 16:40:41 UTC
https://bugzilla.wikimedia.org/show_bug.cgi?id=29542 related to this?
Comment 20 Ryan Kaldari 2013-02-21 18:05:56 UTC
No, that's a different bug, but obviously affected by change Ifbeae7e9.
Comment 21 Matthew Flaschen 2013-02-21 19:14:49 UTC
However, making supported HTML tags a fully (recursively) parsed construct should fix bug 29542 too.
Comment 22 Matthew Flaschen 2013-03-05 22:47:34 UTC
Ryan Kaldari and I talked.  I'm going to take the first part of this.  He'll help generalize it later.
Comment 23 Matthew Flaschen 2013-03-10 08:30:03 UTC
Okay, this is at https://gerrit.wikimedia.org/r/#/c/53017/ .  I implemented a general framework for any whitelisted elements and attributes.  If a tag isn't allowed (due to bad element, bad attribute, or mismatched), the tag itself will be treated as text.  The contents are still HTML if applicable, which matches the server's behavior.

Only <i> and the common attributes are implemented.  I tested that works (including some nesting tests (links inside) and that invalid (not whitelisted) HTML (<script>, onclick) is treated as text.
Comment 24 Ryan Kaldari 2013-03-14 18:52:24 UTC
I'll try to do code review on this tomorrow.
Comment 25 Nischay Nahata 2013-04-16 09:43:27 UTC
Change merged, is this fixed?
Comment 26 Matthew Flaschen 2013-04-17 04:10:17 UTC
Yes, the root cause is fixed (whitelisted HTML is now allowed).  Specific extensions can now be changed to use that.

Additional tags already allowed on the server-side can be added to mediawiki.jqueryMsg.js as needed.
Comment 27 Gerrit Notification Bot 2013-04-17 09:22:25 UTC
Related URL: https://gerrit.wikimedia.org/r/59602 (Gerrit Change Ie142dc57359121866b603e65d0f7941cf1539d22)

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links