Last modified: 2013-01-26 04:33:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T37491, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 35491 - Malicious Javascript not stripped from AFTv5
Malicious Javascript not stripped from AFTv5
Status: RESOLVED WONTFIX
Product: MediaWiki extensions
Classification: Unclassified
ArticleFeedbackv5 (Other open bugs)
unspecified
All All
: Low minor (vote)
: ---
Assigned To: Fabrice Florin
:
Depends on:
Blocks: 39044
  Show dependency treegraph
 
Reported: 2012-03-26 17:03 UTC by Dario Taraborelli
Modified: 2013-01-26 04:33 UTC (History)
7 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Dario Taraborelli 2012-03-26 17:03:33 UTC
User input collected via AFTv5 should be sanitized: all Javascript or potentially malicious code should be stripped before we store a new AFT record in the DB. 

We had three JS injection attempts in aft_article_answer so far. Even if the output in Special:ArticleFeedbackv5 is sanitized [1], we should not allow users to create new records containing malicious code.

Using AbuseFilter could be a long term solution, but if we want to start surfacing the full text of posts collected via AFTv5 we need to have an immediate solution in place. This data is immediately available on the toolserver and if it's not sanitized community-developed tools will be subject to JS injection attempts.I temporarily disabled my own live comment stream on the toolserver until we have a workable solution in place. 

[1] http://en.wikipedia.org/wiki/Special:ArticleFeedbackv5/Research_Works_Act/68401
Comment 1 Roan Kattouw 2012-03-26 17:05:38 UTC
I think this is missing the point. We strip this sort of stuff in wikitext because you can actually use some HTML tags in wikitext, but we don't allow that in comments at all. Trying to remove malicious things is much more work and much more error-prone than just escaping your output.

Recommend WONTFIX.
Comment 2 Dario Taraborelli 2012-03-26 18:48:20 UTC
Agreed it's not a security issue for Wikipedia itself. However if we serve this data to third parties via DB replicas (including trusted DB like the toolserver) we're creating a major opportunity for abuse that we should be aware of (particularly as a large part of the community blindly trusts whoever runs scripts on the toolserver). If we can avoid the need of trusting developers downstream by removing obviously dangerous contents upstream I don't see why we shouldn't do it. Simply documenting that there is no user input sanitization in AFT (while we do have abuse filters in place for revisions) is not sufficient to avoid this risk in my opinion.
Comment 3 Krinkle 2012-03-27 20:13:03 UTC
I agree with Roan here. User comments could contain a wide range of content. They may also be reporting script related bugs and as such I don't think anything should be stripped from a comment. The output is where escaping happens and that can be as easy as a single call to htmlspecialchars() in PHP before outputting it, that covers 99% if not 100% of problems with user input, and it doesn't destroy anything the user put in.

If a toolserver user is outputting data from the database without escaping it, they should be reported to a ts-admin as soon as possible to have their account blocked until further notice (okay maybe not like that, but it would be a serious consideration whether they know what they're doing).
Comment 4 Dario Taraborelli 2012-03-27 21:14:40 UTC
Then you should probably have my account blocked as I was temporarily providing tools to browse/access AFT CSV dumps ;)

We should have a broader discussion on this. Much as we have abuse filters in place we should consider automatic filters for all feedback functionality which is likely to attract a high amount of abusive/malicious stuff and we simply cannot control what end users are doing with our data. Fabrice and his team are working on requirements to implement this in AFTv5.

An additional (although unrelated) reason why we should filter junk at the input not the output is for analytics purposes. Scripted attempts to add malicious code (as well as any scripted spam attempts) will make the feedback data useless for analysis if no user input throttling or filtering is in place. We can make a decision to filter this data on the consumer end (in which case this should fall under the competence of the analytics team), but blocking obvious junk at the input (while separately logging blocks) would certainly help alleviate this problem.
Comment 5 Fabrice Florin 2012-04-10 00:40:35 UTC
I lowered the priority of this request and moved it to the next release, because: 
1) we couldn't find an existing filter we could adapt for this; 
2) we couldn't find community volunteers to write one for us; 
3) we only plan to write a few filters ourselves, since this is driven by the community; 
4) the severity of this issue is not as high as we had initially thought, after consulting with Roan and others. 

But thanks so much for bringing it up! We'll keep tracking it for the next release.
Comment 6 Reha Sterbin 2012-04-18 15:43:36 UTC
Assigning to Fabrice until we're ready to move on this.
Comment 7 Krinkle 2013-01-26 03:52:14 UTC
(In reply to comment #4)
> ..providing tools to browse/access AFT CSV dumps ;)
> 

And you didn't escape the output to the browser? That doesn't only impose security issues (malicious javascript may be the least of your worries), it is also incorrect and can cause content to be incorrectly displayed.

No worries, simple mistake to make. But do understand it!

Per "Good security through lazyness", escaping is not related to security, meaning you must not consider it to be a security measure. It is a content measure. If a user comments that they:
> would like something to be <than 500 but>100

It should be outputted like that, just like bugzilla did. And it will when you properly escape it. If you don't escape it, you'll get this:
> like foo to 100

Because:
> <p>like foo to <than 500="" but="">100</than></p>

Omitting bogus entries is good and useful for analytical purposes. Even better to keep track of how many were bogus and how many real.

But stripping characters from the comments because they are dangerous as HTML, is a whole other thing. It will corrupt the data. The problem is that you must not consider the contents HTML, they're text. Output and use them as such and there is no problem whatsoever.

Unless repurposing this bug to find a way to omit bogus entires entirely (in a way more advanced than stripping something that might look like javascript), please wontfix.
Comment 8 Dario Taraborelli 2013-01-26 04:33:58 UTC
Not escaping the output was an obvious mistake, this request is now obsolete as AbuseFilter has been introduced months ago to take care of random junk posted to AFT by bots. Closing it as wontfix.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links