Last modified: 2009-12-30 05:18:35 UTC
I propose adding a CSV mode to special pages that mainly display a list. This would greatly help bots and scripts with parsing, and would ease the server load created by such scripts. This patch implements CSV support in the QueryPage class, and adds CSV support to (currently) 23 special pages without any extra effort. Patch will follow in a minute.
Created attachment 974 [details] patch for QueryPage.php
Removing patch keyword; patch fails security review.

Exploit proof of concept:
1) Set up MediaWiki on a server allowing PATHINFO expansion or with a rewrite rule or alias for wiki pages, plus this patch.
2) Upload an image, with the description field filled out with this fragment: <script>alert(document.cookie)</script>
3) In MSIE 6 on Windows, visit Special:Unusedimages/evil.html?csv=yes

MSIE interprets the extended path fragment ending in ".html" as an override of the unknown Content-Type; the HTML fragment in the output is then interpreted and the JavaScript executed, leaving the site wide open to cross-site scripting attacks.

At a minimum, this needs to perform the kind of security checks that action=raw does, checking that a canonical URL is being used.

There may be additional information disclosure issues if there is data in the internal rows being passed around that shouldn't make it outside, but I haven't checked for this yet.
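For illustration, a minimal sketch of the kind of canonical-URL check mentioned above. The exact check performed by action=raw is not shown in this thread, so this assumes it amounts to "serve raw output only when the request path is exactly the script entry point". The names SCRIPT_PATH and is_canonical_url are hypothetical.

```python
# Hypothetical entry point; a real wiki would take this from its configuration.
SCRIPT_PATH = "/w/index.php"


def is_canonical_url(url: str, script_path: str = SCRIPT_PATH) -> bool:
    """Return True only if the path part of the URL is exactly the script path.

    Any extra path fragment (such as a trailing "/evil.html") makes the
    request non-canonical, so no browser-sniffable extension can be smuggled in.
    """
    path = url.split("?", 1)[0]  # drop the query string, keep only the path
    return path == script_path


# The proof-of-concept URL carries an extra ".html" path fragment and is rejected.
print(is_canonical_url("/wiki/Special:Unusedimages/evil.html?csv=yes"))
# A request through the entry point itself is accepted.
print(is_canonical_url("/w/index.php?title=Special:Unusedimages&csv=yes"))
```

A non-canonical request would then be refused (for example with a 403) instead of being served as CSV.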
Created attachment 977 [details] improved patch, see comment

I improved the patch in a few ways:
* All fields are now URL-encoded. This has two advantages:
  - it makes the CSV output immune to scripting attacks.
  - it allows lines to be split at the separator char reliably.
* querycache is now used if applicable (untested).
* MIME type is now text/plain instead of text/csv. While text/csv is the proposed standard, it is not widely supported yet. text/plain is shown directly by web browsers, as opposed to asking for a download or an external app.
* Subclasses can now specify which columns to include in CSV output by overriding the csvFields() function. This may be used to address any unwanted exposure of internal data.
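The effect of URL-encoding every field can be sketched as follows. This is a minimal illustration and not the code from the attached patch; the comma separator and the names encode_row and decode_row are assumptions.

```python
from urllib.parse import quote, unquote

SEPARATOR = ","  # assumed separator character


def encode_row(fields):
    """Percent-encode each field, then join the fields with the separator.

    With safe="" every character except letters, digits and "_.-~" is
    encoded, so "<", ">", the separator and newlines never appear raw.
    """
    return SEPARATOR.join(quote(str(field), safe="") for field in fields)


def decode_row(line):
    """Split at the separator first, then decode each field."""
    return [unquote(part) for part in line.split(SEPARATOR)]


row = ["Evil,image.png", "<script>alert(document.cookie)</script>", 42]
line = encode_row(row)
print(line)
print(decode_row(line))
```

Because the separator can only occur between fields, a consumer can split each line without a quoting-aware CSV parser. Note that decoding yields strings, so the number 42 comes back as "42".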
I'm not sure I like 'gen=csv'. It might make sense to be consistent with existing things like 'action=raw'. On the other hand 'feed=rss' etc... Bleh. :) doCSV() should *not* exit; the script should continue running to completion (so any post functions and profiling can be run at the end among other things). If you're trying to disable the HTML output, use $wgOut->disable(). There looks to be some duplication of code in doCSV(); it's running queries and such all over again. This should instead share existing code; extract common submethods where appropriate.
Created attachment 981 [details] updated patch. see comment

I updated the patch to address the issues mentioned above. Specifically:
* doCSV() no longer calls exit, but uses $wgOut->disable().
* The actual database query has been factored out; recache(), doQuery(), doCSV(), and doFeed() now all use the same function. I hope I did not miss any subtle differences.

This still uses gen=csv to trigger CSV mode. Using action=csv would not work without hacking around index.php, which is already quite ugly in that respect. Also, action indicates *what* is shown; there should be a convention for a separate parameter that determines *how* the data is shown. Consider that in the future, CSV (and XML, and...) support could be added, for instance, to the history view, which is triggered by action=history: the format needs to be in a separate parameter. I'm using "gen" because it is already used for js and css (right?). The alternative would be to (mis-)use the "feed" parameter, or a common "output" or "format" parameter. This would have many implications, though...

NOTE: I have not tested this with the querycache; I'm not sure how to do that. But it should work as before, since I have not changed anything in that code, at least not intentionally. The RSS feed is also untested, because there is currently no special page that is based on QueryPage and has syndication enabled.
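A rough sketch of the two ideas in this comment: one shared query function used by every output format, and a format parameter kept separate from action. This is not the patch itself. ListPage, fetch, as_html, as_csv and render are hypothetical names, and the in-memory row list stands in for the database query.

```python
from html import escape
from urllib.parse import quote


class ListPage:
    """Hypothetical stand-in for a QueryPage-like special page."""

    def __init__(self, rows):
        self._rows = rows

    def fetch(self, offset=0, limit=50):
        # The single place where the "query" runs; every format calls this.
        return self._rows[offset:offset + limit]

    def as_html(self):
        items = "".join(f"<li>{escape(title)}</li>" for title, _ in self.fetch())
        return f"<ol>{items}</ol>"

    def as_csv(self):
        # One line per row, fields percent-encoded and comma-separated.
        return "\n".join(
            ",".join(quote(str(value), safe="") for value in row)
            for row in self.fetch()
        )


def render(page, params):
    # "gen" selects how the data is shown, independent of any "action" value.
    if params.get("gen") == "csv":
        return page.as_csv()
    return page.as_html()


page = ListPage([("Foo.png", 3), ("A,b.png", 1)])
print(render(page, {"gen": "csv"}))
print(render(page, {}))
```

Keeping the format in its own parameter means another view, such as one selected by action=history, could reuse the same convention without a new action value per format.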
I have put together some general ideas and suggestions for a REST interface on meta. See here: http://meta.wikimedia.org/wiki/REST The patch suggested in this bug is one of the cornerstones of a REST interface as I propose it. Please have a look...
Created attachment 1003 [details] factored out CSV creation. patch for the additional file to follow. Note to anyone: CVS sucks. It can't include new files in diffs.
Created attachment 1004 [details] standalone CSV class, required by previous patch. Diffed against a dummy.
Duping this to bug 14869, which is basically trying to accomplish the same thing, but in a much cleaner way. *** This bug has been marked as a duplicate of bug 14869 ***