Last modified: 2009-12-30 05:18:35 UTC
I propose to add a CSV mode to special pages that display mainly a list. This
would greatly help bots and scripts with parsing, and would ease the server
load created by such scripts.
This patch implements CSV support in the QueryPage class, and adds CSV support
to (currently) 23 special pages without any extra effort.
patch will follow in a minute
Created attachment 974 [details]
patch for QueryPage.php
Removing patch keyword; patch fails security
Exploit proof of concept:
1) Set up MediaWiki on a server allowing PATHINFO
expansion or with a rewrite rule or alias for wiki
pages, plus this patch.
2) Upload an image, with the description field
filled out with this fragment:
3) In MSIE 6 on Windows, visit
The extended path fragment ending in ".html" is
interpreted by MSIE as an override of the unknown
Content-Type; the HTML fragment in output is then
rendered as HTML, leaving the site wide open to cross-site scripting attacks.
At a minimum, this needs to perform the kind of
security checks that action=raw does, checking
that a canonical URL is being used.
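As a rough illustration of the kind of check meant here (not the actual action=raw code), the sketch below only accepts a request whose path is exactly the script path, so a trailing fragment such as "/x.html" is rejected before any CSV is emitted. The script path "/w/index.php", the page name Special:Example and the function name are assumptions made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical script path; a real wiki would take this from its configuration.
CANONICAL_SCRIPT_PATH = "/w/index.php"


def is_canonical_request(request_uri, script_path=CANONICAL_SCRIPT_PATH):
    """Return True only if the path part of the request is exactly the script path.

    Extra PATHINFO such as "/w/index.php/x.html" makes the path differ, so the
    request is treated as non-canonical and should not get raw CSV output.
    """
    path = urlsplit(request_uri).path
    return path == script_path


if __name__ == "__main__":
    print(is_canonical_request("/w/index.php?title=Special:Example&gen=csv"))
    print(is_canonical_request("/w/index.php/x.html?title=Special:Example&gen=csv"))
```

A stricter version would probably also need to consider rewrite rules and aliases, which this sketch ignores.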
There may be additional information disclosure
issues if there is data in the internal rows being
passed around that shouldn't make it outside, but
I haven't checked for this yet.
Created attachment 977 [details]
improved patch, see comment
I improved the patch in a few ways:
* All fields are now URL-encoded. This has two advantages:
- it makes the CSV output immune to scripting attacks.
- it allows lines to be split at the separator char reliably.
* querycache is now used if applicable (untested)
* MIME type is now text/plain instead of text/csv. While text/csv is the
proposed standard, it is not widely supported yet. text/plain is shown directly
by web browsers, as opposed to prompting for a download or an external app.
* Subclasses can now specify which columns to include in CSV output by
overriding the csvFields() function. This may be used to address any unwanted
exposure of internal data.
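To illustrate the URL-encoding point, here is a minimal sketch (in Python rather than the patch's PHP) of why encoding every field helps. The comma separator and the function names are assumptions for the example; the patch may use a different separator.

```python
from urllib.parse import quote, unquote

# Assumed separator for this sketch.
SEPARATOR = ","


def encode_row(fields):
    """Percent-encode every field, then join with the separator.

    With safe="" the separator, newlines and HTML metacharacters such as
    "<" and ">" are all escaped, so none of them can appear inside a field.
    """
    return SEPARATOR.join(quote(str(field), safe="") for field in fields)


def decode_row(line):
    """Split on the separator first, then decode each field."""
    return [unquote(part) for part in line.split(SEPARATOR)]


if __name__ == "__main__":
    row = ["File:Example.png", "<b>bold</b>", "a,b\nc"]
    line = encode_row(row)
    print(line)
    print(decode_row(line) == row)
```

Because no encoded field can contain the separator or a line break, a client can split lines and fields with plain string operations and decode afterwards.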
I'm not sure I like 'gen=csv'. It might make sense to be consistent
with existing things like 'action=raw'. On the other hand 'feed=rss'
etc... Bleh. :)
doCSV() should *not* exit; the script should continue running to
completion (so any post functions and profiling can be run at the end
among other things). If you're trying to disable the HTML output, use
There looks to be some duplication of code in doCSV(); it's running
queries and such all over again. This should instead share existing
code; extract common submethods where appropriate.
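A rough sketch of the refactoring being asked for, in Python rather than PHP: one method fetches the rows, and each output mode only formats them. The class and method names are hypothetical and do not mirror the real QueryPage API.

```python
import html
from urllib.parse import quote


class QueryPageSketch:
    """Toy stand-in for a list-style special page (hypothetical names)."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0  # counts how often the "query" runs

    def fetch_rows(self):
        """The single place that runs the query; every output mode calls this."""
        self.query_count += 1
        return list(self._rows)

    def render_html(self):
        """HTML output: escape each row and wrap it in a list item."""
        items = "".join(
            "<li>{}</li>".format(html.escape(str(row))) for row in self.fetch_rows()
        )
        return "<ol>{}</ol>".format(items)

    def render_csv(self):
        """CSV-style output: one percent-encoded row per line."""
        return "\n".join(quote(str(row), safe="") for row in self.fetch_rows())


if __name__ == "__main__":
    page = QueryPageSketch(["A&B", "C,D"])
    print(page.render_html())
    print(page.render_csv())
```

With this shape, a change to the query (caching, limits, ordering) only has to be made in one method.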
Created attachment 981 [details]
updated patch. see comment
I updated the patch to address the issues mentioned above. Specifically:
* doCSV() no longer calls exit, but uses $wgOut->disable()
* the actual database query has been factored out; recache(), doQuery(),
doCSV(), and doFeed() now all use the same function. I hope I did not miss any
This still uses gen=csv to trigger CSV mode. Using action=csv would not work
without hacking around index.php, which is already quite ugly in that respect.
Also, action indicates *what* is shown, there should be a convention for a
separate parameter that determines *how* the data is shown. Consider that in
the future, CSV (and XML, and...) support could be added for instance to the
history view, which is triggered by action=history - the format needs to be in
a separate parameter. I'm using "gen" because it is already used for js and css
(right?). The alternative would be to (mis-)use the "feed" parameter, or a
common "output" or "format" parameter. This would have many implications,
NOTE: I have not tested this with the querycache - I'm not sure how to do that.
But it should work as before, since I have not changed anything in that code,
at least not intentionally.
RSS Feed is also untested, because there is currently no special page that is
based on QueryPage and has syndication enabled.
I have put together some general ideas and suggestions for a REST interface on
meta. See here:
The patch suggested in this bug is one of the cornerstones of a REST interface as
I propose it. Please have a look...
Created attachment 1003 [details]
factored out CSV creation. patch for the additional file to follow.
Note to anyone: CVS sucks. It can't include new files in diffs.
Created attachment 1004 [details]
standalone CSV class, required by previous patch. Diffed against a dummy.
Duping this to bug 14869, which is basically trying to accomplish the same thing, but in a much cleaner way.
*** This bug has been marked as a duplicate of bug 14869 ***