Last modified: 2014-09-23 23:43:14 UTC
The Import and Export functions should support zipped XML as an option. This would solve several issues when importing/exporting a large number of pages/versions.

For Export:
* Browsers may try to display the XML in a fancy tree. For a file several MB in size, this may bog down the computer or crash the browser. It's pointless, anyway.
* Browsers often mangle the XML when saving it. In Firefox, saving the page from the source view leads to broken results, and saving from the normal XML view only works if you manually select the (non-obvious) "HTML only" option.
* Downloading a zipped file will be faster, even if zlib compression is enabled on the server, because the browser will not uncompress it.
* A zipped file will trigger a download dialog, which makes more sense for an export than showing XML in the browser.

For Import:
* If export supports zip, import should too.
* The zipped file will be a lot smaller. People may have upload limits in PHP, in Apache, or for their web account.
* Zipped files can be detected and handled automatically.

An additional option for importing a file that is already on the server's file system may be handy too, especially for people who don't have shell access to the server but can upload files via FTP (as is quite often the case). But there may be security issues with this.
Created attachment 1082 [details] Patch for Special:Import to support gzip and bzip2 compression

I'm new to the MediaWiki code. This patch adapted a few lines of phpMyAdmin source code for MediaWiki: the file format detection. I wrote a PHP stream (file StringStream.php) to reuse MediaWiki code instead of writing my own ImportStreamSource class. Be careful: the stream_wrapper_register function needs "PHP 4 >= 4.3.2, PHP 5".

TO DO:
- Maybe show different error messages if gzip/bzip2 decompression isn't supported
- Write a new ImportStringStreamSource (based on ImportStreamSource) to be compatible with PHP < 4.3.2 (?)
- Test it :-)

I wrote this class to upload a large XML file (8+ MB), but it didn't solve my problem. I have to split the XML into several parts... Haypo
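For reference, magic-byte format detection (the approach the patch borrows from phpMyAdmin) can be sketched like this. `detectCompression` is a hypothetical helper name for illustration, not the function in the patch:

```php
<?php
// Detect compression format from a file's leading magic bytes.
// gzip files start with 0x1f 0x8b; bzip2 files start with "BZh".
function detectCompression( $filename ) {
	$fh = fopen( $filename, 'rb' );
	$magic = fread( $fh, 3 );
	fclose( $fh );
	if ( substr( $magic, 0, 2 ) === "\x1f\x8b" ) {
		return 'gzip';
	}
	if ( $magic === 'BZh' ) {
		return 'bzip2';
	}
	return 'plain';
}
```

Detecting by content rather than by file extension means a mislabelled upload is still handled correctly.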
It looks like it's decompressing the entire file to memory first; this seems really inefficient, and you can hit your memory_limit or run the server into swap on a large file. I'd recommend instead using PHP's stream wrappers (compress.zlib:// and compress.bzip2:// for .gz and .bz2 respectively), as importDump.php already does on the command line.
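A minimal sketch of that suggestion, assuming the compression can be inferred from the file extension (`openDump` is a hypothetical helper, not existing MediaWiki code):

```php
<?php
// Open a dump through PHP's compression stream wrappers so the data is
// decompressed incrementally as it is read, instead of all at once in
// memory. This is the same trick importDump.php uses on the CLI.
function openDump( $filename ) {
	if ( substr( $filename, -3 ) === '.gz' ) {
		return fopen( 'compress.zlib://' . $filename, 'rb' );
	}
	if ( substr( $filename, -4 ) === '.bz2' ) {
		// Requires PHP's bz2 extension
		return fopen( 'compress.bzip2://' . $filename, 'rb' );
	}
	return fopen( $filename, 'rb' );
}
```

Reading the returned handle in fixed-size fread() chunks keeps memory use bounded regardless of dump size, so memory_limit is never hit.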
Created attachment 1408 [details] gzipped import/export
Hi all. See the attachment - this is my take on gzipped import/export. :)
Created attachment 1409 [details] Patch for gzipped import/export I'm so sorry... :( The previous patch was attached in gzip format. Now the patch is attached as plain text. :)
Created attachment 1410 [details] Patch for gzipped import/export I'm so sorry... :( The previous patch was attached in gzip format. Now the patch is attached as plain text. :)
Patch doesn't look right; the import infrastructure already has support for reading gzipped data using fopen wrappers. Better to reuse that rather than changing all the functions around. Export data is also gzipped normally if the user-agent requests it, as is all other output. Why double-zip it?
Zipping the output explicitly avoids several problems: if it is only transfer-encoded as zip, the browser will probably a) unpack it unnecessarily, b) try to show it as an XML tree, and c) may even save it as something HTML-like. Sending it out as an archive will cause the browser to simply pop up a download dialog, which is what the user expects, and it is also much faster and safer (integrity-wise). Double-zipping should be avoided, though. I don't know on which level MediaWiki handles transfer-encoding, but zipping should be avoided for archives, images, etc. - basically, everything that's not text/*. This should be doable - maybe PHP is even smart enough to handle this automatically?
> the import infrastructure already has support for reading gzipped data gzopen() transparently opens both gzipped and non-gzipped files, so all is OK there. :) But on import I don't check whether the functions 'gzopen', 'gzread'... are present. As for double-gzipping... hm... I had seen this problem, which is why I set the compression level to 0 in the export function. I'll solve it now.
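The capability check described above could look roughly like this; `readMaybeGzipped` is a hypothetical helper name, not code from the patch:

```php
<?php
// gzopen() reads plain files as well as gzipped ones, so one read path
// covers both - but zlib may not be compiled into PHP, hence the check.
function readMaybeGzipped( $filename ) {
	if ( !function_exists( 'gzopen' ) ) {
		return false; // zlib support unavailable in this PHP build
	}
	$fh = gzopen( $filename, 'rb' );
	$data = '';
	while ( !gzeof( $fh ) ) {
		$data .= gzread( $fh, 8192 );
	}
	gzclose( $fh );
	return $data;
}
```

Because gzopen() passes non-gzip data through unchanged, the caller never needs to know which format the upload was in.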
Mass compoment change: <some> -> Export/Import
(In reply to comment #8) > Zipping the output explicitly avoids several problems: if it is only transfer-encoded > as zip, the browser will probably a) unpack it unnecessarily, b) try to show it > as an XML tree, and c) may even save it as something HTML-like. Sending it out as > an archive will cause the browser to simply pop up a download dialog, which is > what the user expects, and it is also much faster and safer (integrity-wise). You can instead use the Content-Disposition header, as we currently do. $wgRequest->response()->header( "Content-disposition: attachment;filename={$filename}" );
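Putting the two ideas together, a gzipped export could be served for download roughly as below. This is a sketch: plain header() calls are shown instead of $wgRequest->response()->header(), and the filename, Content-Type, and compression level are assumptions:

```php
<?php
// Compress the export once with gzencode() and mark it as an attachment
// so the browser downloads a .gz file instead of rendering the XML.
$xml = '<mediawiki>...</mediawiki>'; // the generated export document
$gzipped = gzencode( $xml, 6 );      // level 6 is an arbitrary middle choice

header( 'Content-Type: application/x-gzip' );
header( 'Content-Disposition: attachment;filename=export.xml.gz' );
header( 'Content-Length: ' . strlen( $gzipped ) );
echo $gzipped;
```

Since the body is already application/x-gzip, the server's output compression should skip it, which avoids the double-zipping concern above.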
Comment on attachment 1409 [details] Patch for gzipped import/export Marking dupe patch obsolete
I guess it's fine to allow the user to select "compress output file" or something on the Special:Export form. I would make it the user's choice, though, rather than replacing the existing plain-text format as this patch does.
Comment on attachment 1082 [details] Patch for Special:Import to support gzip and bzip2 compression Marking patch obsolete per Ariel's review
Comment on attachment 1410 [details] Patch for gzipped import/export Marking patch obsolete per Ariel's review