Last modified: 2014-09-23 23:43:14 UTC

Bug 3893 - Import/Export should support zipped XML
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Export/Import
Version: 1.6.x
Hardware: All
OS: All
Importance: Low enhancement with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2005-11-06 15:11 UTC by Daniel Kinzler
Modified: 2014-09-23 23:43 UTC
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Patch for Special:Import to support gzip and bzip2 compression (attachment 1082; 3.52 KB, patch), 2005-11-18 02:47 UTC, Victor Stinner
gzipped import/export (attachment 1408; 7.63 KB, patch), 2006-02-21 14:13 UTC, Slava Zanko
Patch for gzipped import/export (attachment 1409; 22.75 KB, patch), 2006-02-21 14:19 UTC, Slava Zanko
Patch for gzipped import/export (attachment 1410; 22.75 KB, patch), 2006-02-21 14:19 UTC, Slava Zanko

Description Daniel Kinzler 2005-11-06 15:11:29 UTC
The Import and Export functions should support zipped XML as an option. This
would solve several issues when importing/exporting a large number of
pages/versions:

For Export:
* Browsers may try to display the XML as a fancy tree. For a file of several MB,
this may bog down the computer or crash the browser. It's pointless, anyway.
* Browsers often mangle the XML when saving it. In Firefox, saving the page from
the source view leads to broken results, and saving from the normal XML view
only works if you manually select the (non-obvious) "HTML only" option.
* Downloading a zipped file will be faster, even if zlib compression is enabled
on the server, because the browser will not uncompress it.
* A zipped file will trigger a download dialog, which makes more sense for an
export than showing XML in the browser.

For Import:
* If export supports zip, import should too
* The zipped file will be a lot smaller. People may hit upload limits in PHP, in
Apache, or on their web hosting account.
* Zipped files can be detected and handled automatically
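
A minimal PHP sketch of the automatic detection mentioned in the last point, assuming detection by the file's leading "magic" bytes rather than its name. The function detectCompression() is hypothetical and not part of MediaWiki:

```php
<?php
// Hypothetical helper (not MediaWiki API): guess an uploaded dump's
// compression from its leading "magic" bytes instead of its file name.
function detectCompression(string $path): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $magic = (string) fread($fh, 4); // at most 4 bytes; fewer for tiny files
    fclose($fh);

    if (strncmp($magic, "\x1f\x8b", 2) === 0) {
        return 'gzip';  // gzip member header (RFC 1952)
    }
    if (strncmp($magic, 'BZh', 3) === 0) {
        return 'bzip2'; // bzip2 stream header
    }
    if ($magic === "PK\x03\x04") {
        return 'zip';   // ZIP local file header
    }
    return 'none';      // treat as plain XML
}
```

Sniffing the content means a file renamed to .xml but actually gzipped is still handled correctly.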

An additional option for importing a file that is already on the server's file
system may be handy too, especially for people who don't have shell access to the
server but can upload files via FTP (as is quite often the case). But there may
be security issues with this.
Comment 1 Victor Stinner 2005-11-18 02:47:36 UTC
Created attachment 1082 [details]
Patch for Special:Import to support gzip and bzip2 compression

I'm new to the MediaWiki code. This patch uses some lines of phpMyAdmin source
code (adapted for MediaWiki) for file format detection. I wrote a PHP stream
wrapper (file StringStream.php) so that existing MediaWiki code could be reused
instead of writing my own ImportStreamSource class.

Be careful: the stream_wrapper_register function requires PHP 4 >= 4.3.2 or PHP 5.

TO DO:
- Maybe show different error messages if gzip/bzip2 decompression isn't
supported
- Write new ImportStringStreamSource (based on ImportStreamSource) to be
compatible with PHP < 4.3.2 (?)
- Test it :-)

I wrote this class to upload a large XML file (over 8 MB), but it didn't solve my
problem. I had to split the XML into several parts...

Haypo
Comment 2 Brion Vibber 2005-11-18 04:57:03 UTC
It looks like it's decompressing the entire file to memory first; this seems 
really inefficient, and you can hit your memory_limit or run the server into 
swap on a large file.

I'd recommend instead using PHP's stream wrappers (compress.zlib:// and 
compress.bzip2:// for .gz and .bz2 respectively), as importDump.php already 
does on the command line.
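
A rough PHP sketch of the stream-wrapper approach suggested above, assuming the compression format can be chosen from the file extension. openDumpStream() and readDumpInChunks() are hypothetical helpers, not MediaWiki's ImportStreamSource API. Only one chunk is held in memory at a time:

```php
<?php
// Hypothetical helpers (not MediaWiki's ImportStreamSource API): open a dump
// through PHP's compress.zlib:// or compress.bzip2:// wrapper, chosen by
// file extension, and read it in fixed-size chunks so memory use does not
// grow with the size of the dump.
function openDumpStream(string $path)
{
    if (substr($path, -3) === '.gz') {
        $url = 'compress.zlib://' . $path;
    } elseif (substr($path, -4) === '.bz2') {
        $url = 'compress.bzip2://' . $path; // needs the bz2 extension
    } else {
        $url = $path;                       // plain XML
    }
    $fh = fopen($url, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $url");
    }
    return $fh;
}

// Calls $onChunk with each decompressed chunk; returns total bytes read.
function readDumpInChunks(string $path, callable $onChunk, int $chunkSize = 65536): int
{
    $fh = openDumpStream($path);
    $total = 0;
    while (!feof($fh)) {
        $chunk = fread($fh, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $total += strlen($chunk);
        $onChunk($chunk);
    }
    fclose($fh);
    return $total;
}
```

With a callback that calls xml_parse( $parser, $chunk, false ) on each chunk, the dump can be parsed incrementally instead of being decompressed into memory first.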
Comment 3 Slava Zanko 2006-02-21 14:13:02 UTC
Created attachment 1408 [details]
gzipped import/export
Comment 4 Slava Zanko 2006-02-21 14:13:33 UTC
Hi all. See the attachment; this is my approach to gzipped import/export. :)
Comment 5 Slava Zanko 2006-02-21 14:19:42 UTC
Created attachment 1409 [details]
Patch for gzipped import/export

I'm so sorry... :( I attached the previous patch in gzip format.
Now the patch is attached as plain text. :)
Comment 6 Slava Zanko 2006-02-21 14:19:53 UTC
Created attachment 1410 [details]
Patch for gzipped import/export

I'm so sorry... :( I attached the previous patch in gzip format.
Now the patch is attached as plain text. :)
Comment 7 Brion Vibber 2006-02-21 18:38:31 UTC
Patch doesn't look right; the import infrastructure already has
support for reading gzipped data using fopen wrappers. Better to
reuse that rather than changing all the functions around.

Export data is also gzipped normally if the user-agent requests it,
as is all other output. Why double-zip it?
Comment 8 Daniel Kinzler 2006-02-21 19:46:54 UTC
Zipping the output explicitly avoids several problems: if the output is only
compressed as a transfer encoding, the browser will probably (a) unpack it
unnecessarily, (b) try to show it as an XML tree, and (c) may even save it as
something HTML-like. Sending it out as an archive will cause the browser to simply
pop up a download dialog, which is what the user expects, and is also much faster
and safer (integrity-wise).

Double-zipping should be avoided, though. I don't know at which level MediaWiki
handles transfer encoding, but zipping should be avoided for archives, images,
etc.; basically, everything that's not text/*. This should be doable. Maybe
PHP is even smart enough to handle this automatically?
Comment 9 Slava Zanko 2006-02-22 09:17:20 UTC
> the import infrastructure already has support for reading gzipped data
gzopen() transparently opens both gzipped and non-gzipped files. All OK. :)
But in the import code I don't check whether the functions 'gzopen', 'gzread'... are available.
Double-gzipping... Hmm... I noticed this problem, which is why I set the
compression level to 0 in the export function.
I'll fix it now.
Comment 10 Siebrand Mazeland 2008-08-18 18:47:30 UTC
Mass component change: <some> -> Export/Import
Comment 11 Andrew Garrett 2009-03-04 05:56:29 UTC
(In reply to comment #8)
> Zipping the output explicitely avoids several problem: if only transfer-encoded
> as zip, the browser will probably a) unpack it unnecessarily b) try to show it
> as an xml tree and c) may even save it as something html-like. Sending it out as
> an archive will cause the browser to simply pop up a download dialog, which is
> what the user expects, and also much faster and safer (integrity-wise).

You can instead use the Content-Disposition header, as we currently do.

 $wgRequest->response()->header( "Content-disposition: attachment;filename={$filename}" );
Comment 12 Sam Reed (reedy) 2011-11-11 00:10:43 UTC
Comment on attachment 1409 [details]
Patch for gzipped import/export

Marking dupe patch obsolete
Comment 13 Ariel T. Glenn 2011-11-24 15:41:59 UTC
I guess it's fine to allow the user to select "compress output file" or something similar on the Special:Export form. I would make it the user's choice, though, rather than replacing the existing plain-text format as this patch does.
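
A PHP sketch of the opt-in choice described above, assuming the form passes a boolean "compress" flag. buildExportResponse() is a hypothetical helper; it only builds the header lines and body, and in MediaWiki the headers would be sent with $wgRequest->response()->header() as in comment 11:

```php
<?php
// Hypothetical helper for an opt-in "compress output file" checkbox on
// Special:Export. It only builds the header lines and body; sending them
// is left to the caller.
function buildExportResponse(string $xml, string $basename, bool $compress): array
{
    if ($compress) {
        return [
            'headers' => [
                'Content-Type: application/gzip',
                "Content-Disposition: attachment; filename=\"{$basename}.xml.gz\"",
            ],
            'body' => gzencode($xml, 9), // level 9: smallest file, most CPU
        ];
    }
    // Default stays plain XML, as requested in the review.
    return [
        'headers' => [
            'Content-Type: application/xml; charset=utf-8',
            "Content-Disposition: attachment; filename=\"{$basename}.xml\"",
        ],
        'body' => $xml,
    ];
}
```

When the gzip branch is taken, any server-side output compression should be skipped for that response; otherwise the file is compressed twice, which is the concern raised in comments 7 and 8.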
Comment 14 Sumana Harihareswara 2012-10-12 01:22:13 UTC
Comment on attachment 1082 [details]
Patch for Special:Import to support gzip and bzip2 compression

marking patch obsolete per Ariel's review
Comment 15 Sumana Harihareswara 2012-10-12 01:22:32 UTC
Comment on attachment 1410 [details]
Patch for gzipped import/export

Marking patch obsolete per Ariel's review
