Last modified: 2011-11-29 03:20:58 UTC

Bug 13637 - Wiki data dump bzip2 -> 7zip conversion doesn't report failure on corrupt input
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Ariel T. Glenn
URL: http://download.wikimedia.org/enwiki/...
Reported: 2008-04-07 00:19 UTC by Brion Vibber
Modified: 2011-11-29 03:20 UTC

Description Brion Vibber 2008-04-07 00:19:59 UTC
When the history .xml.7z file is generated, sometimes the bzip2 decompression fails. This may be due to a corrupt file in the first place...

But the failure here is hidden. Bzip2 spews an error and exits, but 7zip happily considers it the end of the file and wraps up "successfully".

The error condition in the input should be detected; at a minimum, this allows the corrupted output file to be marked as failed.
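
One way to surface the hidden failure is to check the exit status of both ends of the pipe, not only the last one. The sketch below is a minimal Python illustration, not the dump scripts' actual code: `run_pipeline` and `pipeline_ok` are hypothetical names, and the real commands (presumably something like `bzip2 -dc` feeding `7za a -si`) are an assumption rather than something stated in this report. The demo at the bottom uses small stand-in commands so it runs without either tool installed.

```python
import subprocess
import sys


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer` and return (producer_rc, consumer_rc)."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Drop our copy of the pipe so the consumer sees EOF when the producer
    # exits, and the producer is signalled if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc


def pipeline_ok(producer_cmd, consumer_cmd):
    """The step succeeds only if BOTH processes exited with status 0."""
    producer_rc, consumer_rc = run_pipeline(producer_cmd, consumer_cmd)
    return producer_rc == 0 and consumer_rc == 0


if __name__ == "__main__":
    # Stand-ins (assumptions): a "decompressor" that emits partial output and
    # then fails, and a "compressor" that accepts whatever it is given.
    failing = [sys.executable, "-c",
               "import sys; sys.stdout.write('partial'); sys.exit(2)"]
    sink = [sys.executable, "-c", "import sys; sys.stdin.read()"]
    print(run_pipeline(failing, sink))
    print(pipeline_ok(failing, sink))
```

With this shape, a decompressor that writes partial output and then exits non-zero fails the whole step, even though the compressor saw what looked like a clean end of input.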
Comment 1 Brion Vibber 2008-12-19 01:01:54 UTC
More dump generation bits...

Currently the .7z files are generated by decompressing the .bz2 and piping into p7zip... this is kinda slow and also won't report errors properly at present.
Comment 2 Mark A. Hershberger 2011-05-03 18:57:05 UTC
Giving dump bugs to Ariel.
Comment 3 Ariel T. Glenn 2011-08-29 16:51:41 UTC
We could refuse to generate the 7z file if the bz2 input file is truncated.  Would that be sufficient? (We have a fast way to detect that now.)
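
For illustration only, a truncation check can be sketched with Python's standard `bz2` module. This is a full decompression pass, so it is not the fast check mentioned above, whose implementation is not described in this report. `bz2_is_complete` is a hypothetical name. The sketch treats a file as complete only if every bzip2 stream in it reaches its end-of-stream marker; a file cut exactly on a stream boundary would still pass.

```python
import bz2


def bz2_is_complete(path, chunk_size=1 << 20):
    """Return True if the file holds one or more complete bzip2 streams."""
    decomp = bz2.BZ2Decompressor()
    stream_open = False   # True while inside a stream that has not ended
    saw_stream = False    # True once at least one stream has ended cleanly
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            while chunk:
                try:
                    decomp.decompress(chunk)  # output is discarded
                except (OSError, ValueError):
                    return False  # corrupt data or trailing garbage
                stream_open = True
                if decomp.eof:
                    # End-of-stream marker reached; any leftover bytes
                    # belong to the next concatenated stream.
                    chunk = decomp.unused_data
                    decomp = bz2.BZ2Decompressor()
                    stream_open = False
                    saw_stream = True
                else:
                    chunk = b""
    return saw_stream and not stream_open
```

A caller could skip the 7z step whenever this returns False.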
Comment 4 Ariel T. Glenn 2011-09-18 07:34:07 UTC
If the bz2 file is truncated it is now moved out of the way at the end of the step that produces it.  This means it's not going to be available as input for the 7z file, so that step will fail and be marked as such. Closing.
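
The "move it out of the way" approach can be sketched roughly as follows. The function names, the `.truncated` suffix, and the `is_complete` callback are all hypothetical, chosen for illustration; the deployed dump code may look quite different.

```python
import os


def set_aside_if_incomplete(path, is_complete, suffix=".truncated"):
    """Rename `path` out of the way when `is_complete(path)` is false.

    Returns True if the file was kept in place, False if it was renamed.
    """
    if is_complete(path):
        return True
    os.replace(path, path + suffix)
    return False


def require_input(path):
    """Called at the start of the next step; fails loudly if input is gone."""
    if not os.path.exists(path):
        raise FileNotFoundError("missing input for 7z step: " + path)
    return path
```

Because the bad file no longer exists under its expected name, the later step fails on its own and can be marked as failed without any extra bookkeeping.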
Comment 5 Ariel T. Glenn 2011-09-18 07:35:27 UTC
I should say that code isn't deployed for anything but en wiki dumps yet.  I'd better make it live on the other servers.
