Last modified: 2011-11-29 03:20:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages displaying bug reports and their history, links might be broken. See T15637, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13637 - Wiki data dump bzip2 -> 7zip conversion doesn't report failure on corrupt input
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Ariel T. Glenn
URL: http://download.wikimedia.org/enwiki/...
Depends on:
Blocks:
Reported: 2008-04-07 00:19 UTC by Brion Vibber
Modified: 2011-11-29 03:20 UTC (History)

Description Brion Vibber 2008-04-07 00:19:59 UTC
When the history .xml.7z file is generated, sometimes the bzip2 decompression fails. This may be due to a corrupt file in the first place...

But the failure here is hidden. Bzip2 spews an error and exits, but 7zip happily considers it the end of the file and wraps up "successfully".

The error condition in the input should be detected; at a minimum, this allows the corrupted output file to be marked as failed.
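
A minimal sketch of the check requested above, in Python, assuming the conversion is a plain `bzip2 -dc` piped into `7za a -si` (the exact commands used by the dump scripts are not given in this report). The helper names `run_pipeline` and `recompress` are hypothetical. The idea is to inspect the exit status of both processes, so that a bzip2 failure is not masked by 7zip finishing normally on a short input.

```python
import os
import subprocess


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer` and return (producer_rc, consumer_rc)."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the consumer sees EOF when the
    # producer exits, and the producer gets SIGPIPE if the consumer dies.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc


def recompress(bz2_path, out_path):
    """Hypothetical wrapper: fail if either side of the pipe failed."""
    # Assumed command lines; adjust to the real dump configuration.
    producer_rc, consumer_rc = run_pipeline(
        ["bzip2", "-dc", bz2_path],
        ["7za", "a", "-si", out_path],
    )
    if producer_rc != 0 or consumer_rc != 0:
        # Do not leave a truncated archive that looks valid.
        if os.path.exists(out_path):
            os.remove(out_path)
        raise RuntimeError(
            "conversion failed: bzip2 exit %d, 7za exit %d"
            % (producer_rc, consumer_rc)
        )
```

In a shell-based pipeline, `set -o pipefail` in bash serves the same purpose.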
Comment 1 Brion Vibber 2008-12-19 01:01:54 UTC
More dump generation bits...

Currently the .7z files are generated by decompressing the .bz2 and piping into p7zip... this is kinda slow and also won't report errors properly at present.
Comment 2 Mark A. Hershberger 2011-05-03 18:57:05 UTC
Giving dump bugs to Ariel.
Comment 3 Ariel T. Glenn 2011-08-29 16:51:41 UTC
We could refuse to generate the 7z file if the bz2 input file is truncated.  Would that be sufficient? (We have a fast way to detect that now.)
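
For illustration only, here is one way to detect a truncated bz2 file in Python. This is not the fast check mentioned above, whose implementation is not described in this report. The sketch decompresses the whole file, so it would be slow on a full history dump. The function name `bz2_is_complete` is hypothetical. It also accepts files made of several concatenated bz2 streams.

```python
import bz2


def bz2_is_complete(path, chunk_size=64 * 1024):
    """Return True if every bz2 stream in the file ends cleanly."""
    decompressor = bz2.BZ2Decompressor()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            while chunk:
                if decompressor.eof:
                    # Previous stream ended; more data means a new stream.
                    decompressor = bz2.BZ2Decompressor()
                try:
                    # Output is discarded; only stream validity matters.
                    decompressor.decompress(chunk)
                except OSError:
                    # Invalid data, including trailing garbage.
                    return False
                # Bytes after an end-of-stream marker belong to the
                # next stream, if any.
                chunk = decompressor.unused_data if decompressor.eof else b""
    # A truncated (or empty) file never reaches its end-of-stream marker.
    return decompressor.eof
```

A caller could skip the 7z step, or mark it failed, whenever this returns False.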
Comment 4 Ariel T. Glenn 2011-09-18 07:34:07 UTC
If the bz2 file is truncated it is now moved out of the way at the end of the step that produces it.  This means it's not going to be available as input for the 7z file, so that step will fail and be marked as such. Closing.
Comment 5 Ariel T. Glenn 2011-09-18 07:35:27 UTC
I should say that code isn't deployed for anything but en wiki dumps yet.  I'd better make it live on the other servers.
