Last modified: 2011-11-29 03:20:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
Bug 13638 - Wiki data dump intermittently produces corrupt .xml.bz2 files
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All, OS: All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Brion Vibber
URL: http://download.wikimedia.org/enwiki/...
Depends on:
Blocks:
Reported: 2008-04-07 00:24 UTC by Brion Vibber
Modified: 2011-11-29 03:20 UTC


Description Brion Vibber 2008-04-07 00:24:09 UTC
On several occasions we've had corrupt .xml.bz2 files come out of the data dump process.

There are several possible causes:
* dbzip2 might be corrupting data
* NFS filesystem transfers might be corrupting data
* gremlins!

Eliminating dbzip2 as a precaution, to see if this improves matters, would be a good start.

Further checks for corrupt files would also be wise, however. Running 'bzip2 -t' on each file after generation (or even as a simultaneous side process?) may help detect bad files and mark them appropriately.
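A post-generation integrity check along those lines could be sketched in Python as follows. This is an illustration, not the actual dump code: the helper name `is_valid_bz2` is hypothetical, and it uses the standard-library bz2 module as a stand-in for shelling out to `bzip2 -t` (which would instead be run per file, treating a nonzero exit status as a bad file). It streams the whole file through the decompressor, so a truncated file or a file with corrupt block data is reported as invalid.

```python
import bz2


def is_valid_bz2(path_or_file, chunk_size=1 << 20):
    """Return True if the input decompresses cleanly as bzip2, else False.

    Hypothetical helper approximating `bzip2 -t`: the data is decompressed
    to the end and discarded; only errors matter.
    """
    try:
        with bz2.open(path_or_file, "rb") as f:
            # Read to the end; bad magic bytes or CRC mismatches raise OSError.
            while f.read(chunk_size):
                pass
    except EOFError:
        # The compressed stream ended before its end-of-stream marker (truncated).
        return False
    except OSError:
        # Invalid or corrupt data (also covers I/O errors such as a missing file).
        return False
    return True
```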

So far, manually re-running the dump produces a correct file; this could be automated if required.
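If the re-run were automated, it might look like the retry loop below. This is a sketch under stated assumptions: `dump_with_retries`, `run_dump` and `verify` are hypothetical names, with `run_dump` regenerating the file and returning its path, and `verify` being an integrity check such as the `bzip2 -t` test described above.

```python
def dump_with_retries(run_dump, verify, max_attempts=3):
    """Re-run a dump until its output passes an integrity check.

    Hypothetical helper: `run_dump()` regenerates the file and returns its
    path; `verify(path)` returns True for a good file.
    Returns (path, attempts_used); raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        path = run_dump()
        if verify(path):
            return path, attempt
        # Output failed verification; loop around and regenerate it.
    raise RuntimeError(
        "dump output still corrupt after %d attempts" % max_attempts)
```

Capping the number of attempts keeps a persistent fault (for example, a bad compressor) from looping forever; the final exception is the point at which the file could be marked as bad.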
Comment 1 Brion Vibber 2008-04-09 01:35:55 UTC
In r33005, adjusted worker.py to pass dbzip2 mode to dumpTextPass.php only when it is configured to use dbzip2. The next dumps should use regular bzip2 mode.
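The shape of that change can be sketched as a command builder that requests dbzip2 only when it is explicitly enabled. This is not the actual r33005 code. The `use_dbzip2` flag stands in for worker.py's real configuration setting, and the `--output=<sink>:<path>` argument form is an assumption about how dumpTextPass.php selects its compressor. Other arguments are omitted.

```python
def build_text_pass_command(output_path, use_dbzip2=False):
    """Return an argv list for dumpTextPass.php (sketch; other arguments omitted).

    Assumptions: `use_dbzip2` stands in for worker.py's real configuration
    setting, and the script selects its compressor via --output=<sink>:<path>.
    """
    # Only request dbzip2 when the configuration explicitly enables it;
    # otherwise fall back to the regular bzip2 output sink.
    sink = "dbzip2" if use_dbzip2 else "bzip2"
    return ["php", "dumpTextPass.php", "--output=%s:%s" % (sink, output_path)]
```

Defaulting `use_dbzip2` to False means an unset or missing setting falls back to plain bzip2, which is the conservative choice while dbzip2 is a suspect.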
Comment 2 Brion Vibber 2008-12-19 01:09:29 UTC
Just marking this fixed for now...
