Last modified: 2011-11-29 03:20:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T15638, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13638 - Wiki data dump intermittently produces corrupt .xml.bz2 files
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All, OS: All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Brion Vibber
URL: http://download.wikimedia.org/enwiki/...
Depends on:
Blocks:
Reported: 2008-04-07 00:24 UTC by Brion Vibber
Modified: 2011-11-29 03:20 UTC
CC List: 1 user




Description Brion Vibber 2008-04-07 00:24:09 UTC
On several occasions we've had corrupt .xml.bz2 files come out of the data dump process.

There are several possible causes:
* dbzip2 might be corrupting data
* NFS filesystem transfers might be corrupting data
* gremlins!

Eliminating dbzip2 as a precaution, to see if this improves matters, would be a good start.

Further checks for corrupt files would also be wise, however. Running a 'bzip2 -t' after generation (or even as a simultaneous side process?) may help to detect bad files and mark them appropriately.
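A minimal sketch of such a post-generation check, assuming it lives alongside the Python dump scripts (worker.py is Python). `bz2_is_valid` and `bzip2_test` are hypothetical helper names. The first decompresses the whole file with Python's `bz2` module and discards the output, which makes libbzip2 verify its CRCs much as `bzip2 -t` does; the second simply shells out to `bzip2 -t` and checks the exit status.

```python
import bz2
import subprocess


def bz2_is_valid(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if every bzip2 stream in `path` decompresses cleanly.

    Hypothetical pure-Python counterpart to `bzip2 -t`: the data is read
    in chunks and thrown away, so memory use stays bounded by chunk_size.
    """
    try:
        with bz2.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError):
        # OSError: invalid data stream (bad magic, CRC mismatch, corrupt block).
        # EOFError: file truncated before the end-of-stream marker.
        return False


def bzip2_test(path: str) -> bool:
    """Run the external `bzip2 -t` and report whether it exited with status 0."""
    result = subprocess.run(
        ["bzip2", "-t", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Either check could run right after a file is written, or in a separate process that follows the dump as it is produced; files that fail could then be renamed or flagged rather than published.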

So far, manually re-running the dump produces a correct file; this could be automated if required.
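If the re-run were automated, it could be a bounded retry loop like this sketch. `dump_with_retry`, `generate` and `verify` are hypothetical names: `generate` stands for whatever step writes one output file and returns its path, and `verify` for any integrity check, for instance one that runs `bzip2 -t`.

```python
from typing import Callable, List


def dump_with_retry(
    generate: Callable[[], str],
    verify: Callable[[str], bool],
    max_attempts: int = 3,
) -> str:
    """Re-run `generate` until `verify` accepts the file it produced.

    Returns the path of the first file that verifies; raises RuntimeError
    (listing every rejected path) if all attempts produce bad files.
    """
    rejected: List[str] = []
    for _ in range(max_attempts):
        path = generate()
        if verify(path):
            return path
        # Bad output: remember it and try again, since a manual re-run
        # has so far produced a correct file.
        rejected.append(path)
    raise RuntimeError(
        f"dump still corrupt after {max_attempts} attempts; rejected: {rejected}"
    )
```

Capping the number of attempts keeps a persistent fault (for example a failing disk or mount) from looping forever, while still absorbing the intermittent failures described above.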
Comment 1 Brion Vibber 2008-04-09 01:35:55 UTC
In r33005, adjusted worker.py to pass dbzip2 mode to dumpTextPass.php only if it is configured to use dbzip2. The next dumps should use regular bzip2 mode.
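The fix amounts to choosing the compression mode from configuration instead of always requesting dbzip2. A hedged sketch follows; `text_pass_output_spec` and the `use_dbzip2` flag are hypothetical and are not the actual worker.py code, and the `--output=<mode>:<file>` argument form is an assumption to check against dumpTextPass.php's own usage text.

```python
def text_pass_output_spec(path: str, use_dbzip2: bool = False) -> str:
    """Build a hypothetical --output argument for dumpTextPass.php.

    Plain bzip2 is the default; dbzip2 is requested only when the
    configuration explicitly enables it.
    """
    mode = "dbzip2" if use_dbzip2 else "bzip2"
    return f"--output={mode}:{path}"
```

Defaulting to `False` means a missing or unset configuration value falls back to regular bzip2, which is the safer choice while dbzip2 is suspected of corrupting output.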
Comment 2 Brion Vibber 2008-12-19 01:09:29 UTC
Just marking this fixed for now...


