Last modified: 2011-09-18 06:48:41 UTC

Bug 27126 - The bzip2 python stuff is really ugly. Maybe parts should be redone in C.
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Ariel T. Glenn
Depends on:
Blocks: 27110
Reported: 2011-02-03 04:25 UTC by Ariel T. Glenn
Modified: 2011-09-18 06:48 UTC

Description Ariel T. Glenn 2011-02-03 04:25:43 UTC
There are a couple of files of awful low-level bzip2 stuff in python that we need in order to seek around and find block boundaries in ginormous files.  The python code is ewwww gross. We would likely be better off hacking up the bzip2 library and using that instead of the current python interface to the standard bzip2 library.  The question is how worthwhile it is to invest more time in that.  Low priority for now.
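
For readers unfamiliar with the problem, here is a minimal sketch of what "finding block boundaries" involves. It is not the code this bug is about. It relies on two facts about the bzip2 format: every compressed block starts with the 48-bit magic 0x314159265359, and blocks are packed bit by bit (most significant bit first), so the magic can begin at any bit offset within a byte. The function name `find_block_starts` is made up for illustration, and a hit is only a candidate, since the same bit pattern can occur by chance inside compressed data.

```python
import bz2
from typing import List

# 48-bit marker that begins every bzip2 compressed block.
BLOCK_MAGIC = 0x314159265359


def find_block_starts(data: bytes) -> List[int]:
    """Return candidate bit offsets (from the start of data) of block magics.

    Hypothetical helper for illustration only. Offsets count bits from the
    most significant bit of data[0].
    """
    # One zero pad byte lets a magic that ends exactly at the end of the
    # buffer still fill a 7-byte window. It cannot create a false hit,
    # because the magic ends in a 1 bit.
    padded = data + b"\x00"
    hits = []
    for shift in range(8):
        # Place the magic `shift` bits into a 7-byte (56-bit) window.
        pad = 8 - shift
        value = BLOCK_MAGIC << pad
        mask = ((1 << 48) - 1) << pad
        # Bytes 1..5 of the window are fully determined for every shift,
        # so a fast byte search can locate candidates.
        middle = value.to_bytes(7, "big")[1:6]
        pos = padded.find(middle, 1)
        while pos != -1:
            start = pos - 1
            window = padded[start:start + 7]
            # Verify the partially covered first and last bytes with a mask.
            if len(window) == 7 and (int.from_bytes(window, "big") & mask) == value:
                hits.append(start * 8 + shift)
            pos = padded.find(middle, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    stream = bz2.compress(b"hello world")
    # The first block follows the 4-byte "BZh9" stream header.
    print(find_block_starts(stream))
```

To use this on a huge file, one would seek to a position near the end, read a chunk, and add `8 * seek_position` to each offset. Decompressing from such an offset needs further work (realigning the bits and supplying a stream header), which is not shown here.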
Comment 1 Brion Vibber 2011-02-04 17:55:48 UTC
Is this about dbzip2 or something else? There are several other projects that do pretty much what dbzip2 did, and there should be some that are better maintained and better-performing these days...
Comment 2 Ariel T. Glenn 2011-02-04 20:11:36 UTC
Ah no, specifically I mean my python bzip2 stuff that I will shortly be using to do things like find the last pageid in a truncated bzip2 history file by seeking to near the end of the file and grabbing it.  It works, ok... but it is gross. 

However!  Please clue me in if there are parallel bzip2 projects further along than dbzip2; we might be looking at them for other reasons.
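
As a rough illustration of the "grab the last pageid" step mentioned above: once a chunk near the end of the file has been decompressed, the remaining work is plain text matching. The sketch below is not the actual code. It assumes the usual MediaWiki XML dump layout, where the first `<id>` element after a `<page>` tag is the page id and revision ids come later, and the function name `last_page_id` is hypothetical.

```python
import re
from typing import Optional

# Assumption: in a MediaWiki XML dump, the first <id> after <page> is the
# page id; revision ids appear later, inside <revision> elements.
PAGE_ID_RE = re.compile(r"<page>.*?<id>(\d+)</id>", re.DOTALL)


def last_page_id(xml_tail: str) -> Optional[int]:
    """Return the id of the last page whose <id> appears in xml_tail.

    Hypothetical helper. Returns None when the chunk holds no <page> tag
    followed by an id, for example when the whole chunk falls inside one
    long page history.
    """
    matches = PAGE_ID_RE.findall(xml_tail)
    return int(matches[-1]) if matches else None
```

If this returns None, the caller would have to seek further back in the file and try again with an earlier chunk.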
Comment 3 Diederik van Liere 2011-03-22 18:30:25 UTC
Parallel bzip2 might be interesting to speed up the compression process.
Comment 4 Ariel T. Glenn 2011-03-22 19:42:08 UTC
well there is a whole project lying around for this, please see:

I've been looking for suckers^Wvolunteers to poke at it...
Comment 5 Ariel T. Glenn 2011-09-18 06:48:41 UTC
done, if not perfect, and a huge improvement over the python stuff I had.
