Last modified: 2012-12-03 17:54:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T17623, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 15623 - Wikistats: Consider compressing with bzip2 (or even 7zip)
Status: RESOLVED WONTFIX
Product: Datasets
Classification: Unclassified
Component: Webstatscollector
Version: unspecified
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://dammit.lt/wikistats/
Keywords:
Depends on:
Blocks:

Reported: 2008-09-16 22:38 UTC by Melancholie
Modified: 2012-12-03 17:54 UTC
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Melancholie 2008-09-16 22:38:43 UTC
Consider compressing logs (dumps at http://dammit.lt/wikistats/) with bzip2 (or even 7zip) instead of gzip.

This would reduce file sizes by up to 25% (7z saves roughly another 5% compared to bzip2), cutting both disk space and traffic!
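
To get a rough feel for how gzip, bzip2 and LZMA compare on this kind of data, here is a minimal sketch using only the Python standard library. The sample data is synthetic and hypothetical (the line layout in make_sample is an assumption, not the real dump format), so the numbers it prints say nothing definitive about the actual wikistats files.

```python
import bz2
import gzip
import lzma
import random


def make_sample(n_lines: int = 20000, seed: int = 0) -> bytes:
    """Build synthetic 'project title count bytes' lines (hypothetical layout)."""
    rng = random.Random(seed)
    projects = ["en", "de", "fr", "ja"]
    lines = []
    for _ in range(n_lines):
        project = rng.choice(projects)
        title = f"Article_{rng.randrange(5000)}"
        count = rng.randrange(1, 500)
        size = rng.randrange(1000, 90000)
        lines.append(f"{project} {title} {count} {size}")
    return "\n".join(lines).encode("utf-8")


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for each standard-library codec."""
    return {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data)),  # default preset, LZMA2 in an .xz container
    }


if __name__ == "__main__":
    sample = make_sample()
    sizes = compressed_sizes(sample)
    print(f"raw      {len(sample):9d} bytes")
    for name, size in sizes.items():
        # Ratio below 1.0 means the result is smaller than the gzip output.
        print(f"{name:8s} {size:9d} bytes  ({size / sizes['gzip']:.2f}x gzip size)")
```

Running the same functions on a real hourly file (read with open(path, "rb").read()) would give a more meaningful comparison than the synthetic sample.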
Comment 1 Melancholie 2008-09-16 22:51:37 UTC
Please pre-announce, so that Henrik and Erik can be informed.

Command:
tar -cjf
Comment 2 Domas Mituzas 2008-10-21 23:02:04 UTC
I don't like the idea!
Comment 3 Melancholie 2009-04-03 06:03:51 UTC
A nice comparison, by the way:
http://warp.povusers.org/ArchiverComparison/
Comment 4 Andre Klapper 2012-12-03 14:00:09 UTC
[mass-moving wikistats reports from Wikimedia→Statistics to Analytics→Wikistats to have stats issues under one Bugzilla product (see bug 42088) - sorry for the bugspam!]
Comment 5 Nemo 2012-12-03 17:54:39 UTC
(In reply to comment #3)
> A nice comparison, by the way:

Better comparison (on actual data) copied from https://wiki.toolserver.org/view/Talk:User-store :

A huge portion of the space is taken by visitor stats, although they now have two mirrors (WMF and IA). The oldest ones are compressed with LZMA (xz). Recompressing files that are already gz or xz is useless and can only increase their size. I ran some compression tests on a whole month of uncompressed data, 2011-03-pagecounts (184G):

    7z a -t7z -m0=BZip2 -mmt=6 -mx9 takes ~27h (6 cores, less than 100M memory) and gives 41G
    7z a -t7z -m0=LZMA -mmt=on -mx9 -md=64m -mfb=64 takes ~56h (2 cores, 800M memory) and gives 37G
    7z a -t7z -m0=LZMA -mmt=on -mx9 -md=256m -mfb=64 -ms=on takes about 3 days (2 cores, 2700M memory) and gives 35G
    tar with xz uses LZMA with standard settings and can only give worse results (I tried it but it got killed by mistake, wasn't going anywhere though)
    individual gz are 51.2G
    individual xz of this month are not yet available for comparison 

--Nemo 10:23, 22 March 2012 (UTC)
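
For reference, the sizes reported above can be turned into relative savings with a few lines of arithmetic. This is only a sketch over the figures quoted in this comment; the dictionary labels are shorthand introduced here, and the run times and memory figures are not modelled.

```python
# Sizes in G as reported above for 2011-03-pagecounts.
UNCOMPRESSED_G = 184.0
RESULTS_G = {
    "individual gz": 51.2,
    "7z BZip2 -mx9": 41.0,
    "7z LZMA -md=64m": 37.0,
    "7z LZMA -md=256m -ms=on": 35.0,
}


def saving_vs(baseline: float, size: float) -> float:
    """Percentage by which `size` is smaller than `baseline`."""
    return 100.0 * (1.0 - size / baseline)


if __name__ == "__main__":
    gz = RESULTS_G["individual gz"]
    for name, size in RESULTS_G.items():
        print(
            f"{name:26s} {size:5.1f}G  "
            f"{saving_vs(UNCOMPRESSED_G, size):4.1f}% smaller than raw, "
            f"{saving_vs(gz, size):4.1f}% smaller than gz"
        )
```

On these figures the BZip2 run comes out about 20% smaller than the individual gz files, and the two LZMA runs about 28% and 32% smaller.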
