Last modified: 2012-12-03 17:54:39 UTC
Consider compressing logs (dumps at http://dammit.lt/wikistats/) with bzip2 (or even 7zip) instead of gzip. This would cut disk space and traffic by up to 25% (7z saves another 5% compared to bzip2)!
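A rough way to check a claim like this on one's own data is to compress the same bytes with each codec and compare sizes. The sketch below uses only Python's standard library; the sample generator and its line format are hypothetical stand-ins, not the real pagecounts layout, so the ratios it prints say nothing about the actual dumps.

```python
import bz2
import gzip
import lzma
import random


def make_sample(n_lines=20000, seed=0):
    """Build synthetic text loosely shaped like page-view lines (hypothetical format)."""
    rng = random.Random(seed)
    projects = ["en", "de", "fr", "es", "ja"]
    lines = []
    for _ in range(n_lines):
        project = rng.choice(projects)
        title = f"Page_{rng.randrange(5000)}"
        views = rng.randrange(1, 500)
        size = views * rng.randrange(1000, 20000)
        lines.append(f"{project} {title} {views} {size}\n")
    return "".join(lines).encode("utf-8")


def compressed_sizes(data):
    """Return the byte count of data under each codec at its highest preset."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


if __name__ == "__main__":
    sizes = compressed_sizes(make_sample())
    for name, size in sizes.items():
        print(f"{name:>5}: {size:>9} bytes ({size / sizes['raw']:.1%} of raw)")
```

To test against real data, replace make_sample() with the bytes of an uncompressed hourly file.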
Please announce any change in advance, so that Henrik and Erik can be informed. Command: tar -cjf
I don't like the idea!
A nice comparison, by the way: http://warp.povusers.org/ArchiverComparison/
[mass-moving wikistats reports from Wikimedia→Statistics to Analytics→Wikistats to have stats issues under one Bugzilla product (see bug 42088) - sorry for the bugspam!]
(In reply to comment #3)
> A nice comparison, by the way:

Better comparison (on actual data) copied from https://wiki.toolserver.org/view/Talk:User-store :

A huge portion of the space is taken by visitors stats, although now they have two mirrors (WMF and IA). The oldest ones are compressed in LZMA (xz). Compressing gz or xz is useless, can only increase size. I made some tests of compression of a whole month uncompressed, 2011-03-pagecounts (184G):
* 7z a -t7z -m0=BZip2 -mmt=6 -mx9 takes ~27h (6 cores, less than 100M memory) and gives 41G
* 7z a -t7z -m0=LZMA -mmt=on -mx9 -md=64m -mfb=64 takes ~56h (2 cores, 800M memory) and gives 37G
* 7z a -t7z -m0=LZMA -mmt=on -mx9 -md=256m -mfb=64 -ms=on takes about 3 days (2 cores, 2700M memory) and gives 35G
* tar with xz uses LZMA with standard settings and can only give worse results (I tried it but it got killed by mistake, wasn't going anywhere though)
* individual gz are 51.2G
* individual xz of this month are not yet available for comparison
--Nemo 10:23, 22 March 2012 (UTC)
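For comparison with the "up to 25%" estimate in the original request, the sizes quoted above can be turned into savings relative to the individual gz files. This is plain arithmetic on the reported numbers; the dictionary labels and function name are made up for the sketch, and the timing and memory figures are not modelled.

```python
# Sizes in GB as reported in the quoted comment for 2011-03-pagecounts.
UNCOMPRESSED_GB = 184.0
GZIP_GB = 51.2
REPORTED_GB = {
    "7z BZip2 -mx9": 41.0,
    "7z LZMA -md=64m": 37.0,
    "7z LZMA -md=256m -ms=on": 35.0,
}


def saving_vs(baseline_gb, size_gb):
    """Fraction of space saved when size_gb replaces baseline_gb."""
    return 1.0 - size_gb / baseline_gb


if __name__ == "__main__":
    print(f"individual gz: {GZIP_GB / UNCOMPRESSED_GB:.1%} of uncompressed")
    for label, size_gb in REPORTED_GB.items():
        print(
            f"{label}: {size_gb / UNCOMPRESSED_GB:.1%} of uncompressed, "
            f"{saving_vs(GZIP_GB, size_gb):.1%} smaller than gz"
        )
```

On these figures the three 7z runs come out about 20%, 28% and 32% smaller than the individual gz files.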