Last modified: 2011-02-06 15:35:55 UTC
It would be handy if the RSS feeds included the full uncompressed size, in bytes, of each compressed file. Bzip2 archives in particular provide no interface for revealing a file's uncompressed size. I'm working on a Firefox extension which could use this information, which is unavailable elsewhere, to show a progress bar when decompressing large files. The current RSS feeds are very minimal, so there is plenty of room for the extra information, which should be trivially available to the scripts that create the dump download areas.
Note that the uncompressed size is not currently available. It should be possible to make a little wrapper tool to pipe the data through before compression, which would count the bytes and save the total; that total could then be pulled into the report and RSS outputs.
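A minimal sketch of such a wrapper, assuming it would sit in the pipeline between the dump producer and the compressor. The function name `copy_and_count` and the chunk size are made up for illustration and are not taken from the existing dump scripts:

```python
def copy_and_count(src, dst, chunk_size=64 * 1024):
    """Copy a binary stream from src to dst unchanged.

    Returns the total number of bytes copied, i.e. the uncompressed size
    when dst is the input side of the compressor.
    (Hypothetical helper, not part of the real backup scripts.)
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

Called as `copy_and_count(sys.stdin.buffer, sys.stdout.buffer)` it would act as a pass-through filter; the returned total could then be written to a side file for the report and RSS generation to pick up.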
Sorry for the silly question, but where can I find this RSS feed? There is no feed <link>ed at download.wikimedia.org.
There is one feed per file per project. Oddly, they are in a place which doesn't seem to be linked from anywhere. I had to ask people on the dev IRC channel to find out about it: http://download.wikipedia.org/enwikipedia/latest/
For bzip2 files, at least, the uncompressed file size is available without the wrapper tool Brion suggests. Passing the -v switch makes bzip2 print the details to stderr. I don't yet grok the code in backup/worker.py, but it should be easy to parse the verbose output. Example follows:
(stdin): 1.512:1, 5.291 bits/byte, 33.87% saved, 688 in, 455 out.
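Parsing that verbose line could look roughly like the sketch below. It assumes the stderr format matches the example quoted above; other bzip2 versions may print something different. The name `parse_bzip2_verbose` is hypothetical and does not come from worker.py:

```python
import re

# Matches the "<n> in, <m> out" tail of the bzip2 -v summary line.
# Assumption: the format is the one shown in the example above.
IN_OUT_RE = re.compile(r"(\d+)\s+in,\s+(\d+)\s+out")


def parse_bzip2_verbose(stderr_text):
    """Return (bytes_in, bytes_out) from bzip2 -v output, or None if absent.

    When compressing, bytes_in is the uncompressed size.
    """
    match = IN_OUT_RE.search(stderr_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "(stdin): 1.512:1, 5.291 bits/byte, 33.87% saved, 688 in, 455 out."
print(parse_bzip2_verbose(sample))  # (688, 455)
```

Returning None rather than raising leaves it to the caller to decide what to do when the summary line is missing, for example when bzip2 was run without -v.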
See also bug 6064