It would be great if you could provide listings of the image dumps, such as the two linked below:

http://download.wikimedia.org/images/wikipedia/en/upload.tar
http://download.wikimedia.org/images/special/commons/20051126_upload.tar

Justification: I've written an experimental alternative Wikipedia browser, which uses an offline database created from Wikipedia's XML dump. Since the media file collection is too large, I intended to delegate requests for media files from my browser to the original Wikipedia servers. Unfortunately, it does not seem possible to derive the URL of a media file from the reference found in the article's wiki markup. For example, if the wiki markup contains a reference to the image Stuy_building_cropped.jpg, the corresponding URL is http://upload.wikimedia.org/wikipedia/en/1/17/Stuy_building_cropped.jpg. The only way to map short names to their URLs is to analyze the listings of the media file tarballs. I can create these listings myself, of course, but that requires downloading the tarballs, which would waste both your bandwidth and mine. Another, rather desperate, method is to fetch the media file's description page on demand, such as http://en.wikipedia.org/wiki/Image:Stuy_building_cropped.jpg, parse its HTML, and extract the URL of the underlying image. But I still think the best approach is to maintain a database mapping short media names to their URLs.
This one's basically obsolete; we no longer do tarball dumps of images. :( Future image dump access is more likely to be over rsync or other systems that pull individual files... (Note that URLs for images *can* be derived from their name.)
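For reference, the derivation mentioned above comes from MediaWiki's hashed upload layout: the directory path consists of the first one and first two hex characters of the MD5 of the filename, with spaces normalized to underscores. Below is a minimal sketch in Python illustrating this (my own illustration, not from the thread; the base URL and the project/lang parameters are assumptions matching the English Wikipedia example in the request):

    import hashlib
    import urllib.parse

    def image_url(name, project="wikipedia", lang="en"):
        # MediaWiki's hashed upload layout: the directory is the first
        # hex digit and the first two hex digits of the MD5 of the
        # filename, with spaces normalized to underscores.
        name = name.replace(" ", "_")
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        return "http://upload.wikimedia.org/%s/%s/%s/%s/%s" % (
            project, lang, digest[0], digest[:2], urllib.parse.quote(name))

    # Should reproduce the example URL from the request above:
    # http://upload.wikimedia.org/wikipedia/en/1/17/Stuy_building_cropped.jpg
    print(image_url("Stuy_building_cropped.jpg"))

For Commons-hosted files the same scheme applies under a different path (e.g. wikipedia/commons), which the project/lang parameters are meant to cover.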