Last modified: 2011-11-29 03:21:01 UTC
It would be great if you could provide listings of the image dumps, such as the two links below:
Justification: I've written an experimental alternative Wikipedia browser, which uses an offline database
created from Wikipedia's XML dump. Since the media file collection is too large, I intended to delegate
requests for media files from my browser to the original Wikipedia servers. It is, unfortunately, not
possible to derive the URL of a media file from the name found in the article's wiki markup.
For example, if the wiki markup contains a reference to the image Stuy_building_cropped.jpg, the
corresponding URL is http://upload.wikimedia.org/wikipedia/en/1/17/Stuy_building_cropped.jpg. The only
way to map short names to their URLs is to analyze the listings of the media file tarballs. I can create
these listings myself, of course, but then I need to download the tarballs, which would waste bandwidth on both ends.
Another, rather desperate, method is to download the media file's description page upon request, such as
http://en.wikipedia.org/wiki/Image:Stuy_building_cropped.jpg, parse its HTML, and extract the URL of
the underlying image. But I still think that the best solution is to maintain a database mapping short
media names to their URLs.
This one's basically obsolete; we no longer do tarball dumps of images. :(
Future image dump access is more likely to be over rsync or other systems that pull individual files...
(Note that URLs for images *can* be derived from their name.)
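The derivation mentioned in the last note is, as I understand it, MediaWiki's hashed upload directory layout: spaces in the file name become underscores, the MD5 hex digest of the resulting name is taken, and its first one and two hex characters form the directory path. A minimal sketch in Python, assuming that layout (the base URL is taken from the example above; a file hosted on Commons would need the commons base path instead):

```python
import hashlib

def media_url(name, base="http://upload.wikimedia.org/wikipedia/en"):
    """Derive the upload URL for a media file from its wiki name.

    Assumes MediaWiki's hashed directory layout: /<h[0]>/<h[0:2]>/<name>,
    where h is the MD5 hex digest of the name (spaces -> underscores).
    """
    name = name.replace(" ", "_")
    h = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "%s/%s/%s/%s" % (base, h[0], h[:2], name)

print(media_url("Stuy_building_cropped.jpg"))
```

For the example discussed above this should produce the .../1/17/Stuy_building_cropped.jpg URL quoted earlier, with no tarball listing or description-page scraping required.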