Last modified: 2011-11-29 03:21:01 UTC

Bug 7789 - RFE: provide listings for image dump tarballs
Status: RESOLVED INVALID
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All Windows XP
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2006-11-02 11:59 UTC by Michal Sevcenko
Modified: 2011-11-29 03:21 UTC
CC List: 1 user

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Description Michal Sevcenko 2006-11-02 11:59:55 UTC
It would be great if you could provide file listings for the image dump tarballs, such as these two:

http://download.wikimedia.org/images/wikipedia/en/upload.tar
http://download.wikimedia.org/images/special/commons/20051126_upload.tar

Justification: I've written an experimental alternative Wikipedia browser that uses an offline database
created from Wikipedia's XML dump. Since the media file collection is too large to download, I intended to
forward requests for media files from my browser to the original Wikipedia servers. Unfortunately, it is not
possible to derive the URL of a media file from the name used in the article's wiki markup.
For example, if the wiki markup references the image Stuy_building_cropped.jpg, the
corresponding URL is http://upload.wikimedia.org/wikipedia/en/1/17/Stuy_building_cropped.jpg. The only
way to map short names to their URLs is to analyze the listings of the media file tarballs. I could create
these listings myself, of course, but I would need to download the tarballs, which would waste both your
bandwidth and mine.
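With a listing in hand, turning it into the requested name-to-URL lookup table is a single pass. A minimal sketch, assuming a `tar tf`-style listing whose paths are relative to the wiki's upload root and follow the hashed `<h>/<hh>/<name>` layout visible in the example URL (`1/17/Stuy_building_cropped.jpg`); the function `build_url_map` and its default base URL are hypothetical, and thumbnail or archive entries are not handled.

```python
import re

# Assumed layout: one hex digit, then a two-digit directory starting with that
# same digit, then the file name (e.g. "1/17/Stuy_building_cropped.jpg").
HASHED_PATH = re.compile(r"^([0-9a-f])/(\1[0-9a-f])/([^/]+)$")


def build_url_map(listing_lines, base_url="http://upload.wikimedia.org/wikipedia/en"):
    """Map each bare media file name to its full upload URL (hypothetical helper)."""
    url_map = {}
    for line in listing_lines:
        match = HASHED_PATH.match(line.strip())
        if match is None:
            continue  # directory entries, blank lines, anything off-layout
        url_map[match.group(3)] = f"{base_url}/{match.group(0)}"
    return url_map


listing = ["1/", "1/17/", "1/17/Stuy_building_cropped.jpg"]
print(build_url_map(listing)["Stuy_building_cropped.jpg"])
# prints http://upload.wikimedia.org/wikipedia/en/1/17/Stuy_building_cropped.jpg
```

Keeping the table in a dictionary (or a database table keyed on the name) makes each lookup from the browser a constant-time operation.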

Another, rather desperate, method is to download the media file's description page on request, such as
http://en.wikipedia.org/wiki/Image:Stuy_building_cropped.jpg, parse its HTML, and extract the URL of
the underlying image. But I still think the best approach is to maintain a database mapping short
media names to their URLs.
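The description-page fallback can be sketched with the standard library's `html.parser`. This assumes the page contains an `<a href>` pointing at the original (non-thumbnail) file on upload.wikimedia.org whose path ends in the file name; the actual page markup may differ, and `OriginalFileLinkFinder` and `find_original_url` are hypothetical names. Fetching the page is left out so the parsing step stands alone.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class OriginalFileLinkFinder(HTMLParser):
    """Remembers the first link to the original upload of one file (hypothetical)."""

    def __init__(self, file_name):
        super().__init__()
        self.file_name = file_name
        self.url = None

    def handle_starttag(self, tag, attrs):
        if self.url is not None or tag != "a":
            return  # already found a match, or not a link
        href = dict(attrs).get("href") or ""
        # Skip thumbnails; compare against the decoded path so %-escapes still match.
        if ("upload.wikimedia.org" in href
                and "/thumb/" not in href
                and unquote(href).endswith("/" + self.file_name)):
            self.url = href


def find_original_url(html, file_name):
    """Return the original-file URL found in a description page, or None."""
    finder = OriginalFileLinkFinder(file_name)
    finder.feed(html)
    finder.close()
    return finder.url
```

This costs one extra page request per image, which is why a precomputed mapping is preferable.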
Comment 1 Brion Vibber 2009-03-30 21:49:56 UTC
This one's basically obsolete; we no longer do tarball dumps of images. :(

Future image dump access is more likely to be over rsync or other systems that pull individual files...

(Note that URLs for images *can* be derived from their name.)
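The derivation relies on MediaWiki's hashed upload directories (`$wgHashedUploadDirectory`): the directories are the first one and first two hex digits of the MD5 hash of the stored file name, which matches the `1/17/` pattern in the reporter's example. A minimal sketch, assuming a two-level hashed layout, underscores in place of spaces, and a capitalized first letter (the default `$wgCapitalLinks` behavior); `upload_url` and its default base URL are hypothetical, and files hosted on Commons would need the Commons base URL instead.

```python
import hashlib
from urllib.parse import quote


def upload_url(file_name, base_url="http://upload.wikimedia.org/wikipedia/en"):
    """Derive a media file's upload URL from its name (hypothetical helper)."""
    # Normalize to the stored form: spaces become underscores, first letter uppercased.
    key = file_name.strip().replace(" ", "_")
    key = key[:1].upper() + key[1:]
    # The hashed directories come from the MD5 of the UTF-8 encoded name.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{base_url}/{digest[0]}/{digest[:2]}/{quote(key)}"
```

Because this needs no network access or listing, it avoids both the tarball download and the description-page round trip.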
