Last modified: 2011-11-29 03:21:01 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links may be broken. See T9789, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 7789 - RFE: provide listings for image dump tarballs
Status: RESOLVED INVALID
Product: Datasets
Classification: Unclassified
Component: General/Unknown (Other open bugs)
Version: unspecified
Hardware: All
OS: Windows XP
Severity: Normal enhancement (vote)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2006-11-02 11:59 UTC by Michal Sevcenko
Modified: 2011-11-29 03:21 UTC
CC: 1 user

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Michal Sevcenko 2006-11-02 11:59:55 UTC
It would be great if you could provide listings of the image dumps, such as the two links below:

http://download.wikimedia.org/images/wikipedia/en/upload.tar
http://download.wikimedia.org/images/special/commons/20051126_upload.tar

Justification: I've written an experimental alternative Wikipedia browser that uses an offline database 
created from Wikipedia's XML dump. Since the media file collection is too large, I intended to delegate 
requests for media files from my browser to the original Wikipedia servers. Unfortunately, it is not 
possible to derive the URL of a media file from the name found in the article's wiki markup. 
For example, if the wiki markup contains a reference to the image Stuy_building_cropped.jpg, the 
corresponding URL is http://upload.wikimedia.org/wikipedia/en/1/17/Stuy_building_cropped.jpg. The only 
way to map short names to their URLs is to analyze the listings of the media file tarballs. I could create 
these listings myself, of course, but I would need to download the tarballs, which would waste both your 
bandwidth and mine.

Another rather desperate method is to download the media file's description page on request, such as 
http://en.wikipedia.org/wiki/Image:Stuy_building_cropped.jpg, parse its HTML, and find the URL of 
the underlying image. But I still think that the best method is to maintain a database mapping short 
media names to their URLs.
Comment 1 Brion Vibber 2009-03-30 21:49:56 UTC
This one's basically obsolete; we no longer do tarball dumps of images. :(

Future image dump access is more likely to be over rsync or other systems that pull individual files...

(Note that URLs for images *can* be derived from their name.)
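Brion's closing note refers to MediaWiki's hashed upload-directory layout: the two short path components in a media URL are derived from the MD5 hex digest of the filename itself, so no tarball listing or description-page scrape is needed. A minimal sketch of that derivation (the helper name and the default project path are illustrative, not part of any MediaWiki API):

```python
import hashlib

def media_file_url(filename, project="wikipedia/en"):
    """Derive a media file's upload URL from its name alone.

    MediaWiki stores uploaded files under two levels of directories
    named after the MD5 hex digest of the filename (with spaces
    replaced by underscores): /<first hex char>/<first two hex chars>/.
    """
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return (f"http://upload.wikimedia.org/{project}/"
            f"{digest[0]}/{digest[:2]}/{name}")
```

Under this scheme, `media_file_url("Stuy_building_cropped.jpg")` should reproduce the /1/17/ path components in the URL cited in the report above.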


