Last modified: 2011-11-29 03:21:01 UTC
It would be great if you could provide listings of the image dumps, such as the two links below:
Justification: I've written an experimental alternative Wikipedia browser, which uses an offline database
created from Wikipedia's XML dump. Since the media file collection is too large, I intended to delegate
requests for media files from my browser to the original Wikipedia servers. It is, unfortunately, not
possible to derive the URL of a media file from the name found in the article's wiki markup.
For example, if the wiki markup contains a reference to the image Stuy_building_cropped.jpg, the
corresponding URL is http://upload.wikimedia.org/wikipedia/en/1/17/Stuy_building_cropped.jpg. The only
way to map short names to their URLs is to analyze the listings of the media file tarballs. I can create
these listings myself, of course, but then I need to download the tarballs, which would waste bandwidth on both ends.
Another, rather desperate, method is to download the media file's description page upon request, such as
http://en.wikipedia.org/wiki/Image:Stuy_building_cropped.jpg, parse its HTML, and extract the URL of
the underlying image. But I still think that the best solution is to maintain a database mapping short
media names to their URLs.
This one's basically obsolete; we no longer do tarball dumps of images. :(
Future image dump access is more likely to be over rsync or other systems that pull individual files...
(Note that URLs for images *can* be derived from their name.)
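The derivation mentioned in the last note is, as I understand it, MediaWiki's hashed upload directory layout: spaces in the file name become underscores, the MD5 hex digest of the resulting name is taken, and its first one and two hex characters form the directory path. A minimal sketch in Python, assuming that layout (the base URL is taken from the example above; a file hosted on Commons would need the commons base path instead):

```python
import hashlib

def media_url(name, base="http://upload.wikimedia.org/wikipedia/en"):
    """Derive the upload URL for a media file from its wiki name.

    Assumes MediaWiki's hashed directory layout: /<h[0]>/<h[0:2]>/<name>,
    where h is the MD5 hex digest of the name (spaces -> underscores).
    """
    name = name.replace(" ", "_")
    h = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "%s/%s/%s/%s" % (base, h[0], h[:2], name)

print(media_url("Stuy_building_cropped.jpg"))
```

For the example discussed above this should produce the .../1/17/Stuy_building_cropped.jpg URL quoted earlier, with no tarball listing or description-page scraping required.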