Last modified: 2011-11-08 17:14:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T29653, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 27653 - Provide dumps using bittorrent
Status: RESOLVED WONTFIX
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Ariel T. Glenn
Reported: 2011-02-23 08:43 UTC by Adam Wight
Modified: 2011-11-08 17:14 UTC

Description Adam Wight 2011-02-23 08:43:03 UTC
I have no stats to cite, but these huge files call for multi-source downloading, either over HTTP using mirrors or, even better, using BitTorrent.  I hear this would dramatically reduce bandwidth demand.

BitTorrent is particularly nice because individual files can be selectively downloaded from within the bundle.  You could provide a single torrent containing all outputs from a particular wiki snapshot date.
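
To illustrate the single-torrent idea, here is a minimal sketch of how a multi-file torrent's "info" dictionary is laid out and why a client can fetch just one file from it. It assumes the standard BitTorrent metainfo format (bencoding, SHA-1 piece hashes). The file names, the in-memory file contents, the tiny piece length, and the helper names `build_info` and `pieces_for_file` are hypothetical and chosen only for illustration; nothing here reflects how the dumps are actually produced.

```python
import hashlib

PIECE_LENGTH = 4  # unrealistically small, for illustration; real torrents use far larger pieces


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_info(name, files):
    """Build a multi-file info dict from (file_name, content_bytes) pairs."""
    blob = b"".join(data for _, data in files)  # pieces span file boundaries
    pieces = b"".join(
        hashlib.sha1(blob[i:i + PIECE_LENGTH]).digest()
        for i in range(0, len(blob), PIECE_LENGTH)
    )
    return {
        "name": name,
        "piece length": PIECE_LENGTH,
        "pieces": pieces,
        "files": [{"length": len(data), "path": [fname]} for fname, data in files],
    }


def pieces_for_file(info, wanted):
    """Return the range of piece indices needed to download one file."""
    offset = 0
    for entry in info["files"]:
        length = entry["length"]
        if entry["path"] == [wanted]:
            if length == 0:
                return range(0)
            first = offset // info["piece length"]
            last = (offset + length - 1) // info["piece length"]
            return range(first, last + 1)
        offset += length
    raise KeyError(wanted)


# Hypothetical snapshot bundle with placeholder contents.
files = [
    ("pages-articles.xml.bz2", b"A" * 10),
    ("stub-meta-history.xml.gz", b"B" * 6),
    ("abstract.xml", b"C" * 3),
]
info = build_info("examplewiki-snapshot", files)
info_hash = hashlib.sha1(bencode(info)).hexdigest()  # identifies the torrent

print("pieces in torrent:", len(info["pieces"]) // 20)
print("pieces for one file:", list(pieces_for_file(info, "stub-meta-history.xml.gz")))
```

Because pieces are cut from the concatenation of all files, a client that wants a single file only needs the pieces overlapping that file's byte range, though the first and last of those pieces may also carry a few bytes of neighbouring files.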
Comment 1 Sam Reed (reedy) 2011-02-23 10:55:57 UTC
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps

As for the BitTorrent part, that would be somewhat feasible, having the tracker on WMF, but seeding from WMF might be more of an issue.
Comment 2 Adam Wight 2011-02-23 17:29:50 UTC
This is not an area I know much about, but what is the objection to seeding?  I imagine you will get the maximum benefit by using an open tracker which is already tied into search services.  And if your mirrors agree to use this protocol, they would provide a natural pool of seeders, even before they have finished replicating.

One major downside of the torrent idea is that it would be inefficient to offer incomplete dumps, because the .torrent would have to be changed as the data grows.  Unless there is a workaround, it would only make sense to wait until the dump is completed, by which point the data has aged...
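
The concern about growing data follows from how a torrent is identified: by the SHA-1 of its bencoded info dictionary, which contains the file length and the list of piece hashes. The sketch below, with hypothetical byte strings standing in for a partial and a completed dump and an unrealistically small piece length, shows that appending data changes the piece list, so a torrent made from the partial file would not describe the finished one.

```python
import hashlib

PIECE_LENGTH = 4  # tiny value for illustration only


def piece_hashes(data):
    """Split data into fixed-size pieces and return the SHA-1 hex digest of each."""
    return [
        hashlib.sha1(data[i:i + PIECE_LENGTH]).hexdigest()
        for i in range(0, len(data), PIECE_LENGTH)
    ]


partial = b"0123456789"         # placeholder for a dump that is still being written
complete = partial + b"abcdef"  # placeholder for the same dump once finished

old = piece_hashes(partial)
new = piece_hashes(complete)

# Full pieces that were already written keep their hashes; the trailing
# partial piece and everything after it differ.
unchanged = sum(1 for a, b in zip(old, new) if a == b)

print("pieces before:", len(old), "after:", len(new), "unchanged:", unchanged)
```

Since both the length and the piece list differ, the info dictionary and its hash differ too, and peers holding the old .torrent would be in a separate swarm from peers holding the new one.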
Comment 3 Antoine "hashar" Musso (WMF) 2011-02-23 22:04:30 UTC
Rephrased subject.
Comment 4 Ariel T. Glenn 2011-09-18 06:58:41 UTC
Once the dump is available there is nothing preventing someone in the community or several someones from setting up a torrent of these files, and I encourage folks to do so (as has been done a number of times in the past).  

Waiting until the dump is completed before adding it to a torrent is a good idea in all cases; only then are we sure that the files are intact and worth your while to download.

Folks that have talked with us about setting up a mirror site have expressed a preference for rsync, and that works best for us for distributing a subset of the dumps for mirroring.
Comment 5 Antoine "hashar" Musso (WMF) 2011-11-08 17:14:19 UTC
Per Ariel's comment, I am closing this bug. Either set up your own torrent or ask for rsync access.