Last modified: 2014-11-17 09:21:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T72385, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70385 - Wikidata JSON dump: file directory location should follow standard patterns
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Ariel T. Glenn
Whiteboard: u=dev c=infrastructure p=0
Depends on:
Blocks:
Reported: 2014-09-04 07:21 UTC by Markus Krötzsch
Modified: 2014-11-17 09:21 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Markus Krötzsch 2014-09-04 07:21:27 UTC
The Wikidata JSON dump is currently located at

http://dumps.wikimedia.org/other/wikidata/

This does not follow the common scheme used by all other dumps. For example, the daily (incremental) dumps are at the location

http://dumps.wikimedia.org/other/incr/wikidatawiki/

Here "incr" specifies the type of dump, and "wikidatawiki" is the official Wikimedia site name of Wikidata.org. The current scheme uses a custom string ("wikidata") that is not a site name, and it fails to specify the dump type at all. If more projects were to generate JSON dumps (e.g., a future Wikimedia Commons installation of Wikibase), this naming pattern would not work.

I suggest using a location like:

http://dumps.wikimedia.org/other/wikibase-json/wikidatawiki/

Or maybe use "json" if you find this specific enough. While doing this, the file names should also be made more descriptive (Bug 68792).
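
The proposed layout can be read as a two-part key: dump type first, then site name. The following is a minimal sketch of how a consumer might build such locations. It assumes the suggested "wikibase-json" directory name were adopted, which is only a proposal in this report, and `dump_dir_url` is a hypothetical helper, not part of any existing tool.

```python
# Base location of the "other" dumps, as quoted in this report.
BASE = "http://dumps.wikimedia.org/other"


def dump_dir_url(dump_type, site):
    """Build a dump directory URL from a dump type and a Wikimedia site name.

    dump_type: e.g. "incr" (existing) or "wikibase-json" (proposed, an assumption).
    site:      an official site name such as "wikidatawiki".
    """
    for part in (dump_type, site):
        # Reject empty parts and parts that would change the directory depth.
        if not part or "/" in part:
            raise ValueError("invalid path component: %r" % (part,))
    return "%s/%s/%s/" % (BASE, dump_type, site)


# Existing incremental dumps and the proposed JSON dumps share one pattern.
print(dump_dir_url("incr", "wikidatawiki"))
print(dump_dir_url("wikibase-json", "wikidatawiki"))
```

With the current "other/wikidata/" location, no such function could work without a special case, because that path carries neither a dump type nor a site name.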
Comment 1 Markus Krötzsch 2014-09-04 07:29:40 UTC
In addition to the above, there should be a timestamp-based sub-directory for each export (even if it would contain only one file for now). For example, the daily dumps are in directories like

http://dumps.wikimedia.org/other/incr/wikidatawiki/20140903/

Using the same structure will make it easier for consumers to find dump files without needing custom code for each type of dump (a program that checks the Web to find out for which dates there are dumps could use the same code for all types of "other" dumps). Moreover, it might be good to have a directory per dump to organise multiple files in the future (md5 sum, several types of compression [Bug 68793], dump status).
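
As a rough illustration of the "same code for all dump types" point, here is a sketch that extracts the available dump dates from a directory listing page. It assumes the listing is HTML whose links point at sub-directories named YYYYMMDD, as in the daily dump example above. The function name `find_dump_dates` and the sample listing are made up for illustration, and fetching the page is left out.

```python
import re
from datetime import datetime
from typing import List


def find_dump_dates(listing_html: str) -> List[str]:
    """Return the sorted YYYYMMDD directory names linked from a listing page."""
    dates = set()
    # Look for links such as href="20140903/" (trailing slash optional).
    for candidate in re.findall(r'href="(\d{8})/?"', listing_html):
        try:
            # Keep only strings that are real calendar dates.
            datetime.strptime(candidate, "%Y%m%d")
        except ValueError:
            continue
        dates.add(candidate)
    return sorted(dates)


# Hypothetical listing content; a real page may be formatted differently.
sample = (
    '<a href="../">../</a>'
    '<a href="20140902/">20140902/</a>'
    '<a href="20140903/">20140903/</a>'
    '<a href="latest/">latest/</a>'
)
print(find_dump_dates(sample))
```

If every dump type used dated sub-directories, the same function could be pointed at the listing for any type and site without changes.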
