Last modified: 2012-04-23 15:22:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T23200, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 21200 - dump format could declare which namespaces it covers
Status: NEW
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All, OS: All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Ariel T. Glenn
Reported: 2009-10-20 08:44 UTC by Andrew Dunbar
Modified: 2012-04-23 15:22 UTC
CC: 3 users

Description Andrew Dunbar 2009-10-20 08:44:21 UTC
The XML dump files released by Wikimedia contain a <namespaces> section which declares namespace names and numbers for the wiki it was dumped from.

But it does not tell you which of those namespaces are actually covered by the dump files.

For instance, *-*-pages-articles.xml dumps contain no pages from the "Talk", "* talk", or "User" namespaces, not even their page titles or redirect information.
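As a rough illustration of the current situation: a tool that wants to know which namespaces a dump covers has to scan every page. The Python sketch below does that with the standard library only. It assumes the export format's <siteinfo>/<namespaces>/<namespace key="..."> layout and the per-page <ns> element found in newer dumps, and falls back to matching the title prefix against the declared names for older dumps without <ns>. The function name covered_namespaces and the sample document are illustrative, not part of any MediaWiki tool.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{uri}' XML namespace prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def covered_namespaces(source):
    """Scan a MediaWiki-style XML dump and return (declared, covered).

    declared: dict mapping namespace key -> name from <siteinfo><namespaces>.
    covered:  set of namespace keys used by at least one <page>.
    """
    declared = {}
    covered = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = local(elem.tag)
        if tag == "namespace":
            # The main namespace (key 0) is declared with no name text.
            declared[int(elem.get("key"))] = elem.text or ""
        elif tag == "page":
            ns = None
            title = ""
            for child in elem:
                name = local(child.tag)
                if name == "ns":
                    ns = int(child.text)
                elif name == "title":
                    title = child.text or ""
            if ns is None:
                # Assumption: without <ns>, infer the namespace from the title
                # prefix (exact match only; aliases and case are ignored).
                ns = 0
                prefix, sep, _rest = title.partition(":")
                if sep:
                    for key, ns_name in declared.items():
                        if ns_name and ns_name == prefix:
                            ns = key
                            break
            covered.add(ns)
            elem.clear()  # free memory; real dumps can be very large
    return declared, covered


# Hypothetical miniature dump for illustration only.
SAMPLE = b"""<mediawiki>
  <siteinfo>
    <namespaces>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="2">User</namespace>
      <namespace key="10">Template</namespace>
    </namespaces>
  </siteinfo>
  <page><title>Apple</title><ns>0</ns></page>
  <page><title>Template:Stub</title></page>
</mediawiki>"""

declared, covered = covered_namespaces(io.BytesIO(SAMPLE))
uncovered = sorted(set(declared) - covered)
print(sorted(covered), uncovered)  # [0, 10] [1, 2]
```

Note that this only proves which namespaces have at least one page in the file; it cannot distinguish "deliberately excluded" from "included but empty", which is exactly the information the dump itself could declare.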

This is fine, but wiki dumps are now also being produced in the same format outside Wikimedia, covering different subsets of namespaces (for example http://devtionary.org/w/dump/xmlu/), so the dump format has become an interchange format of sorts. It would therefore be nice if this information, which is currently metadata external to the dump files, could be made internal so that the files are self-contained. This could be quite useful to tools designed to process dump files.

Perhaps a new section named <dumpinfo> could be added to the dump files to complement the <siteinfo> section.
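One possible shape for such a section is simply a list of the namespace keys the file covers. The sketch below builds and reads back a <dumpinfo> element with Python's xml.etree.ElementTree. Only the <dumpinfo> name comes from the proposal; the <covers> wrapper and the key attribute on each <namespace> child are hypothetical choices for illustration (mirroring the key attribute used in <siteinfo>), not an existing schema.

```python
import xml.etree.ElementTree as ET


def build_dumpinfo(covered_keys):
    """Build a hypothetical <dumpinfo> element listing covered namespace keys.

    The <covers>/<namespace key="..."> layout is an illustrative assumption.
    """
    info = ET.Element("dumpinfo")
    covers = ET.SubElement(info, "covers")
    for key in sorted(covered_keys):
        ET.SubElement(covers, "namespace", key=str(key))
    return info


def read_dumpinfo(info):
    """Return the set of namespace keys declared in a <dumpinfo> element."""
    return {int(ns.get("key")) for ns in info.iter("namespace")}


# Example: a dump covering main (0), Template (10) and Category (14).
info = build_dumpinfo({0, 10, 14})
print(ET.tostring(info, encoding="unicode"))
```

A consumer could then check coverage by reading this small section near the top of the file instead of scanning every page.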
Comment 1 Christopher Sahnwaldt 2012-04-23 14:46:19 UTC
Similar to bug 34218 and bug 31955.
Comment 2 Christopher Sahnwaldt 2012-04-23 15:22:11 UTC
Similar to bug 36178.
