Last modified: 2010-05-15 15:37:53 UTC
In the XML dumps, namespaces are stored as prefixes in page titles. What happens if new namespaces are defined and someone imports a dump into a MediaWiki installation that doesn't know them yet? The pages will land in the article namespace. The same applies if the languages of the two MediaWiki installations (i.e. the one where the dump was exported and the one where it is imported) don't match, or if dumps are made before new namespace names are translated (or after they are translated, if the importing installation doesn't have the translated names yet).

The situation isn't better for people who want to write scripts that analyse dumps of various wikis. E.g. if I'd like to know how many user pages exist in different Wikipedias, I must tell the script all translations of "User".

All these problems can be avoided if the namespaces are given as numbers. And to avoid redundancy, the prefixes should then be omitted, IMO. So e.g. instead of

  <title>Talk:Wikipedia</title>

this would be better:

  <title namespace="1">Wikipedia</title>

As an alternative (or maybe even in addition), a translation table of namespace names can be put at the beginning of the XML dump. That would look like this:

  <namespaces>
    <namespace id="0" />
    <namespace id="1">Talk</namespace>
    ...
  </namespaces>

*Please* do something before the first dumps in XML format are made, if it isn't too late, because people (including me) will hate it if the dump format changes all the time.
Using namespace text is deliberate, so that custom namespaces don't just fall into neverneverland, but will be imported with names intact. It would, however, be very good to include a list of namespace definitions and some other config info at the top of the dump. This would make it quite easy to split titles by namespace and get the symbolic namespaces during the import.
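To illustrate how a namespace list at the top of the dump would let an importer or analysis script split titles by namespace, here is a minimal Python sketch. The header shape is the one proposed in this report and the actual export schema may differ. The "User" entry with id 2, the `HEADER` constant, and the `load_namespaces` and `split_title` helpers are hypothetical, made up for this example. It also assumes the main namespace has id 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical header in the shape proposed in this report; real dumps may differ.
HEADER = """<namespaces>
  <namespace id="0" />
  <namespace id="1">Talk</namespace>
  <namespace id="2">User</namespace>
</namespaces>"""


def load_namespaces(xml_text):
    """Map each namespace name to its numeric id (the main namespace has an empty name)."""
    root = ET.fromstring(xml_text)
    return {(ns.text or "").strip(): int(ns.get("id")) for ns in root.iter("namespace")}


def split_title(title, namespaces):
    """Return (namespace id, title without prefix).

    Titles with no prefix, or with a prefix that is not in the table,
    are kept whole and assigned to namespace 0 (assumed to be the main one).
    """
    prefix, sep, rest = title.partition(":")
    if sep and prefix and prefix in namespaces:
        return namespaces[prefix], rest
    return 0, title


NAMESPACES = load_namespaces(HEADER)
print(split_title("Talk:Wikipedia", NAMESPACES))
print(split_title("Wikipedia", NAMESPACES))
```

Because the lookup table comes from the dump itself, a script counting user pages across wikis could match on the numeric id instead of carrying every translation of "User".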
Done w/ version 0.3 of the export schema.
See also bug 3143.
*** Bug 3143 has been marked as a duplicate of this bug. ***