Last modified: 2013-10-18 18:39:15 UTC
Created attachment 5408
Sample export showcasing the problem

Exports from Wikitravel (1.11.2, export-0.3) contain a "realname" tag as follows:

<mediawiki>
  <page>
    <revision>
      <contributor>
        <realname>David</realname>
      </contributor>
      <comment> ... </comment>
      ...

(Sample export attached.)

This makes Special:Import go batshit insane:

IMPORT: FAILURE: Invalid tag <realname> in <contributor>
WikiImporter XML error: Invalid tag <realname> in <contributor>
IMPORT: out_contributor realname
IMPORT: POP contributor
IMPORT: FAILURE: Expected </contributor>, got </realname>
WikiImporter XML error: Expected </contributor>, got </realname>
IMPORT: out_contributor contributor
IMPORT: POP revision
IMPORT: PARENT page
IMPORT: in_page comment
IMPORT: FAILURE: Element <comment> not allowed in a <page>.
WikiImporter XML error: Element <comment> not allowed in a <page>.

And it bails out with the (totally incorrect) error "All revisions were previously imported".

Removing the offending line fixes the problem, but there are still quite a few WTFs here:

1) XML specs mandate that the parser should ignore unknown tags, not "FAIL" on them.

2) Having that unknown tag should not cause it to incorrectly pop out of <revision> and then fail to read the rest of the file.

3) At the very least, it should abort on and properly display the error (fail-fast), instead of blindly proceeding and then giving the user the wrong error. (In includes/specials/SpecialImport.php, any successCount = 0, that is, any failure at all, is logged as 'import-nonewrevisions'!)

The same problem also affects importDump.php, which is even worse, as it just cheerily reports "Done!" even though the import failed.
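To illustrate point 3, here is a rough Python sketch of a reporting layer that keeps "the import failed" separate from "nothing new was imported". It is not MediaWiki code: ImportFailure, run_import, report, and the dict-based revision records are all hypothetical names made up for this example.

```python
class ImportFailure(Exception):
    """Raised when a dump cannot be processed (hypothetical name)."""


def run_import(revisions):
    """Count imported revisions, stopping at the first one that carries an error."""
    imported = 0
    for rev in revisions:
        if rev.get("error"):
            # Fail fast: surface the real parser error instead of carrying on.
            raise ImportFailure(rev["error"])
        imported += 1
    return imported


def report(revisions):
    """Turn the outcome into a user-facing message."""
    try:
        count = run_import(revisions)
    except ImportFailure as err:
        return f"Import failed: {err}"
    if count == 0:
        # Only reached when parsing succeeded and there was truly nothing to add.
        return "No new revisions to import."
    return f"Imported {count} revision(s)."
```

The point of the design is that a zero count on its own is never treated as evidence of failure; the failure path is taken only when an error was actually raised.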
> 1) XML specs mandate that the parser should ignore unknown tags, not "FAIL" on them.

Totally untrue. :) XML doesn't specify any such thing; that's up to the domain-specific markup language. (XML is a meta-language, not a language per se.)

However... it really ought to just ignore unknown tags, since that's the nice thing to do. The failure mode sounds wrong too, and should get fixed if it's still doing that.
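As a rough illustration of "just ignore unknown tags", here is a minimal Python sketch using the standard xml.etree.ElementTree module. It is not the WikiImporter code. The set of known tags, the parse_contributor helper, and the sample values (the <username> element and the comment text) are assumptions made up for the example, and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Child tags of <contributor> that this sketch understands (an assumption).
KNOWN_CONTRIBUTOR_TAGS = {"username", "id", "ip"}


def parse_contributor(elem):
    """Return (fields, skipped): known children as a dict, unknown tag names as a list."""
    fields, skipped = {}, []
    for child in elem:
        if child.tag in KNOWN_CONTRIBUTOR_TAGS:
            fields[child.tag] = (child.text or "").strip()
        else:
            # Unknown tag: note it and move on rather than aborting the import.
            skipped.append(child.tag)
    return fields, skipped


# Made-up sample data, loosely modelled on the export in the report.
SAMPLE = """
<revision>
  <contributor>
    <username>David</username>
    <realname>David</realname>
  </contributor>
  <comment>test edit</comment>
</revision>
"""

revision = ET.fromstring(SAMPLE)
fields, skipped = parse_contributor(revision.find("contributor"))
# The sibling <comment> is still readable because the unknown tag was skipped cleanly.
comment = revision.findtext("comment")
```

Because the unknown <realname> child is consumed as a whole element, the parser never loses its place inside <revision>, which is the failure mode described in point 2 of the report.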
I was able to import the attachment successfully using the latest code from git. Perhaps this has been fixed sometime in the last 5 years?