Last modified: 2013-10-18 18:39:15 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T17913, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15913 - Special:Import rejects valid XML and gives incorrect error
Status: UNCONFIRMED
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: 1.13.x
Hardware: All All
Importance: Normal major
Target Milestone: ---
Assigned To: Ariel T. Glenn
Keywords: testme
Depends on:
Blocks:
Reported: 2008-10-09 09:44 UTC by Jani Patokallio
Modified: 2013-10-18 18:39 UTC (History)

Attachments
Sample export showcasing the problem (152.81 KB, application/xml)
2008-10-09 09:44 UTC, Jani Patokallio

Description Jani Patokallio 2008-10-09 09:44:50 UTC
Created attachment 5408
Sample export showcasing the problem

Exports from Wikitravel (1.11.2, export-0.3) contain a "realname" tag as follows:

<mediawiki>
  <page>
    <revision>
      <contributor>
        <realname>David</realname>
      </contributor>
      <comment> ... </comment>
      ...

(Sample export attached.)  This makes Special:Import go batshit insane:

IMPORT: FAILURE: Invalid tag <realname> in <contributor> WikiImporter XML error: Invalid tag <realname> in <contributor>
IMPORT: out_contributor realname
IMPORT: POP contributor
IMPORT: FAILURE: Expected </contributor>, got </realname> WikiImporter XML error: Expected </contributor>, got </realname>
IMPORT: out_contributor contributor
IMPORT: POP revision
IMPORT: PARENT page
IMPORT: in_page comment
IMPORT: FAILURE: Element <comment> not allowed in a <page>. WikiImporter XML error: Element <comment> not allowed in a <page>.

And it bails out with the (totally incorrect) error "All revisions were previously imported".  Removing the offending line fixes the problem, but there are still quite a few WTFs here:

1) XML specs mandate that the parser should ignore unknown tags, not "FAIL" on them.
2) Having that unknown tag should not cause it to incorrectly pop out of <revision> and then fail to read the rest of the file.
3) At the very least, it should abort on the first error and display it properly (fail-fast), instead of blindly proceeding and then giving the user the wrong error.  (In includes/specials/SpecialImport.php, a successCount of 0, that is, any failure at all, is logged as 'import-nonewrevisions'!)

The same problem also affects importDump.php, which is even worse, as it just cheerily reports "Done!" even though the import failed.
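
To illustrate point 3, here is a minimal sketch in Python (not MediaWiki's PHP) of reporting that keeps "the parser failed" separate from "nothing new to import". The names ImportResult and summarize are hypothetical and exist only for this example; the exit-code convention is an assumption about how a command-line importer could signal failure.

```python
from dataclasses import dataclass, field


@dataclass
class ImportResult:
    """Hypothetical summary of one import run (not a MediaWiki class)."""
    imported: int = 0
    errors: list = field(default_factory=list)


def summarize(result):
    """Return (message, exit_code) without conflating failure and no-op."""
    if result.errors:
        # Fail fast: surface the first parser error instead of a generic message.
        return (f"Import failed: {result.errors[0]}", 1)
    if result.imported == 0:
        # Only a clean run with zero new revisions earns this message.
        return ("All revisions were previously imported", 0)
    return (f"Imported {result.imported} revision(s)", 0)
```

With this split, a run that hit "Invalid tag <realname> in <contributor>" would report that error and a non-zero exit code, rather than "All revisions were previously imported" or "Done!".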
Comment 1 Brion Vibber 2008-12-03 20:01:07 UTC
> 1) XML specs mandate that the parser should ignore unknown tags, not "FAIL" on them.

Totally untrue. :) XML doesn't specify any such thing; that's up to the domain-specific markup language. (XML is a meta-language, not a language per se.)

However... it really ought to just ignore unknown tags, since that's the nice thing to do.

The failure mode sounds wrong too, and should get fixed if it's still doing that.
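
As a rough illustration of "just ignore unknown tags", the following Python sketch (not the actual WikiImporter code) reads a trimmed-down version of the reported structure and skips any <contributor> child it does not recognize, then carries on to <comment>. The set of known tags, the sample comment text, and the function name read_revisions are assumptions made for the example; real exports also carry an XML namespace, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Contributor children this sketch understands (an assumption for
# illustration, not a statement of the real export schema).
KNOWN_CONTRIBUTOR_TAGS = {"username", "id", "ip"}

# Trimmed-down sample modeled on the report; the comment text is a placeholder.
SAMPLE = """<mediawiki>
  <page>
    <revision>
      <contributor>
        <realname>David</realname>
      </contributor>
      <comment>placeholder comment</comment>
    </revision>
  </page>
</mediawiki>"""


def read_revisions(xml_text):
    """Return one dict per <revision>, skipping unknown <contributor> children."""
    root = ET.fromstring(xml_text)
    revisions = []
    for rev in root.iter("revision"):
        contributor = {}
        skipped = []
        for child in rev.findall("contributor/*"):
            if child.tag in KNOWN_CONTRIBUTOR_TAGS:
                contributor[child.tag] = child.text
            else:
                # Unknown tag: note it and move on instead of failing.
                skipped.append(child.tag)
        revisions.append({
            "contributor": contributor,
            "comment": rev.findtext("comment"),
            "skipped": skipped,
        })
    return revisions
```

Calling read_revisions(SAMPLE) yields a single revision whose comment is still read and whose skipped list records realname, so the unknown tag neither aborts the import nor knocks the reader out of <revision>.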
Comment 2 Elliott Eggleston 2013-09-16 23:20:06 UTC
I was able to import the attachment successfully using the latest code from git.  Perhaps this has been fixed sometime in the last 5 years?
