Last modified: 2011-04-14 15:12:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T17914, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15914 - Special:Export output missing XML encoding
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: 1.13.x
Hardware: All
OS: All
Importance: Low minor
Target Milestone: ---
Assigned To: Nobody
Depends on:
Blocks:
Reported: 2008-10-09 09:50 UTC by Jani Patokallio
Modified: 2011-04-14 15:12 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Jani Patokallio 2008-10-09 09:50:34 UTC
XML files produced by Special:Export are UTF-8, but this encoding is not indicated in the file.  This violates the XML spec, and causes many XML manipulation programs like xmlstarlet to trash any special characters inside.

Trivially fixed by prepending this to all exported files:

<?xml version="1.0" encoding="UTF-8" ?>
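A minimal client-side sketch of the fix proposed above, assuming you post-process an export after downloading it rather than patching MediaWiki itself. `ensure_xml_declaration` and `XML_DECL` are hypothetical names, not MediaWiki code, and the function assumes its input is otherwise well-formed UTF-8 XML.

```python
import re

# Hypothetical post-processing helper; not part of MediaWiki.
XML_DECL = b'<?xml version="1.0" encoding="UTF-8" ?>\n'

# An XML declaration is "<?xml" followed by whitespace; this excludes
# processing instructions such as "<?xml-stylesheet ...?>".
_DECL_RE = re.compile(rb"<\?xml[ \t\r\n]")


def ensure_xml_declaration(data: bytes) -> bytes:
    """Prepend an explicit UTF-8 XML declaration unless one is already present."""
    # Drop a UTF-8 byte order mark so the declaration can be the first thing in the file.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    # The declaration is only allowed at the very start of the document.
    if _DECL_RE.match(data):
        return data
    return XML_DECL + data
```

For example, `path.write_bytes(ensure_xml_declaration(path.read_bytes()))` with a `pathlib.Path` rewrites a saved export in place. The function is idempotent, so running it on a file that already carries a declaration leaves the content unchanged.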
Comment 1 Chad H. 2008-10-09 15:27:10 UTC
Cf. bug 15497.

We removed it from the API's output because the W3C seems to say that the encoding doesn't need to be declared for UTF-8 output (as UTF-8 is the default); it only needs to be declared when outputting non-UTF-8 data.

The link in question: http://www.w3.org/TR/REC-xml/#charencoding
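The following sketch illustrates the rule cited above, using Python's standard-library `xml.etree.ElementTree` (backed by expat) as one example of a conforming parser; the sample strings and the `parse_title` helper are illustrative only. UTF-8 bytes decode identically with or without a declaration, whereas Latin-1 bytes parse correctly only once their encoding is declared.

```python
import xml.etree.ElementTree as ET

TEXT = "Zürich Ελληνικά"  # illustrative non-ASCII sample


def parse_title(raw: bytes) -> str:
    """Parse a tiny <page><title>...</title></page> document and return the title."""
    return ET.fromstring(raw).find("title").text


body = f"<page><title>{TEXT}</title></page>"

# UTF-8 with and without an explicit declaration decode identically,
# because UTF-8 is what a conforming parser assumes when nothing is declared.
undeclared = body.encode("utf-8")
declared = b'<?xml version="1.0" encoding="UTF-8"?>' + undeclared
assert parse_title(undeclared) == parse_title(declared) == TEXT

# A non-UTF-8 encoding is where the declaration actually matters.
latin = "<page><title>Zürich</title></page>".encode("iso-8859-1")
try:
    parse_title(latin)  # byte 0xFC is not valid UTF-8, so this should fail
    latin_without_decl_ok = True
except ET.ParseError:
    latin_without_decl_ok = False

latin_declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin
print(latin_without_decl_ok, parse_title(latin_declared))  # prints: False Zürich
```

A tool that mangles undeclared UTF-8, as reported for xmlstarlet above, is departing from this default rather than following it.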
Comment 2 Jani Patokallio 2008-10-09 16:22:51 UTC
It may not be mandatory, but according to 4.3.1, "External parsed entities SHOULD each begin with a text declaration."  Why would you want to remove it?

Comment 3 Roan Kattouw 2008-10-09 16:29:34 UTC
(In reply to comment #2)
> It may not be mandatory, but according to 4.3.1, "External parsed entities
> SHOULD each begin with a text declaration."  Why would you want to remove it?
> 

Someone complained about issues caused by the encoding declaration and pointed out that it could safely be removed, quoting the W3C link in comment #1. I didn't see any harm in that, so I removed it.
Comment 4 Brion Vibber 2009-07-20 02:50:29 UTC
What are the issues that warranted removal?
Comment 5 Roan Kattouw 2009-07-25 15:23:11 UTC
(In reply to comment #4)
> What are the issues that warranted removal?
> 

I have no idea; this was ages ago. I searched the mailing list archives but found nothing (in topic titles, that is; I can't conveniently search the message contents themselves).
Comment 6 Niklas Laxström 2009-07-25 16:29:59 UTC
(In reply to comment #0)
> causes many XML
> manipulation programs like xmlstarlet to trash any special characters inside.

To me it looks like many XML manipulation programs are broken and should be fixed, regardless of whether we output something that merely states the obvious (UTF-8 is the default encoding if nothing else is specified).
Comment 7 Roan Kattouw 2009-07-25 18:07:14 UTC
(In reply to comment #6)
> (In reply to comment #0)
> > causes many XML
> > manipulation programs like xmlstarlet to trash any special characters inside.
> 
> To me it looks like many XML manipulation programs are broken and should be
> fixed, regardless of whether we output something that merely states the obvious
> (UTF-8 is the default encoding if nothing else is specified).
> 

It seems that some XML parsers mangle content when UTF-8 is *not* specified, while others mangle it when it *is* specified (which was probably why I was asked to remove it in the first place). If that is the case, there is not really anything we can do to satisfy both compliant and non-compliant parsers, unlike with the xml:space="preserve" issue, where we could.
Comment 8 Jani Patokallio 2009-07-27 07:05:32 UTC
IMHO, a parser that incorrectly handles an explicitly declared encoding is more broken than one that uses an incorrect default for a file with no encoding.  As quoted above, the XML spec says "External parsed entities SHOULD each begin with a text declaration", so declaring the encoding is the correct thing to do.



Comment 9 Niklas Laxström 2009-07-27 07:43:48 UTC
(In reply to comment #8)
> IMHO, a parser that incorrectly handles an explicitly declared encoding is more
> broken than one that uses an incorrect default for a file with no encoding.  As
> quoted above, the XML spec says "External parsed entities SHOULD each begin
> with a text declaration", so declaring the encoding is the correct thing to do.

I fail to see why files produced by Special:Export should be considered external parsed entities.

Comment 10 Vitaliy Filippov 2010-03-18 14:23:14 UTC
See also bug 22881 - Greatly improved Export and Import for 1.14.1 (with support for advanced page selection, exporting and importing file uploads, and detection of "conflicts" during import). It includes a patch I wrote that is related to, or fixes, your issue.


