Last modified: 2008-10-09 16:30:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T17497, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 15497 - Spurious XML encoding declaration
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: API
Version: unspecified
Hardware: PC Windows XP
Importance: Normal normal
Target Milestone: ---
Assigned To: Roan Kattouw
http://en.wikipedia.org/w/api.php?act...
Reported: 2008-09-06 08:45 UTC by Christopher Yeleighton
Modified: 2008-10-09 16:30 UTC (History)

Description Christopher Yeleighton 2008-09-06 08:45:02 UTC
Steps to reproduce: 
Load the result into an HTML SCRIPT element, as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
><HTML 
><HEAD 
><TITLE >MediaWiki XML encoding switching problem</TITLE 
><STYLE TYPE="TEXT/CSS" 
><!-- .ERROR { COLOR: RED } --></STYLE 
><SCRIPT ID="MWX"
TYPE="text/xml" 
SRC="http://en.wikipedia.org/w/api.php?action=query&amp;titles=Albert%20Einstein&amp;prop=info&amp;format=xml"
></SCRIPT ><SCRIPT TYPE="text/vbscript" ><!--
OPTION EXPLICIT

SUB WINDOW_ONLOAD
DIM A3DOC, A1X3DOC, L4ELTS, A4PARS3ERR
SET A3DOC = WINDOW. DOCUMENT
SET A1X3DOC = A3DOC. GETELEMENTBYID("MWX")
SET L4ELTS = A3DOC. FORMS. NAMEDITEM("MAIN"). ELEMENTS
L4ELTS. NAMEDITEM("FURL"). SETATTRIBUTE "value", A1X3DOC. SRC
SET A1X3DOC = A1X3DOC. XMLDOCUMENT
L4ELTS. NAMEDITEM("FXML"). SETATTRIBUTE "value", A1X3DOC. XML
SET A4PARS3ERR = A1X3DOC. PARSEERROR
L4ELTS. NAMEDITEM("FWHY"). SETATTRIBUTE "value", A4PARS3ERR. REASON
L4ELTS. NAMEDITEM("FWHERE"). SETATTRIBUTE "value", A4PARS3ERR. SRCTEXT
IF A4PARS3ERR THEN WINDOW. LOCATION. HREF = "#FWHY"
END SUB

REM --></SCRIPT ></HEAD 
><BODY 
><FORM ID="MAIN" ACTION="#MAIN" 
><FIELDSET CLASS="RESULT" 
><LEGEND >XML loaded</LEGEND 
><P 
>The document 
loaded from the <LABEL >URL <INPUT TYPE=TEXT ID=FURL READONLY > 
contains the following code: 
<TEXTAREA ID=FXML COLS=80 ROWS=25 READONLY ></TEXTAREA ></FIELDSET 
><FIELDSET CLASS="ERROR" ><LEGEND >XML not loaded</LEGEND 
><P >REASON: <BR ><TEXTAREA ID=FWHY COLS=80 READONLY ></TEXTAREA 
><P >SOURCE: <BR ><TEXTAREA ID=FWHERE COLS=80 READONLY ></TEXTAREA 
></FORM ></BODY 
></HTML >

Expected results:
XML returned should display in a TEXTAREA.

Actual results:
Error: 
"Switch from current encoding to specified encoding not supported." 
at the XML declaration "<?xml version="1.0" encoding="utf-8"?>".

Affected systems:
Microsoft HTML engine. 

Diagnosis:
The error is explained at <http://msdn.microsoft.com/en-us/library/aa468560.aspx#xmlencod_topic3>.  When the XML processor does not load the XML text itself but relies on an external mechanism to fetch it (MSHTML in this case), the downloading agent is allowed to recode the text but is not obliged to convert or strip the encoding declaration.  As a result, the text presented to the XML engine has a different encoding from the one declared, causing the parser to fail.
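
The mismatch described in the diagnosis can be reproduced outside MSHTML. The following is a minimal sketch in Python, not the reporter's setup: it assumes the intermediary recodes the text to UTF-16 (the report does not say which encoding MSHTML hands to the parser), and the payloads and helper names are hypothetical stand-ins for an API response.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads standing in for an API response.
DECLARED = '<?xml version="1.0" encoding="utf-8"?><api><page title="Albert Einstein"/></api>'
UNDECLARED = '<?xml version="1.0"?><api><page title="Albert Einstein"/></api>'


def parse_after_recoding(xml_text: str) -> ET.Element:
    """Recode the text to UTF-16 (assumed intermediary behaviour) and parse the bytes.

    The declaration inside the text is left untouched, just as a downloading
    agent is not obliged to rewrite it.
    """
    recoded = xml_text.encode("utf-16")  # byte order mark followed by UTF-16 code units
    return ET.fromstring(recoded)


def try_parse(xml_text: str) -> str:
    """Return the root tag on success, or a description of the parse error."""
    try:
        return parse_after_recoding(xml_text).tag
    except ET.ParseError as exc:
        return f"parse error: {exc}"


if __name__ == "__main__":
    print(try_parse(UNDECLARED))  # the parser detects UTF-16 from the byte order mark
    print(try_parse(DECLARED))    # the declared utf-8 contradicts the actual bytes
```

The exact error text differs between parsers; the point is only that a stale declaration turns an otherwise readable document into a rejected one.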

Background:
The encoding declaration is necessary only for documents that cannot be described otherwise.  Documents transported via HTTP have an encoding declaration in the HTTP headers.
Since the default encoding of XML is UTF-8, declaring this encoding has no effect or causes parsing errors.  There is no advantage whatsoever.
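
As a rough illustration of the point that the declaration is redundant for UTF-8, the sketch below parses the same UTF-8 bytes with and without the encoding declaration and compares the result. It uses Python's standard parser rather than MSXML, and the payload and the helper `page_title` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical body with a non-ASCII character, so the encoding actually matters.
BODY = '<api><page title="Kurt Gödel"/></api>'

# The same document serialized as UTF-8 bytes, with and without the declaration.
with_decl = ('<?xml version="1.0" encoding="utf-8"?>' + BODY).encode("utf-8")
without_decl = ('<?xml version="1.0"?>' + BODY).encode("utf-8")


def page_title(raw: bytes) -> str:
    """Parse raw XML bytes and return the title attribute of the first page element."""
    return ET.fromstring(raw).find("page").get("title")


if __name__ == "__main__":
    # Both calls decode the bytes as UTF-8; the declaration changes nothing.
    print(page_title(with_decl) == page_title(without_decl))
```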

Recommendation:
Remove the encoding declaration.

Workarounds:
1. Use the XML extension element instead.
2. Use MSXML.DOMDocument directly from script.
Comment 1 Christopher Yeleighton 2008-09-06 08:51:40 UTC
Oops, I got the LABEL wrong.  Here is a correction:

><BODY 
><FORM ID="MAIN" ACTION="#MAIN" 
><FIELDSET CLASS="RESULT" 
><LEGEND >XML loaded</LEGEND 
><P 
>The document 
loaded from the <LABEL >URL <INPUT TYPE=TEXT ID=FURL READONLY ></LABEL > 
contains <LABEL >the following code: 
<TEXTAREA ID=FXML COLS=80 ROWS=25 READONLY ></TEXTAREA ></LABEL ></FIELDSET 
><FIELDSET CLASS="ERROR" ><LEGEND >XML not loaded</LEGEND 
><P 
><LABEL >REASON: <BR ><TEXTAREA ID=FWHY COLS=80 READONLY ></TEXTAREA ></LABEL
><P 
><LABEL >SOURCE: <BR ><TEXTAREA ID=FWHERE COLS=80 READONLY ></TEXTAREA ></LABEL
></FORM ></BODY 
Comment 2 Roan Kattouw 2008-09-06 11:40:15 UTC
Can someone who actually knows this stuff confirm that changing

<?xml version="1.0" encoding="utf-8"?>

to

<?xml version="1.0" ?>

is OK?
Comment 3 Chad H. 2008-09-06 13:56:56 UTC
(In reply to comment #2)
> Can someone who actually knows this stuff confirm that changing
> 
> <?xml version="1.0" encoding="utf-8"?>
> 
> to
> 
> <?xml version="1.0" ?>
> 
> is OK?
> 

Seems to be the case. According to http://www.w3.org/TR/REC-xml/#charencoding (the last couple of paragraphs of that section, really), the encoding declaration is _only_ required if you're not presenting utf-8, as utf-8 is the fallback.
Comment 4 Roan Kattouw 2008-09-10 13:39:43 UTC
Fixed in r40700.
Comment 5 Jani Patokallio 2008-10-09 16:30:49 UTC
As per http://www.w3.org/TR/REC-xml/#sec-TextDecl 4.3.1, "External parsed entities SHOULD each begin with a text declaration."  MediaWiki should follow W3C recommendations instead of bending over for Microsoft bugs.
