Last modified: 2008-10-09 16:30:49 UTC
Steps to reproduce: Load the result into an HTML SCRIPT element, as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
><HTML
><HEAD
><TITLE
>MediaWiki XML encoding switching problem</TITLE
><STYLE TYPE="TEXT/CSS"
><!-- .ERROR { COLOR: RED } --></STYLE
><SCRIPT ID="MWX" TYPE="text/xml" SRC="http://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=xml"
></SCRIPT
><SCRIPT TYPE="text/vbscript"
><!--
OPTION EXPLICIT
SUB WINDOW_ONLOAD
  DIM A3DOC, A1X3DOC, L4ELTS, A4PARS3ERR
  SET A3DOC = WINDOW.DOCUMENT
  SET A1X3DOC = A3DOC.GETELEMENTBYID("MWX")
  SET L4ELTS = A3DOC.FORMS.NAMEDITEM("MAIN").ELEMENTS
  L4ELTS.NAMEDITEM("FURL").SETATTRIBUTE "value", A1X3DOC.SRC
  SET A1X3DOC = A1X3DOC.XMLDOCUMENT
  L4ELTS.NAMEDITEM("FXML").SETATTRIBUTE "value", A1X3DOC.XML
  SET A4PARS3ERR = A1X3DOC.PARSEERROR
  L4ELTS.NAMEDITEM("FWHY").SETATTRIBUTE "value", A4PARS3ERR.REASON
  L4ELTS.NAMEDITEM("FWHERE").SETATTRIBUTE "value", A4PARS3ERR.SRCTEXT
  IF A4PARS3ERR.ERRORCODE <> 0 THEN WINDOW.LOCATION.HREF = "#FWHY"
END SUB
REM --></SCRIPT
></HEAD
><BODY
><FORM ID="MAIN" ACTION="#MAIN"
><FIELDSET CLASS="RESULT"
><LEGEND
>XML loaded</LEGEND
><P
>The document loaded from the <LABEL
>URL <INPUT TYPE=TEXT ID=FURL READONLY
> contains the following code: <TEXTAREA ID=FXML COLS=80 ROWS=25 READONLY
></TEXTAREA
></FIELDSET
><FIELDSET CLASS="ERROR"
><LEGEND
>XML not loaded</LEGEND
><P
>REASON: <BR
><TEXTAREA ID=FWHY COLS=80 READONLY
></TEXTAREA
><P
>SOURCE: <BR
><TEXTAREA ID=FWHERE COLS=80 READONLY
></TEXTAREA
></FIELDSET
></FORM
></BODY
></HTML
>

Expected results: The returned XML should be displayed in a TEXTAREA.

Actual results: Error "Switch from current encoding to specified encoding not supported." at the XML declaration <?xml version="1.0" encoding="utf-8"?>.

Affected systems: Microsoft HTML engine (MSHTML).

Diagnosis: The error is explained at <http://msdn.microsoft.com/en-us/library/aa468560.aspx#xmlencod_topic3>.
When the XML processor does not load the XML text itself but relies on an external mechanism to fetch it (MSHTML in this case), the downloading agent is allowed to re-encode the text but is not obliged to convert or strip the encoding declaration. As a result, the text presented to the XML engine is in a different encoding than the one declared, and the parser fails.

Background: An encoding declaration is necessary only for documents whose encoding cannot be determined otherwise. Documents transported via HTTP can state their encoding in the charset parameter of the Content-Type header. Since the default encoding of XML is UTF-8, declaring this encoding explicitly either has no effect or, as here, causes parsing errors. There is no advantage whatsoever.

Recommendation: Remove the encoding declaration.

Workarounds:
1. Use the XML extension element instead.
2. Use MSXML.DOMDocument directly from script.
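A rough analogue of this failure can be reproduced outside MSHTML. The Python sketch below is a stand-in, not MSXML's actual behavior: it uses the standard library's expat parser as the XML engine and simulates the downloading agent by re-encoding already-decoded text as UTF-16 (an assumed choice) while leaving the declaration untouched. The helper name parse_recoded is hypothetical, and expat reports its own error, XML_ERROR_INCORRECT_ENCODING, rather than the MSXML message.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def parse_recoded(xml_text, transport_encoding="utf-16"):
    """Hypothetical helper: mimic a downloader that re-encodes decoded XML
    text without touching its declaration, then parse the resulting bytes.

    Returns None on success, or expat's error message on failure.
    """
    # The "utf-16" codec prepends a byte order mark, so expat detects UTF-16.
    data = xml_text.encode(transport_encoding)
    # No encoding argument: expat gets no transport (protocol) encoding and
    # must reconcile what it detected with the XML declaration on its own.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        return errors.messages[exc.code]
    return None


# Sample documents; only the XML declaration differs.
with_decl = '<?xml version="1.0" encoding="utf-8"?><api><query/></api>'
without_decl = '<?xml version="1.0"?><api><query/></api>'

print(parse_recoded(with_decl))     # UTF-16 bytes declared as UTF-8: rejected
print(parse_recoded(without_decl))  # None: nothing contradicts the detected encoding
```

Dropping the declaration, as recommended, leaves nothing for the detected encoding to contradict, which is why the same text then parses cleanly in this sketch.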
Oops, I got the LABEL elements wrong. Here is a correction:

><BODY
><FORM ID="MAIN" ACTION="#MAIN"
><FIELDSET CLASS="RESULT"
><LEGEND
>XML loaded</LEGEND
><P
>The document loaded from the <LABEL
>URL <INPUT TYPE=TEXT ID=FURL READONLY
></LABEL
> contains <LABEL
>the following code: <TEXTAREA ID=FXML COLS=80 ROWS=25 READONLY
></TEXTAREA
></LABEL
></FIELDSET
><FIELDSET CLASS="ERROR"
><LEGEND
>XML not loaded</LEGEND
><P
><LABEL
>REASON: <BR
><TEXTAREA ID=FWHY COLS=80 READONLY
></TEXTAREA
></LABEL
><P
><LABEL
>SOURCE: <BR
><TEXTAREA ID=FWHERE COLS=80 READONLY
></TEXTAREA
></LABEL
></FIELDSET
></FORM
></BODY
Can someone who actually knows this stuff confirm that changing

<?xml version="1.0" encoding="utf-8"?>

to

<?xml version="1.0" ?>

is OK?
(In reply to comment #2)
> Can someone who actually knows this stuff confirm that changing
>
> <?xml version="1.0" encoding="utf-8"?>
>
> to
>
> <?xml version="1.0" ?>
>
> is OK?

Seems to be the case. According to http://www.w3.org/TR/REC-xml/#charencoding (really, the last couple of paragraphs of that section), the encoding declaration is _only_ required if you're not presenting UTF-8, as UTF-8 is the fallback.
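The fallback rule can be checked with a small Python sketch, again using the standard library's expat parser as an illustrative stand-in rather than anything MediaWiki or MSHTML actually runs. The helper element_text and the sample title are hypothetical. UTF-8 bytes parse without any declaration; ISO-8859-1 bytes parse only when the declaration names that encoding, and without it they are read as UTF-8 and rejected.

```python
import xml.parsers.expat


def element_text(data):
    """Hypothetical helper: return the character data of a small XML byte string.

    expat is created without a protocol (transport) encoding, so it works out
    the encoding from a BOM, the XML declaration, or, failing both, UTF-8.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append  # may be called several times
    parser.Parse(data, True)
    return "".join(chunks)


# Sample document; only the declaration and the byte encoding vary.
body = "<title>Zürich</title>"

utf8_no_decl = ('<?xml version="1.0"?>' + body).encode("utf-8")
latin1_decl = ('<?xml version="1.0" encoding="iso-8859-1"?>' + body).encode("latin-1")
latin1_no_decl = ('<?xml version="1.0"?>' + body).encode("latin-1")

print(element_text(utf8_no_decl))  # UTF-8 is the default: no declaration needed
print(element_text(latin1_decl))   # non-UTF-8 input needs the declaration
try:
    element_text(latin1_no_decl)   # Latin-1 bytes read as UTF-8: not well-formed
except xml.parsers.expat.ExpatError as exc:
    print("rejected:", exc)
```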
Fixed in r40700.
As per section 4.3.1 of the XML specification (http://www.w3.org/TR/REC-xml/#sec-TextDecl), "External parsed entities SHOULD each begin with a text declaration." MediaWiki should follow W3C recommendations instead of bending over for Microsoft bugs.