Last modified: 2014-02-12 23:35:41 UTC
Wikipedia and Wiktionary pages now have the HTML5 doctype <!doctype html>, and a root <html> element with only a lang attribute. HTML5 doesn’t require the xml:lang attribute. According to the spec, “The attribute in no namespace with no prefix and with the literal local name "xml:lang" has no effect on language processing.”[http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-lang-and-xml:lang-attributes] But if you enter, e.g., <span lang="fr">fou</span> into a wiki page, the Wikitext parser will add a redundant and vestigial xml:lang attribute. The parser should stop adding the xml:lang attribute in pages that are HTML5 and not XML.
[I wish I could edit my bug, or at least preview. Have you guys heard of this “wiki” thing. Here’s a better-formatted version of my bug report.] Wikipedia and Wiktionary pages now have the HTML5 doctype <!doctype html>, and a root <html> tag with only a lang attribute. HTML5 doesn’t require the xml:lang attribute. According to the spec, “The attribute in no namespace with no prefix and with the literal local name "xml:lang" has no effect on language processing.” Source: http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-lang-and-xml:lang-attributes But if you enter, e.g., <span lang="fr">fou</span> into a wiki page, the Wikitext parser will add a redundant and vestigial xml:lang attribute. The parser should stop adding the xml:lang attribute in pages that are HTML5 and not XML.
I agree. Let's keep things nice and tidy.
This is caused by the "output-xhtml" option in includes/tidy.conf. Unfortunately, disabling it seems to break things such as the conversion from <hr> to <hr />, so many pages would no longer be well-formed XML as configured by $wgWellFormedXml. Note that adding the extra attribute is legal according to HTML5 section 3.2.3.3: "Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner."
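For anyone who wants to check rendered output against the rule quoted above, here is a minimal sketch in Python. It only covers the part of the rule that applies to attributes in no namespace: xml:lang must be accompanied by a lang attribute with the same value, compared ASCII case-insensitively. The names XmlLangChecker, same_ascii_ci and find_xml_lang_violations are made up for this illustration and are not part of MediaWiki or Tidy. Python's html.parser is also not a conforming HTML5 parser, so treat this as a rough lint rather than a validator.

```python
import string
from html.parser import HTMLParser

# Fold only A-Z to a-z, so the comparison is ASCII case-insensitive
# (str.lower() would also fold non-ASCII letters).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def same_ascii_ci(a, b):
    """Return True if a and b are equal ignoring ASCII case only."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


class XmlLangChecker(HTMLParser):
    """Hypothetical helper: records start tags whose xml:lang breaks the rule."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; a valueless attribute is None.
        attr_map = dict(attrs)
        if "xml:lang" not in attr_map:
            return  # nothing to check
        lang = attr_map.get("lang")
        xml_lang = attr_map["xml:lang"]
        if lang is None or xml_lang is None or not same_ascii_ci(lang, xml_lang):
            self.violations.append((tag, lang, xml_lang))


def find_xml_lang_violations(html):
    """Return a list of (tag, lang, xml_lang) tuples that break the rule."""
    checker = XmlLangChecker()
    checker.feed(html)
    checker.close()
    return checker.violations
```

Under this check, the markup the parser currently emits (lang plus a matching xml:lang) produces no violations, which matches the point that the extra attribute is redundant but legal.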
(In reply to comment #3)
> Unfortunately, [...] many pages would no longer be well-formed XML [...]
Why is that unfortunate? The pages are HTML, not XHTML (we're serving them as text/html, not as e.g. application/xhtml+xml), so there's no reason they *should* be well-formed XML. (See HTML5 section 1.6, or section 8.)
The spec says that in the HTML syntax, the use of '/' on void elements (br, hr, img, etc.) is optional and has no effect. (See HTML5 section 8.1.2.1, clause 6.)
(That's as far as the standard is concerned. Obviously we also care about browser support, but personally I find it impossible to believe that any real-world browser would stumble over '<hr>' in an HTML document.)
(In reply to comment #1)
> [I wish I could edit my bug, or at least preview. Have you guys heard of this
> “wiki” thing. Here’s a better-formatted version of my bug report.]
Offtopic: The "you guys" that you want to talk with can be reached here: https://bugzilla.mozilla.org/show_bug.cgi?id=40896