Last modified: 2014-02-12 23:35:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T46609, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 44609 - Stop adding xml:lang attributes to HTML5 pages
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: 1.21.x
Hardware: All All
Importance: Low trivial with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: html
Reported: 2013-02-03 00:46 UTC by Michael Zajac
Modified: 2014-02-12 23:35 UTC
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Michael Zajac 2013-02-03 00:46:48 UTC
Wikipedia and Wiktionary pages now have the HTML5 doctype &lt;!doctype html>, and a root &lt;html> element with only a lang tag. HTML5 doesn’t require the xml:lang attribute. According to the spec, “The attribute in no namespace with no prefix and with the literal local name "<code>xml:lang</code>" has no effect on language processing.”[http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-lang-and-xml:lang-attributes]

But if you enter, e.g., &lt;span lang="fr">fou&lt;/span> into a wiki page, the Wikitext parser will add a redundant and vestigial xml:lang attribute.

The parser should stop adding the xml:lang attribute in pages that are HTML5 and not XML.
Comment 1 Michael Zajac 2013-02-03 00:53:11 UTC
[I wish I could edit my bug, or at least preview. Have you guys heard of this “wiki” thing? Here’s a better-formatted version of my bug report.]

Wikipedia and Wiktionary pages now have the HTML5 doctype <!doctype html>, and a root <html> tag with only a lang attribute. HTML5 doesn’t require the xml:lang attribute. According to the spec, “The attribute in no namespace with no prefix and with the literal local name "xml:lang" has no effect on language processing.”

Source: http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-lang-and-xml:lang-attributes

But if you enter, e.g., <span lang="fr">fou</span> into a wiki page, the Wikitext parser will add a redundant and vestigial xml:lang attribute.

The parser should stop adding the xml:lang attribute in pages that are HTML5 and not XML.
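
To illustrate the requested behaviour, here is a minimal Python sketch that strips xml:lang attributes from an HTML fragment. It is not how MediaWiki's parser or HTML Tidy works. The helper name strip_xml_lang is hypothetical, the sample markup assumes the redundant attribute is emitted as xml:lang="fr" next to lang="fr", and the regular expression is deliberately naive.

```python
import re

# Matches whitespace followed by an xml:lang attribute whose value is
# double-quoted, single-quoted, or unquoted. Naive: it does not verify that
# the match sits inside a tag, so it is suitable for illustration only.
_XML_LANG = re.compile(
    r"""\s+xml:lang\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_xml_lang(html: str) -> str:
    """Return html with every xml:lang attribute removed (hypothetical helper)."""
    return _XML_LANG.sub("", html)


# Assumed shape of the current parser output for <span lang="fr">fou</span>.
before = '<span lang="fr" xml:lang="fr">fou</span>'
after = strip_xml_lang(before)
print(after)
```

The plain lang attribute is left untouched, which is the only one HTML5 language processing consults.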
Comment 2 Dennis C. During 2013-02-03 01:21:50 UTC
I agree. Let's keep things nice and tidy.
Comment 3 Kevin Israel (PleaseStand) 2013-02-03 01:53:01 UTC
This is caused by the "output-xhtml" option in includes/tidy.conf. Unfortunately, disabling it seems to break things such as the conversion from <hr> to <hr />, so many pages would no longer be well-formed XML as configured by $wgWellFormedXml.
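
The well-formedness concern can be illustrated with a short Python sketch. It uses the standard library's XML parser, not MediaWiki or HTML Tidy, and the helper name is_well_formed_xml is hypothetical. An unclosed <hr> is rejected by an XML parser, while <hr /> is accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if fragment parses as a well-formed XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Self-closed void element: acceptable to an XML parser.
print(is_well_formed_xml("<div><hr /></div>"))
# HTML-style void element: the <hr> is never closed, so XML parsing fails.
print(is_well_formed_xml("<div><hr></div>"))
```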

Note that adding the extra attribute is legal according to HTML5 section 3.2.3.3:

"Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner."
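
The rule quoted above can be sketched as a small checker. This is only an illustration in Python using the standard library's html.parser. The names XmlLangChecker and check_xml_lang are hypothetical and are not part of MediaWiki or of any validator.

```python
import string
from html.parser import HTMLParser

# Translation table that lowercases A-Z only, for an ASCII case-insensitive
# comparison as described in the quoted spec text.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


class XmlLangChecker(HTMLParser):
    """Collects elements whose xml:lang attribute breaks the quoted rule."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; valueless attributes are None.
        attr_map = dict(attrs)
        if "xml:lang" not in attr_map:
            return
        if "lang" not in attr_map:
            self.problems.append(f"<{tag}>: xml:lang without lang")
            return
        lang = (attr_map["lang"] or "").translate(_ASCII_LOWER)
        xml_lang = (attr_map["xml:lang"] or "").translate(_ASCII_LOWER)
        if lang != xml_lang:
            self.problems.append(f"<{tag}>: lang and xml:lang differ")


def check_xml_lang(html: str) -> list:
    """Return a list of rule violations found in html (hypothetical helper)."""
    checker = XmlLangChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


print(check_xml_lang('<span lang="fr" xml:lang="FR">fou</span>'))
print(check_xml_lang('<span xml:lang="fr">fou</span>'))
```

Under this reading, output that carries the same value in both lang and xml:lang passes the check, which matches the point that the extra attribute is permitted though redundant.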
Comment 4 Ran Ari-Gur (User:Ruakh on WMF projects) 2013-02-03 20:14:07 UTC
(In reply to comment #3)
> Unfortunately, [...] many pages would no longer be well-formed XML [...]

Why is that unfortunate? The pages are HTML, not XHTML (we're serving them as text/html, not as e.g. application/xhtml+xml), so there's no reason they *should* be well-formed XML. (See HTML5 section 1.6, or section 8.) The spec says that in the HTML syntax, the use of '/' on void elements (br, hr, img, etc.) is optional and has no effect. (See HTML5 section 8.1.2.1, clause 6.)

(That's as far as the standard is concerned. Obviously we also care about browser support, but personally I find it impossible to believe that any real-world browser would stumble over '<hr>' in an HTML document.)
Comment 5 Andre Klapper 2013-02-05 11:03:15 UTC
(In reply to comment #1)
> [I wish I could edit my bug, or at least preview. Have you guys heard of this
> “wiki” thing? Here’s a better-formatted version of my bug report.]

Offtopic: The "you guys" that you want to talk with can be reached here:
https://bugzilla.mozilla.org/show_bug.cgi?id=40896


