Last modified: 2014-05-06 21:30:08 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T30776, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 28776 - Whitelist global HTML5 semantic attributes and inline meta element
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 4 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2011-05-02 02:48 UTC by Brett Zamir
Modified: 2014-05-06 21:30 UTC (History)


Description Brett Zamir 2011-05-02 02:48:46 UTC
Could we get the global microdata ( http://www.w3.org/TR/html5/microdata.html ) attributes, @itemscope, @itemid, @itemtype, @itemprop, and @itemref whitelisted, as well as allow meta tags ( http://www.w3.org/TR/html5/semantics.html#meta ), now allowable in the body of a document with HTML5, with @name and @content attributes (in addition to the global ones just mentioned available on all elements)? 

The HTML5 spec even specifies a (MediaWiki) wiki for registering official extensions (at http://wiki.whatwg.org/wiki/MetaExtensions ), so extensions could become "standard extensions" in convenient wiki fashion (while allowing @itemtype to indicate namespaced extensions), in cases where the Wikimedia community wished to reuse a particular meta property.

I should point out too that microdata is not only something which those using custom client-side jQuery or XQuery or one-off server-side parsers can take advantage of--it is already implemented by prominent crawlers such as Google: http://www.google.com/webmasters/tools/richsnippets (see also http://googlewebmastercentral.blogspot.com/2010/03/microdata-support-for-rich-snippets.html ).
Comment 1 Brett Zamir 2011-06-05 14:14:55 UTC
It would also be useful to allow <link/> elements in the body for expressing metadata (as with the in-body <meta/> mentioned earlier).
Comment 2 Bawolff (Brian Wolff) 2011-06-12 02:34:24 UTC
For the original issue (microdata), that's already available as an option (set $wgAllowMicrodataAttributes = true; in LocalSettings.php). If I recall, the reason that it's not enabled by default is that there is some concern that once it's enabled, we'll never be able to un-enable it [since disabling it would then break user content], so we want to make sure we really want to whitelist those attributes before actually whitelisting. (Don't quote me on that though, could be wrong on the reason, this is from half-remembered mailing list threads).


As for <link> and <meta> in body - Well first off, whitelisting them totally is probably a bad idea - certain link elements can be dangerous (<link rel="stylesheet" ...> can be used to load js, <meta http-equiv="refresh" ...>, is also evil, etc). In order to get them [or the safe parts] whitelisted, it'd probably help to provide concrete use-cases where the tags would be useful (Just going from experience on other bugs where people wanted things whitelisted, concrete examples go a long way)
Comment 3 Brett Zamir 2011-06-12 05:46:53 UTC
Hi,

@Bawolff: Thanks very much for the info on enabling microdata attributes.

Although HTML5 may technically still be an in-progress specification, Microdata has been endorsed by Google, Microsoft, and Yahoo, through http://schema.org . My feeling is that the time has come for higher-level semantics to be made available, especially for sites like Wikisource which could particularly benefit from allowing richly semantic markup, such as we are now discussing in preparing an HTML5 Microdata serialization for TEI (Text Encoding Initiative).

The Microdata specification (at http://www.w3.org/TR/html5/microdata.html ) demonstrates <link/> being used with @itemprop and @href, and http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#the-link-element explains that a <link/> will not be treated as a link without @rel, but the element can exist without @rel using just @itemprop. So, perhaps disallowing @rel (and probably @type) would then be sufficient to limit this tag to behaving as purely semantic information.

Likewise, for in-body meta tags, one only needs to whitelist @itemprop and @content (and ideally harmless global attributes like @id, @title, and @lang).
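The attribute restrictions suggested above can be sketched as a small filter. This is only an illustration in Python, not MediaWiki's Sanitizer.php; the names GLOBAL_ATTRS, ALLOWED_ATTRS and filter_inline_metadata are hypothetical, and the choices to require @itemprop and to silently strip disallowed attributes (rather than reject the whole element) are assumptions. It does not validate @href values, which a real sanitizer would also need to do.

```python
# Hypothetical whitelist for in-body <meta> and <link>; an assumption for
# illustration, not the list MediaWiki actually uses.
GLOBAL_ATTRS = {"id", "title", "lang"}
ALLOWED_ATTRS = {
    "meta": GLOBAL_ATTRS | {"itemprop", "content"},
    "link": GLOBAL_ATTRS | {"itemprop", "href"},
}


def filter_inline_metadata(tag, attrs):
    """Return the permitted attributes for an in-body <meta> or <link>,
    or None if the element should be rejected outright."""
    allowed = ALLOWED_ATTRS.get(tag.lower())
    if allowed is None:
        return None  # not one of the two elements this sketch handles
    normalized = {name.lower(): value for name, value in attrs.items()}
    if "itemprop" not in normalized:
        return None  # without itemprop the element carries no microdata
    # Drop everything that is not whitelisted, e.g. rel, type, http-equiv.
    return {name: value for name, value in normalized.items() if name in allowed}
```

With this sketch, a <link/> carrying both @itemprop and @rel keeps its @itemprop and @href but loses @rel, while a <meta/> with only @http-equiv is rejected.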

The following is a sample of a proposed serialization approach for TEI (a language used in the academic world for marking up classical literature, and which I think ought to be allowable on the likes of Wikisource). The following is adapted from code within the first example at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-note.html , preserving all of the semantics for round-tripping.

<aside itemprop="note">
  <meta itemprop="place" content="bottom"/>
  <meta itemprop="type" content="gloss"/>
  <link itemprop="resp" href="#MDMH"/>
  <dfn xml:lang="de" lang="de" itemprop="term">Malerisch</dfn>. This word has, in the German, two
  distinct meanings, one objective, a quality residing in the object,
  the other subjective, a mode of apprehension and creation. To avoid
  confusion, they have been distinguished in English as
  <span itemprop="mentioned">picturesque</span> and
  <span itemprop="mentioned">painterly</span> respectively.
</aside>

<div itemprop="respStmt" id="MDMH" style="display:none;">
  <div itemprop="resp">translation from German to English</div>
  <div itemprop="name">Hottinger, Marie Donald Mackie</div>
</div>

Note that <meta/> is here used to reflect attributes from TEI (e.g., to indicate that this note is a gloss), while <link/> is used to reference additional hidden metadata (in this case, information about who is responsible for the note). <link/> might also be used for the likes of TEI's <ptr/> element, which can indicate a relationship to one or more targets from and to anywhere (including optionally using XPointer to indicate the semantic relationship) but which need not be visible. This is one way to allow, for example, what TEI calls stand-off markup: the ability to reference a text (e.g., a famous authoritative work) from another text (e.g., a commentary).
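To illustrate how a consumer might read the hidden <meta/> and <link/> values in the sample above, here is a minimal sketch using Python's standard html.parser module. The class and function names are hypothetical, and this is not a full microdata parser: it ignores @itemscope nesting and the text content of other elements, and only pairs each @itemprop with @content (for <meta/>) or @href (for <link/>).

```python
from html.parser import HTMLParser


class InlineMetadataCollector(HTMLParser):
    """Collect (itemprop, value) pairs from <meta> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta .../>.
        attr_map = dict(attrs)
        prop = attr_map.get("itemprop")
        if prop is None:
            return
        if tag == "meta":
            self.pairs.append((prop, attr_map.get("content")))
        elif tag == "link":
            self.pairs.append((prop, attr_map.get("href")))


def collect_inline_metadata(html):
    """Return the (itemprop, value) pairs found in an HTML fragment."""
    collector = InlineMetadataCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs
```

Fed the <aside> from the sample, this would yield the "place", "type" and "resp" properties in document order, leaving the visible text untouched.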
Comment 4 Bawolff (Brian Wolff) 2011-06-13 19:03:47 UTC
You may be interested in this mailing list post from earlier this month: http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053720.html

I imagine the scope of this bug has shifted to enabling $wgAllowMicrodataAttributes by default.


Anyways, cc'ing Aryeh Gregor on this bug since he knows all about this stuff.
Comment 5 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-06-13 20:06:41 UTC
(In reply to comment #0)
> Could we get the global microdata ( http://www.w3.org/TR/html5/microdata.html )
> attributes, @itemscope, @itemid, @itemtype, @itemprop, and @itemref
> whitelisted

This just requires turning $wgAllowMicrodataAttributes on.  Note that $wgHtml5 must also be on for it to do anything.

> as well as allow meta tags (
> http://www.w3.org/TR/html5/semantics.html#meta ), now allowable in the body of
> a document with HTML5, with @name and @content attributes (in addition to the
> global ones just mentioned available on all elements)? 

<link> and <meta> can only be used in the body if itemprop is specified.  In those cases, we should whitelist them (assuming microdata is enabled) but currently don't.  It should be pretty easy to add support to Sanitizer.php.  I could probably do this if microdata is actually going to be enabled by default, especially if it's going to be enabled on Wikimedia wikis.  As long as it happens soon -- I'll be unavailable starting a few months from now.

> The HTML5 spec even specifies a (Mediawiki) wiki for making official extensions
> (at http://wiki.whatwg.org/wiki/MetaExtensions ), so extensions could become
> "standard extensions" in convenient wiki fashion (while allowing @itemtype to
> indicate namespaced extensions), in cases where the Wikimedia community wished
> to reuse a particular meta property.

That's only for <meta name="">, which is only allowed in the head, so it's not relevant to us.  The spec mandates no central repository for microdata vocabularies.

> I should point out too that microdata is not only something which those using
> custom client-side jQuery or XQuery or one-off server-side parsers can take
> advantage of--it is already implemented by prominent crawlers such as Google:
> http://www.google.com/webmasters/tools/richsnippets (see also
> http://googlewebmastercentral.blogspot.com/2010/03/microdata-support-for-rich-snippets.html
> ).

And Bing and Yahoo!, yes.  I think it should be enabled by default, but I'm not going to do it unless I get the okay from someone in charge, since previously there was disagreement about it.
Comment 6 matanya 2012-08-12 00:22:00 UTC
partly fixed by https://gerrit.wikimedia.org/r/#/c/4251/
Comment 7 Daniel Friesen 2013-06-03 01:58:48 UTC
We have an option to enable all the microdata markup. <meta> and <link> are supported.

Is there anything left missing for this bug? It looks like it's already FIXED.
Comment 8 Brett Zamir 2013-06-04 22:56:31 UTC
If attributes, and meta/link are now supported, it sounds good by me... Thanks!
Comment 9 Bawolff (Brian Wolff) 2013-06-05 13:34:32 UTC
(In reply to comment #7)
> We have an option to enable all the microdata markup. <meta> and <link> are
> supported.
> 
> Is there anything left missing for this bug? It looks like it's already
> FIXED.

Is there any reason not to have them on by default? I imagine such properties could be useful for people making infoboxes or license templates.
