Last modified: 2014-02-12 23:35:48 UTC
Now that HTML5 has been rolled out, data attributes are coming into our houses, and into Wikimedia projects too (at least, some people want them). There are some urlencoding magic words, but there is no specific magic word for escaping HTML attribute values. Should we have one, and if not, what would you suggest using instead? I'm not very confident about what needs to be escaped so that we don't have to worry about security. If no one is interested in _coding_ this, I can assign it to myself, but I'd like some security guidance (like this: <http://wonko.com/post/html-escaping>).
This is probably the best list: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#XSS_Prevention_Rules_Summary It is a general-purpose list, so in our specific case we could probably get away with a different one, but it should probably be your target.
This particular bug is orthogonal to the question of which attributes to allow. As for escaping, that page says: "HTML Attribute Encoding: Except for alphanumeric characters, escape all characters with the HTML Entity &#xHH; format, including spaces. (HH = Hex Value)". That's more escaping than I would expect, but I have no reason to doubt it, and it shouldn't cause problems.
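For illustration only, the rule quoted above could be sketched roughly as follows. This is not MediaWiki code and not a proposed patch; the function name `escape_html_attribute` is made up for this example, and it is written in Python just to show the logic. It also assumes a stricter reading of the rule than strictly necessary: every character other than an ASCII letter or digit is escaped, including non-ASCII characters.

```python
def escape_html_attribute(value: str) -> str:
    """Escape a string for use inside a quoted HTML attribute value.

    Hypothetical helper: everything except ASCII letters and digits is
    replaced by a hexadecimal character reference of the form &#xHH;.
    """
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            # ASCII alphanumerics are safe and pass through unchanged.
            out.append(ch)
        else:
            # Everything else, including spaces and quotes, is escaped.
            out.append("&#x{:X};".format(ord(ch)))
    return "".join(out)


# Example: a space and a double quote are both escaped.
print(escape_html_attribute('a b'))   # a&#x20;b
print(escape_html_attribute('"'))     # &#x22;
```

The escaped value would then be placed between quotes, e.g. `'<span data-label="' + escape_html_attribute(text) + '">'`. Escaping this aggressively makes the output longer than the minimal set of entities would, but it means the result cannot break out of the attribute regardless of which quote style surrounds it.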