Last modified: 2012-04-16 09:15:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T30164, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 28164 - id attributes for Unicode code points start with "." and break validation
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: unspecified
Hardware: All All
Importance: Normal blocker
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://kn.wikipedia.org/w/index.php?t...
Depends on:
Blocks:
Reported: 2011-03-21 20:00 UTC by M G Harish
Modified: 2012-04-16 09:15 UTC

Description M G Harish 2011-03-21 20:00:38 UTC
Consider the markup:

{|
| <h1>ಅ</h1>
|-
| ಆ
|-
|}

This is converted to HTML as:


<table>
<tbody><tr>
<td>
<h1><span id=".E0.B2.85" class="mw-headline">ಅ</span></h1>
</td>
</tr>
<tr>
<td>ಆ</td>
</tr>
</tbody></table>

Look at the <span> tag, which is not present in the actual wiki markup. The problem is that MediaWiki takes the UTF-8 bytes of the text ಅ (Kannada letter A, U+0C85) and turns them into the value of the "id" attribute, writing each byte as "." followed by two hex digits. The resulting id starts with "." and breaks XHTML validation of the page. Why does MediaWiki introduce this extra <span> tag? This is observed only for the heading tags <h1> to <h6>, and not for any other tag.
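
To make the encoding concrete, here is a minimal Python sketch of how an id like the one above can arise. It is not MediaWiki's actual code; the function name `legacy_heading_id` is hypothetical, and the sketch assumes the id is simply the percent-encoded UTF-8 text with each "%" replaced by ".".

```python
from urllib.parse import quote


def legacy_heading_id(text: str) -> str:
    """Hypothetical helper, not MediaWiki's implementation.

    Percent-encodes the UTF-8 bytes of the heading text, then swaps
    every '%' for '.', which yields dot-prefixed hex bytes.
    """
    return quote(text, safe="").replace("%", ".")


# Kannada letter A (U+0C85) is the three UTF-8 bytes E0 B2 85.
print("ಅ".encode("utf-8").hex().upper())  # E0B285
print(legacy_heading_id("ಅ"))             # .E0.B2.85
```

A plain ASCII heading such as "x" passes through unchanged, which is why the problem only shows up for headings that begin with a non-ASCII character.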

Thanks & Regards,
Harish
Comment 1 Mark A. Hershberger 2011-03-22 15:13:52 UTC
The extra span tag isn't what is messing up the validation; rather, it is the contents of the id attribute.

The span is there because "<h1>x</h1>" is identical to "= x ="
Comment 2 M G Harish 2011-03-22 16:24:20 UTC
Yeah, right. Why is that id needed?
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-03-23 01:03:49 UTC
Because the heading will be added to the table of contents, if there is one, and the table of contents will then link to it.  In theory we could avoid emitting the id if there's no TOC, but I don't see the gain.  It just reduces consistency.
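
As an illustration of why the heading and the table of contents have to agree on the id, here is a small Python sketch. The function names `render_heading` and `render_toc_entry` are hypothetical and the markup is simplified; it only shows that the TOC link's fragment must match the span's id.

```python
from html import escape


def render_heading(text: str, anchor: str) -> str:
    # Hypothetical helper: the heading carries the anchor as the span's id.
    return (
        f'<h1><span id="{escape(anchor)}" class="mw-headline">'
        f"{escape(text)}</span></h1>"
    )


def render_toc_entry(text: str, anchor: str) -> str:
    # Hypothetical helper: the TOC entry links to the same anchor.
    return f'<li><a href="#{escape(anchor)}">{escape(text)}</a></li>'


anchor = ".E0.B2.85"
print(render_heading("ಅ", anchor))
print(render_toc_entry("ಅ", anchor))
```

Emitting the id only when a TOC is present would make the heading markup depend on page-level state, which is the loss of consistency mentioned above.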

I'm resolving WONTFIX because

1) This markup is valid in HTML5.  $wgHtml5 = false is still supported for now, but it won't be supported forever and it's not the default, so the motivation for fixing it is limited.

2) We don't want to break existing id's without good reason, and there's no good reason to break them in non-HTML5 mode if we're going to eventually remove support for it anyway and re-break them.
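
The difference between the two modes can be sketched as follows. This is a rough Python check, not a validator: it assumes the HTML 4 rule that an id must begin with a letter and continue with letters, digits, "-", "_", ":" or ".", and the HTML5 rule that an id must be non-empty and contain no space characters. The function names are hypothetical.

```python
import re

# HTML 4 ID tokens: a letter, then letters, digits, '-', '_', ':' or '.'.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")

# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
HTML5_SPACE = set(" \t\n\f\r")


def valid_html4_id(value: str) -> bool:
    return HTML4_ID.fullmatch(value) is not None


def valid_html5_id(value: str) -> bool:
    return len(value) > 0 and not any(ch in HTML5_SPACE for ch in value)


print(valid_html4_id(".E0.B2.85"))  # False: starts with '.'
print(valid_html5_id(".E0.B2.85"))  # True
```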
