Last modified: 2010-10-30 13:33:08 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T13710, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 11710 - anchors with name= but no id=


Summary:	anchors with name= but no id=

Status:	RESOLVED WORKSFORME

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser
Version:	1.12.x
Hardware:	All All

Importance:	Lowest trivial
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://zh.wikipedia.org/
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-10-19 21:23 UTC by Dan Jacobson
Modified:	2010-10-30 13:33 UTC
CC List:	2 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Dan Jacobson 2007-10-19 21:23:24 UTC

Why do ZH pages have name= links without the additional id=?
EN doesn't have that problem.

$ cat linktest
while read -r page
      do echo "$page"; lynx -source "$page" | grep 'name=' | grep -Ev 'id=|keywords'
done <<EOF
http://zh.wikipedia.org/wiki/Wikipedia_talk:%E8%81%9A%E4%BC%9A/2007%E8%87%BA%E7%81%A3%E7%A7%8B%E8%81%9A
http://en.wikipedia.org/wiki/Main_Page
http://en.wikipedia.org/wiki/User:Jidanni/Sandbox
http://zh.wikipedia.org/
EOF
$ sh linktest
http://zh.wikipedia.org/wiki/Wikipedia_talk:%E8%81%9A%E4%BC%9A/2007%E8%87%BA%E7%81%A3%E7%A7%8B%E8%81%9A
<p><a name=".E6.99.82.E9.96.93.E8.88.87.E5.9C.B0.E9.BB.9E"></a></p>
<p><a name=".E5.A0.B1.E5.90.8D"></a></p>
<p><a name=".E8.A8.8E.E8.AB.96"></a></p>
http://en.wikipedia.org/wiki/Main_Page
http://en.wikipedia.org/wiki/User:Jidanni/Sandbox
http://zh.wikipedia.org/
<th><a name=".E7.89.B9.E8.89.B2.E6.A2.9D.E7.9B.AE"></a>
<th><a name=".E6.96.B0.E9.97.BB.E5.8A.A8.E6.80.81"></a>
<th><a name=".E4.BC.98.E8.89.AF.E6.9D.A1.E7.9B.AE"></a>
<th><a name=".E6.AF.8F.E6.97.A5.E5.9B.BE.E7.89.87"></a>
<th><a name=".E4.BD.A0.E7.9F.A5.E9.81.93.E5.90.97.EF.BC.9F"></a>
<th><a name=".E5.8E.86.E5.8F.B2.E4.B8.8A.E7.9A.84.E4.BB.8A.E5.A4.A9"></a>
<th><a name=".E7.89.B9.E8.89.B2.E5.86.85.E5.AE.B9"></a>
<th><a name=".E5.AD.A3.E8.8A.82.E8.AF.9D.E9.A2.98"></a>
<th><a name=".E5.A7.8A.E5.A6.B9.E8.A8.88.E7.95.AB"></a>

Comment 1 Brion Vibber 2007-12-03 21:50:13 UTC

This is most likely due to zealous validation of id attributes. Those which begin with [A-Za-z] pass through, while those beginning with [.] are dropped. That may or may not be valid. (The "." is introduced as a variant of URL percent-encoding of characters which can't appear literally, modified to pass the strict, though rarely-enforced, rules about the contents of id attributes.)
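To illustrate the encoding described above, here is a minimal shell sketch. The helper name anchor_encode and the exact set of characters it passes through are assumptions made for illustration, not MediaWiki's actual code. It only reproduces the shape of the name= values in the report: bytes that cannot appear literally are percent-encoded, with "." used in place of "%".

```sh
# Sketch only: anchor_encode is a hypothetical helper, not MediaWiki's code.
# Assumed rules: ASCII letters, digits and "-", ".", ":", "_" pass through,
# a space becomes "_", and every other byte is written as "." followed by
# two uppercase hex digits.
anchor_encode() {
    # od prints each input byte as two hex digits; tr puts one byte per line.
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' |
    while read -r hex; do
        [ -n "$hex" ] || continue
        case "$hex" in
            20) printf '_' ;;
            2d|2e|3a|5f|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                # Safe ASCII byte: turn the hex value into an octal escape
                # and print the character itself.
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)  printf '.%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
    echo
}

anchor_encode 'See also'
# The argument below is the UTF-8 byte sequence E8 A8 8E E8 AB 96.
anchor_encode "$(printf '\350\250\216\350\253\226')"
```

The first call prints See_also, which starts with a letter. The second prints .E8.A8.8E.E8.AB.96, the same value as the third anchor listed for the ZH talk page, and it starts with ".".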

Per compatibility guidelines in XHTML 1.0 spec:

"Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information."
[http://www.w3.org/TR/xhtml1/#guidelines]

This strict limitation is actually in the HTML 4.01 spec, as far as I can see, and would apply to both NAME and ID attributes...
[http://www.w3.org/TR/html4/types.html#h-6.2]
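The backward-compatible pattern quoted from the XHTML guidelines can be tried directly against sample values. This is a rough sketch: the function name is_html4_name and the sample values are made up for illustration, and the validation that MediaWiki actually performs may differ.

```sh
# Sketch only: is_html4_name is a hypothetical helper name.
# It tests a value against the pattern [A-Za-z][A-Za-z0-9:_.-]* quoted above.
is_html4_name() {
    # LC_ALL=C keeps the bracket ranges limited to ASCII characters.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z][A-Za-z0-9:_.-]*$'
}

for id in 'See_also' '.E8.A8.8E.E8.AB.96' '2007_meetup'; do
    if is_html4_name "$id"; then
        echo "valid:   $id"
    else
        echo "invalid: $id"
    fi
done
```

Only See_also passes. The dot-encoded value and the one starting with a digit both fail on their first character, which is consistent with the observation that ids beginning with "." are dropped.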

XHTML 1.0, as XML 1.0, allows a larger set of possibilities:
[5]  Name	   ::=   	(Letter | '_' | ':') (NameChar)*
[http://www.w3.org/TR/REC-xml/#sec-common-syn]

But both appear to technically disallow the initial "."...

Probably the fragment id normalization needs to produce something a bit different for those which don't have an initial ASCII letter... alternatively we could do some compatibility testing and see about using the wider Unicode-friendly XML selection... which still would have issues with digits and punctuation as the first character.
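One possible shape for such a normalization, purely as a sketch: prepend a fixed ASCII letter whenever the encoded id does not already start with one. The helper name ensure_letter_start and the choice of "x" as prefix are assumptions for illustration only. Nothing in this report says which approach, if any, was adopted.

```sh
# Sketch only: ensure_letter_start and the "x" prefix are hypothetical.
# Ids that already begin with an ASCII letter are returned unchanged;
# anything else gets the prefix so that it starts with a letter.
ensure_letter_start() {
    if printf '%s\n' "$1" | LC_ALL=C grep -q '^[A-Za-z]'; then
        printf '%s\n' "$1"
    else
        printf 'x%s\n' "$1"
    fi
}

ensure_letter_start 'See_also'
ensure_letter_start '.E8.A8.8E.E8.AB.96'
```

This prints See_also and x.E8.A8.8E.E8.AB.96. A real fix would also have to keep existing links to the old fragment names working and avoid collisions with headings that genuinely start with the prefix, which this sketch ignores.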

Comment 2 Derk-Jan Hartman 2010-10-30 13:33:08 UTC

It seems that at the moment these headers all have ids (starting with "."). The name attribute was removed a while ago.
