Last modified: 2010-10-30 13:33:08 UTC
Why do ZH pages have name= links without the additional id=? EN doesn't have that problem.

$ cat linktest
while read page
do echo $page; lynx -source $page|grep name=|egrep -v id=\|keywords
done <<EOF
http://zh.wikipedia.org/wiki/Wikipedia_talk:%E8%81%9A%E4%BC%9A/2007%E8%87%BA%E7%81%A3%E7%A7%8B%E8%81%9A
http://en.wikipedia.org/wiki/Main_Page
http://en.wikipedia.org/wiki/User:Jidanni/Sandbox
http://zh.wikipedia.org/
EOF
$ sh linktest
http://zh.wikipedia.org/wiki/Wikipedia_talk:%E8%81%9A%E4%BC%9A/2007%E8%87%BA%E7%81%A3%E7%A7%8B%E8%81%9A
<p><a name=".E6.99.82.E9.96.93.E8.88.87.E5.9C.B0.E9.BB.9E"></a></p>
<p><a name=".E5.A0.B1.E5.90.8D"></a></p>
<p><a name=".E8.A8.8E.E8.AB.96"></a></p>
http://en.wikipedia.org/wiki/Main_Page
http://en.wikipedia.org/wiki/User:Jidanni/Sandbox
http://zh.wikipedia.org/
<th><a name=".E7.89.B9.E8.89.B2.E6.A2.9D.E7.9B.AE"></a>
<th><a name=".E6.96.B0.E9.97.BB.E5.8A.A8.E6.80.81"></a>
<th><a name=".E4.BC.98.E8.89.AF.E6.9D.A1.E7.9B.AE"></a>
<th><a name=".E6.AF.8F.E6.97.A5.E5.9B.BE.E7.89.87"></a>
<th><a name=".E4.BD.A0.E7.9F.A5.E9.81.93.E5.90.97.EF.BC.9F"></a>
<th><a name=".E5.8E.86.E5.8F.B2.E4.B8.8A.E7.9A.84.E4.BB.8A.E5.A4.A9"></a>
<th><a name=".E7.89.B9.E8.89.B2.E5.86.85.E5.AE.B9"></a>
<th><a name=".E5.AD.A3.E8.8A.82.E8.AF.9D.E9.A2.98"></a>
<th><a name=".E5.A7.8A.E5.A6.B9.E8.A8.88.E7.95.AB"></a>
This is most likely due to overzealous validation of id attributes: ids that begin with [A-Za-z] pass through, while those beginning with "." are dropped. Whether such ids are actually valid is questionable. (The "." comes from a variant of URL percent-encoding for characters that can't appear literally, with "." written in place of "%" so that the result passes the strict, though rarely enforced, rules on the contents of id attributes.)

Per the compatibility guidelines in the XHTML 1.0 spec:

"Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information."
[http://www.w3.org/TR/xhtml1/#guidelines]

As far as I can see, this strict limitation actually comes from the HTML 4.01 spec, and it applies to both NAME and ID attributes:
[http://www.w3.org/TR/html4/types.html#h-6.2]

XHTML 1.0, being XML 1.0, allows a larger set of possibilities:
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[http://www.w3.org/TR/REC-xml/#sec-common-syn]

But both appear to technically disallow the initial ".".

Probably the fragment id normalization needs to produce something a bit different for ids that don't start with an ASCII letter. Alternatively, we could do some compatibility testing and consider using the wider, Unicode-friendly XML set, which would still have problems with digits and punctuation as the first character.
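To make the encoding and the validity problem concrete, here is a minimal POSIX shell sketch. It assumes the anchor scheme works like this: take the UTF-8 bytes of the heading, percent-encode every byte except ASCII letters, digits and "-_.:", then write "." instead of "%". That assumption reproduces the zh anchors in the lynx output (e.g. 報名 becomes .E5.A0.B1.E5.90.8D), but it is not MediaWiki's actual code, which also handles spaces and other details. The function names encode_anchor, is_html4_id and safe_anchor, and the "x" prefix used as one possible fix, are hypothetical.

```sh
#!/bin/sh
# Sketch only: mimic the ".XX" anchor encoding seen on the zh pages.
# ASSUMPTION: every UTF-8 byte except ASCII letters, digits and "-_.:" is
# percent-encoded, and "." is written instead of "%". Not MediaWiki's code.
encode_anchor() {
    # od prints each byte as an unsigned decimal; awk re-emits safe bytes
    # as characters and everything else as ".XX" (uppercase hex).
    printf '%s' "$1" | od -An -v -tu1 | awk '
        {
            for (i = 1; i <= NF; i++) {
                n = $i + 0
                if ((n >= 48 && n <= 57) || (n >= 65 && n <= 90) ||
                    (n >= 97 && n <= 122) || n == 45 || n == 46 ||
                    n == 58 || n == 95)
                    printf "%c", n
                else
                    printf ".%02X", n
            }
        }
        END { printf "\n" }'
}

# True if $1 matches the HTML 4 ID/NAME pattern [A-Za-z][A-Za-z0-9:_.-]*.
is_html4_id() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z][A-Za-z0-9:_.-]*$'
}

# HYPOTHETICAL fix: prefix "x" whenever the encoded id does not start with an
# ASCII letter, so the result fits both the HTML 4 pattern and XML 1.0 Name.
safe_anchor() {
    _id=$(encode_anchor "$1")
    case $_id in
        [A-Za-z]*) printf '%s\n' "$_id" ;;
        *) printf 'x%s\n' "$_id" ;;
    esac
}
```

For example, encode_anchor '報名' prints .E5.A0.B1.E5.90.8D, which is_html4_id rejects, while safe_anchor '報名' prints x.E5.A0.B1.E5.90.8D, which matches the HTML 4 pattern. A fixed prefix is only one option: it keeps ids ASCII-only and valid under both specs, at the cost of uglier fragments and a possible collision with a heading whose text really starts that way.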
It seems that at the moment these headers all have ids (starting with "."). The name attribute was removed a while ago.