Last modified: 2010-10-30 13:33:08 UTC

Bug 11710 - anchors with name= but no id=
Status: RESOLVED WORKSFORME
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: 1.12.x
Hardware: All All
Importance: Lowest trivial
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://zh.wikipedia.org/
Depends on:
Blocks:
Reported: 2007-10-19 21:23 UTC by Dan Jacobson
Modified: 2010-10-30 13:33 UTC (History)
CC List: 2 users

Description Dan Jacobson 2007-10-19 21:23:24 UTC
Why do ZH pages have anchors with name= but no accompanying id=?
EN pages don't have that problem.

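$ cat linktest
# For each URL: print it, then any source lines that have name= but no id=
# (meta "keywords" tags are excluded because they also contain name=).
while read -r page
do
    echo "$page"
    lynx -source "$page" | grep 'name=' | grep -E -v 'id=|keywords'
done <<EOF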
http://zh.wikipedia.org/wiki/Wikipedia_talk:%E8%81%9A%E4%BC%9A/2007%E8%87%BA%E7%81%A3%E7%A7%8B%E8%81%9A
http://en.wikipedia.org/wiki/Main_Page
http://en.wikipedia.org/wiki/User:Jidanni/Sandbox
http://zh.wikipedia.org/
EOF
$ sh linktest
http://zh.wikipedia.org/wiki/Wikipedia_talk:%E8%81%9A%E4%BC%9A/2007%E8%87%BA%E7%81%A3%E7%A7%8B%E8%81%9A
<p><a name=".E6.99.82.E9.96.93.E8.88.87.E5.9C.B0.E9.BB.9E"></a></p>
<p><a name=".E5.A0.B1.E5.90.8D"></a></p>
<p><a name=".E8.A8.8E.E8.AB.96"></a></p>
http://en.wikipedia.org/wiki/Main_Page
http://en.wikipedia.org/wiki/User:Jidanni/Sandbox
http://zh.wikipedia.org/
<th><a name=".E7.89.B9.E8.89.B2.E6.A2.9D.E7.9B.AE"></a>
<th><a name=".E6.96.B0.E9.97.BB.E5.8A.A8.E6.80.81"></a>
<th><a name=".E4.BC.98.E8.89.AF.E6.9D.A1.E7.9B.AE"></a>
<th><a name=".E6.AF.8F.E6.97.A5.E5.9B.BE.E7.89.87"></a>
<th><a name=".E4.BD.A0.E7.9F.A5.E9.81.93.E5.90.97.EF.BC.9F"></a>
<th><a name=".E5.8E.86.E5.8F.B2.E4.B8.8A.E7.9A.84.E4.BB.8A.E5.A4.A9"></a>
<th><a name=".E7.89.B9.E8.89.B2.E5.86.85.E5.AE.B9"></a>
<th><a name=".E5.AD.A3.E8.8A.82.E8.AF.9D.E9.A2.98"></a>
<th><a name=".E5.A7.8A.E5.A6.B9.E8.A8.88.E7.95.AB"></a>
Comment 1 Brion Vibber 2007-12-03 21:50:13 UTC
This is most likely due to overzealous validation of id attributes: ids that begin with [A-Za-z] pass through, while those beginning with "." are dropped. That may or may not be valid. (The "." is introduced as a variant of URL percent-encoding for characters that can't appear literally, modified to pass the strict, though rarely enforced, rules about the contents of id attributes.)
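To make the "." encoding concrete, here is a simplified sketch that reproduces anchors like ".E5.A0.B1.E5.90.8D" from the report above. It assumes the script file is saved as UTF-8, and `dot_encode` is a hypothetical helper, not a MediaWiki function: it encodes every byte, whereas MediaWiki leaves ASCII letters and digits literal, so the output only matches for headings made entirely of non-ASCII characters.

```shell
#!/bin/sh
# Hypothetical helper, not part of MediaWiki: encode EVERY byte of $1 as ".XX"
# (uppercase hex), i.e. percent-encoding with "%" replaced by ".".
dot_encode() {
    # od prints each byte as two lowercase hex digits separated by spaces;
    # strip spaces/newlines and uppercase the digits
    hex=$(printf '%s' "$1" | od -An -tx1 | tr -d ' \n' | tr 'a-f' 'A-F')
    # put a "." in front of every pair of hex digits
    printf '%s\n' "$hex" | sed 's/../.&/g'
}

dot_encode '報名'    # prints .E5.A0.B1.E5.90.8D
```

Because the first character of every such id is ".", none of these ids start with an ASCII letter.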

Per the compatibility guidelines in the XHTML 1.0 spec:

"Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information."
[http://www.w3.org/TR/xhtml1/#guidelines]

This strict limitation is actually in the HTML 4.01 spec, as far as I can see, and would apply to both NAME and ID attributes...
[http://www.w3.org/TR/html4/types.html#h-6.2]
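A minimal sketch of checking an id against the backward-compatible pattern quoted above, [A-Za-z][A-Za-z0-9:_.-]*. The function name `is_html4_id` and the sample ids other than the one copied from the report are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 matches [A-Za-z][A-Za-z0-9:_.-]* exactly.
# grep is line-based, so this assumes $1 has no embedded newline.
is_html4_id() {
    # LC_ALL=C makes [A-Za-z] mean plain ASCII letters only
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z][A-Za-z0-9:_.-]*$'
}

for id in 'History' '.E5.A0.B1.E5.90.8D' '2007_meetup'; do
    if is_html4_id "$id"; then
        echo "valid:   $id"
    else
        echo "invalid: $id"
    fi
done
```

Only "History" passes; the dot-encoded id fails on its leading ".", and "2007_meetup" fails on its leading digit.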

XHTML 1.0, being XML 1.0, allows a larger set of possibilities:
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[http://www.w3.org/TR/REC-xml/#sec-common-syn]

But both appear to technically disallow the initial "."...
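The XML rule can be sketched the same way. This is only an ASCII approximation of production [5] (the real Letter and NameChar classes also admit many non-ASCII characters, plus combining characters and extenders in NameChar); `is_xml_name_ascii` and the sample ids other than the one copied from the report are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: ASCII-only approximation of XML 1.0 production [5],
#   Name ::= (Letter | '_' | ':') (NameChar)*
# with NameChar limited to ASCII letters, digits, '.', '-', '_' and ':'.
# Assumes $1 has no embedded newline (grep is line-based).
is_xml_name_ascii() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z_:][A-Za-z0-9._:-]*$'
}

for id in '.E8.A8.8E.E8.AB.96' '_E8.A8.8E' 'Talk'; do
    if is_xml_name_ascii "$id"; then
        echo "ok:       $id"
    else
        echo "rejected: $id"
    fi
done
```

XML admits a leading "_" or ":" that HTML 4 does not, but, like HTML 4, it still rejects the leading ".".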

The fragment id normalization probably needs to produce something slightly different for ids that don't begin with an ASCII letter. Alternatively, we could do some compatibility testing and consider using the wider, Unicode-friendly XML set, although that would still have issues with digits and punctuation as the first character.
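One possible shape for the "something slightly different" suggested above is to prepend a fixed ASCII letter whenever the encoded id doesn't already start with one. This is purely a hypothetical sketch: the function name and the prefix "x" are assumptions, not anything MediaWiki is known to have adopted.

```shell
#!/bin/sh
# Hypothetical sketch: guarantee an id starts with an ASCII letter by
# prepending "x" (an arbitrary choice) when it does not.
# Assumes $1 is a single line.
make_html4_safe() {
    # LC_ALL=C keeps [A-Za-z] to plain ASCII letters
    if printf '%s\n' "$1" | LC_ALL=C grep -q '^[A-Za-z]'; then
        printf '%s\n' "$1"        # already starts with an ASCII letter
    else
        printf 'x%s\n' "$1"       # prepend the hypothetical prefix
    fi
}

make_html4_safe '.E5.A0.B1.E5.90.8D'   # prints x.E5.A0.B1.E5.90.8D
make_html4_safe 'History'              # prints History
```

The trade-off is that a prefixed id could collide with a heading whose text really does begin with that letter, and existing links to the unprefixed ids would stop matching.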
Comment 2 Derk-Jan Hartman 2010-10-30 13:33:08 UTC
It seems that at the moment these headers all have ids (starting with "."). The name attribute was removed a while ago.
