Last modified: 2008-03-13 06:17:49 UTC
Sorry for this! Hello!
a) I tested character normalisation, which seems to be part of title normalisation. For precombined versus non-precombined characters this works fine:
[[User:Gangleri/tests/אָ]]
http://yi.wikipedia.org/wiki/User:Gangleri/tests/%EF%AC%AF
http://yi.wikipedia.org/wiki/User:Gangleri/tests/%D7%90%D6%B8
These point to the same page despite the different encoding.
b) The bug's URL lists four different pages with optically identical titles. "Phantom" trailing general punctuation characters generate different URLs. Compare:
http://www.fileformat.info/info/unicode/char/202b/index.htm
Unicode Character 'RIGHT-TO-LEFT EMBEDDING' (U+202B) UTF-8 (hex) 0xE2 0x80 0xAB (e280ab)
http://homepage1.nifty.com/nomenclator/unicode/data/punct.htm
The generated URLs are:
http://yi.wikipedia.org/wiki/User:Gangleri/tests/%E2%80%AB%D7%B0%D7%99%D7%A5
http://yi.wikipedia.org/wiki/User:Gangleri/tests/%E2%80%AB%D7%B0%D7%99%D7%A5%E2%80%AB
http://yi.wikipedia.org/wiki/User:Gangleri/tests/%E2%80%AB%D7%B0%D7%99%D7%A5%E2%80%AB%E2%80%AB
http://yi.wikipedia.org/wiki/User:Gangleri/tests/%E2%80%AB%D7%B0%D7%99%D7%A5%E2%80%AB%E2%80%AB%E2%80%AB
There are several aspects to this:
a) Possible vandalism. Suggestion: please evaluate whether "phantom" (unnecessary) leading or trailing punctuation should be stripped from database titles; this looks like a normalisation step.
b) Garbage in, garbage out.
Regards Reinhardt [[user:gangleri]]
P.S. I ran into this because of textual ambiguities at the Yiddish Wikipedia relating to the use of "tsvey vovn" versus "vov + vov", "tsvey-yudn" versus "yud + yud", etc.
Example 1: There is an article [[yi:וויץ]] but not [[yi:װיץ]].
Example 2: http://www.yiddishdictionaryonline.com/ contains "vey iz (tsu) mir", which is written *there* with both "vov + vov" and "yud + yud". Nevertheless, http://www.cs.engr.uky.edu/~raphael/yiddish/makeyiddish.html translates it with "tsvey vovn" and "tsvey-yudn": װײ איז (צו) מיר!
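The behaviour reported in (a) and (b) can be reproduced outside the wiki. The following is a minimal Python sketch, assuming that the title normalisation is Unicode NFC (the comment does not name the form, so this is an assumption). The helper name title_from_url is hypothetical and is not MediaWiki code.

```python
import unicodedata
from urllib.parse import unquote

BASE = "http://yi.wikipedia.org/wiki/User:Gangleri/tests/"

def title_from_url(url: str) -> str:
    """Percent-decode the part after BASE and apply NFC (assumed form)."""
    raw = unquote(url[len(BASE):])
    return unicodedata.normalize("NFC", raw)

# (a) precombined U+FB2F versus U+05D0 followed by U+05B8
precombined = title_from_url(BASE + "%EF%AC%AF")
decomposed = title_from_url(BASE + "%D7%90%D6%B8")
same_page = precombined == decomposed

# (b) the four URLs with zero to three trailing U+202B characters
rle_variants = [
    title_from_url(BASE + "%E2%80%AB%D7%B0%D7%99%D7%A5" + "%E2%80%AB" * n)
    for n in range(4)
]
distinct_titles = len(set(rle_variants))

print(same_page, distinct_titles)
```

Under this assumption the two spellings in (a) collapse to one title, while the four titles in (b) stay distinct, because normalisation does not touch U+202B.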
It seems that automatic character substitution is not possible because of ambiguities when three characters meet, as at http://www.yiddishdictionaryonline.com/ in "farvunderung" - פֿאַרווונדערונג , "farvundert" - פֿאַרווונדערט and, the other way around, in "oyspruvn" - אויספּרווון
You will find typical examples at the end of http://yi.wiktionary.org/wiki/Special:Allpages and at http://yi.wiktionary.org/w/index.php?title=Category:Bugzilla . A summary is available at http://yi.wiktionary.org/wiki/%E2%80%AB . These pages were created because I "compiled" the titles by copying and pasting Hebrew characters between different Firefox browsers on Windows. A workaround is to use a suitable keyboard as described at http://www.uyip.org/ and to avoid this silly copy and paste. See http://www.geocities.com/fontboard/yiddish.html : Yiddish Pasekh and Keyman keyboard for Windows. Regards Reinhardt [[user:gangleri]]
Note: This bug can cause some confusion in a wiki. I assume that many contributors use copy and paste to insert a few Hebrew characters. As you can see from http://yi.wikipedia.org/wiki/User:Gangleri/tests/%E2%80%AB%D7%B0%D7%99%D7%A5%E2%80%AB , %E2%80%AB can occur
- at the beginning of a title
- at the end of a title
- (I assume) also inside the title
There are different things to do:
- avoid generating such titles during editing, linking etc.
- clean up the database; this is a maintenance issue
Regards Reinhardt [[user:gangleri]]
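The three positions and the clean-up step can be sketched in a few lines of Python. This is only an illustration of the idea, not MediaWiki code; the function names rle_positions and clean_title are hypothetical, and only U+202B is handled because it is the only character reported so far.

```python
from urllib.parse import unquote

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING, %E2%80%AB in a URL

def rle_positions(title: str) -> dict:
    """Count U+202B at the beginning, at the end and inside a title."""
    without_leading = title.lstrip(RLE)
    core = without_leading.rstrip(RLE)
    return {
        "leading": len(title) - len(without_leading),
        "trailing": len(without_leading) - len(core),
        "inside": core.count(RLE),
    }

def clean_title(title: str) -> str:
    """Drop every U+202B, wherever it occurs in the title."""
    return title.replace(RLE, "")

# The test title from the URL above.
sample = unquote("%E2%80%AB%D7%B0%D7%99%D7%A5%E2%80%AB")
report = rle_positions(sample)
cleaned = clean_title(sample)
print(report, len(cleaned))
```

A check like rle_positions could run when a title is created or linked, while clean_title corresponds to the one-time database maintenance. Whether stripping is acceptable for every title is a policy question that the sketch does not answer.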
Additions: I found more incorrect titles (only with a leading RIGHT-TO-LEFT EMBEDDING) in other projects with http://yi.wikipedia.org/wiki/Special:Prefixindex/%E2%80%AB in addition to http://yi.wiktionary.org/wiki/Special:Prefixindex/%E2%80%AB . Besides the RTL wikis [[ar:]] [[fa:]] [[he:]] [[ur:]] [[yi:]], their Wiktionaries etc., all other projects can be affected. The wrong titles at [[yi:]] were created by 5 contributors, which shows that this is a general problem. If contributors copy text from a web page and paste it (as Hebrew characters) into the browser's URL field (I mainly use Firefox myself), they may copy and paste leading or trailing punctuation characters, and the browser will *generate* these URLs. Of course this is not the proper way to create titles (one should use a keyboard), and it may or may not be a Firefox issue (I do not know whether it has been reported at bugzilla.org; if not, please do so), but it is common practice among a significant number of contributors to RTL projects. You will find the affected titles at: [[yi:Category:Bugzilla/Unicode_character_RIGHT-TO-LEFT_EMBEDDING_-_U_202B]] http://yi.wiktionary.org/wiki/Category:Bugzilla/Unicode_character_RIGHT-TO-LEFT_EMBEDDING_-_U_202B Best regards Reinhardt [[user:gangleri]]
More characters: I found
http://yi.wikipedia.org/w/index.php?title=%E2%80%AB%D7%A7%D7%94%D7%9C_%D7%A4%D6%BF%D7%95%D7%9F_%E2%80%AB%D7%96%D7%A2%D7%9C%D7%91%D7%A9%D7%98%D7%A2%D7%A0%D7%93%D7%99%D7%A7%D7%A2%D7%A8_%D7%A9%D7%98%D7%90%D6%B7%D7%98%D7%9F%E2%80%AC&redirect=no
which originally contained a trailing %E2%80%AC.
Besides
http://www.fileformat.info/info/unicode/char/202b/index.htm Unicode Character 'RIGHT-TO-LEFT EMBEDDING' (U+202B) UTF-8 (hex) 0xE2 0x80 0xAB (e280ab)
compare also:
http://www.fileformat.info/info/unicode/char/202a/index.htm Unicode Character 'LEFT-TO-RIGHT EMBEDDING' (U+202A) UTF-8 (hex) 0xE2 0x80 0xAA (e280aa)
http://www.fileformat.info/info/unicode/char/202c/index.htm Unicode Character 'POP DIRECTIONAL FORMATTING' (U+202C) UTF-8 (hex) 0xE2 0x80 0xAC (e280ac)
http://www.fileformat.info/info/unicode/char/202d/index.htm Unicode Character 'LEFT-TO-RIGHT OVERRIDE' (U+202D) UTF-8 (hex) 0xE2 0x80 0xAD (e280ad)
http://www.fileformat.info/info/unicode/char/202e/index.htm Unicode Character 'RIGHT-TO-LEFT OVERRIDE' (U+202E) UTF-8 (hex) 0xE2 0x80 0xAE (e280ae)
Variations of http://yi.wikipedia.org/wiki/Special:Prefixindex/%E2%80%AB such as
http://yi.wikipedia.org/wiki/Special:Prefixindex/%E2%80%AA
http://yi.wikipedia.org/wiki/Special:Prefixindex/%E2%80%AC
http://yi.wikipedia.org/wiki/Special:Prefixindex/%E2%80%AD
http://yi.wikipedia.org/wiki/Special:Prefixindex/%E2%80%AE
are of limited use because (theoretically) these characters can be included anywhere in a title. I will open another enhancement request about a special page allowing instring search of titles specifying %nn values.
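The encodings of these five characters, and the instring search that Prefixindex cannot provide, can be sketched in Python as follows. The function names describe and instring_search and the sample titles are hypothetical; a real special page would query the title table of the wiki instead of a list.

```python
import unicodedata
from urllib.parse import quote, unquote

# U+202A .. U+202E, the directional formatting characters listed above
CODE_POINTS = [0x202A, 0x202B, 0x202C, 0x202D, 0x202E]

def describe(cp: int) -> tuple:
    """Return (code point, Unicode name, UTF-8 hex, %nn form) for one character."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), ch.encode("utf-8").hex(), quote(ch))

table = [describe(cp) for cp in CODE_POINTS]

def instring_search(titles, percent_value):
    """Return the titles that contain the %nn value anywhere, not only as a prefix."""
    needle = unquote(percent_value)
    return [t for t in titles if needle in t]

# Hypothetical sample titles: control character in front, absent, and inside.
titles = ["\u202etest", "test", "ab\u202ccd"]
hits_rlo = instring_search(titles, "%E2%80%AE")
hits_pdf = instring_search(titles, "%E2%80%AC")

for row in table:
    print(*row)
```

The second search finds a title in which the character sits in the middle, which is the case that a prefix listing misses.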
(In reply to comment #4)
> I will open another enhancement request about a special page allowing instring
> search of titles specifying %nn values.
bug 3887: create a special page for instring search of titles specifying %nn values
Sorry for this; see http://yi.wikipedia.org/wiki/%E2%80%AEtest http://yi.wiktionary.org/wiki/%E2%80%AEtest You may say "garbage in, garbage out", but this seems to be a follow-on error. It seems to interfere with the setup of case-sensitive versus case-insensitive titles. The earlier this bug gets fixed, the fewer follow-on errors we get.
Sorry for this: http://yi.wiktionary.org/wiki/Special:Whatlinkshere/%E2%80%AB%D7%B0%D7%90%D6%B8%D7%9B%D7%A0%D7%98%D7%90%D6%B8%D7%92 This title is invalid because it starts with %E2%80%AB = Unicode Character 'RIGHT-TO-LEFT EMBEDDING' (U+202B). However, it is a mess to edit BiDi text and create pages like http://yi.wiktionary.org/wiki/%D7%B0%D7%90%D6%B8%D7%9A http://yi.wiktionary.org/wiki/%D7%98%D7%90%D6%B8%D7%92 while also taking care of all these !*%$$€@*# bugs. These pages look fine, but the titles they link to should be invalid and the links should not show up red. It would be best to leave them with their [[ and ]] brackets, the same as invalid links. Best regards Reinhardt [[user:gangleri]]
(In reply to comment #7)
> sorry for this
> and also taking care of all these !*%$$€@*# bugs.
I fixed the links involved, so the Whatlinkshere example above no longer applies. Compare:
http://yi.wiktionary.org/w/index.php?title=%D7%B0%D7%90%D6%B8%D7%9A&diff=4483&oldid=4477
http://yi.wiktionary.org/w/index.php?title=%D7%95%D7%95%D7%90%D6%B8%D7%9B%D7%A0%D7%98%D7%90%D6%B8%D7%92&diff=4482&oldid=4463
and bug 3894: white space characters, BiDi control characters should show up in diff
Fixing this would later require a validation according to bug 3904: disallow user pages and user_talk pages starting with lower case on case sensitive wikis. Adding: blocks bug 3904
Hi! The code on FiverAlpha is changing. See http://test.leuksman.com/view/Category:Mimic and bug 3888 comment 3. The category http://test.leuksman.com/view/Category:Mimic illustrates that the punctuation characters can be used for fraud and vandalism. If you are not familiar with these punctuation characters, you may *not* notice that the edit of this *false account*, http://test.leuksman.com/edit/User:Brion%E2%80%AD%E2%80%AC?oldid=9812 , contains punctuation characters in [[User:Brion|Brion]].
- One way to see these characters is to check the URL; this is simple if most of the characters it contains are 7-bit ASCII.
- Another way is to place the cursor in the text and move it through the text area.
- Another way is to select the text with the mouse.
Because these characters cause more trouble than benefit, I suggest suppressing the punctuation characters in titles until a generally accepted solution can be provided. As it is now, mimic accounts can be created. This opens the door to fraud and vandalism. Regards Reinhardt [[user:gangleri]]
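Checking the URL by eye can be replaced by a small script. The following Python sketch makes the hidden characters visible and flags a name as a possible mimic of an existing one. The function names reveal and is_mimic are hypothetical. Treating every character of Unicode category Cf (format) as suspicious is an assumption of this sketch: the category also contains characters such as ZERO WIDTH NON-JOINER, which some scripts need, so a real rule would have to be narrower.

```python
import unicodedata
from urllib.parse import unquote

def reveal(text: str) -> str:
    """Replace each format character (category Cf) by a visible <U+XXXX> tag."""
    return "".join(
        f"<U+{ord(c):04X}>" if unicodedata.category(c) == "Cf" else c
        for c in text
    )

def is_mimic(candidate: str, existing: str) -> bool:
    """True if candidate differs from existing only by format characters."""
    stripped = "".join(c for c in candidate if unicodedata.category(c) != "Cf")
    return candidate != existing and stripped == existing

# The false account name from the URL above.
fake = unquote("Brion%E2%80%AD%E2%80%AC")
shown = reveal(fake)
print(shown, is_mimic(fake, "Brion"))
```

Here reveal turns the account name into Brion<U+202D><U+202C>, so the two trailing characters become visible without inspecting the percent-encoded URL.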
(In reply to comment #4)
> more characters:
> I found
also:
http://www.fileformat.info/info/unicode/char/200e/index.htm Unicode Character 'LEFT-TO-RIGHT MARK' (U+200E) UTF-8 (hex) 0xE2 0x80 0x8E (e2808e)
http://www.fileformat.info/info/unicode/char/200f/index.htm Unicode Character 'RIGHT-TO-LEFT MARK' (U+200F) UTF-8 (hex) 0xE2 0x80 0x8F (e2808f)
source: http://www.fileformat.info/info/unicode/block/general_punctuation/list.htm
Hello! I would like to CANCEL this request / withdraw it. (There is no such MediaZilla resolution.) The request seems too restrictive to me, and other methods to avoid the problem and to fix affected pages should be found. Such tools are requested in
- Bug 4012: feature request: add a flexible magic character conversion to the built-in editor, which would make it possible to identify these characters in the editor
- Bug 4185: feature request: provide a notification for irregular links, which would alert users before they submit such links / such pages (either new or changed).
Closing as requested
As the status is now, this is more a DUPLICATE of bug 3696: Unicode Control Characters should be restricted in title text (RLM, LRM, RLO, LRO, . . .) *** This bug has been marked as a duplicate of bug 3696 ***