Last modified: 2011-09-04 14:56:51 UTC
Hi,
I found a problem with URLs containing some Devanagari characters on the present (14.02.2005) Hindi Wiktionary project. I tested this with Konqueror and Mozilla, and I think it is only present in 1.4.
URLs with some Devanagari characters (at least ज़, ड़ and फ़) can't be resolved. Links appear in red although the article exists. The same happens when using the Unicode numbers for the 3 characters above (ज़, ड़ and फ़ respectively).
Examples:
http://hi.wiktionary.org/wiki/शनिवार
Article [[हफ़्ता]] exists, but is not accessible at http://hi.wiktionary.org/wiki/हफ़्ता
Thanks a lot, Yann
URL:
* http://hi.wiktionary.org/wiki/%E0%A4%B6%E0%A4%A8%E0%A4%BF%E0%A4%B5%E0%A4%BE%E0%A4%B0 (शनिवार)
* http://hi.wiktionary.org/wiki/%E0%A4%B9%E0%A4%AB%E0%A4%BC%E0%A5%8D%E0%A4%A4%E0%A4%BE (हफ़्ता)
Another example:
* http://hi.wiktionary.org/wiki/Template:fr
* http://hi.wiktionary.org/wiki/%E0%A4%AB%E0%A4%BC%E0%A5%8D%E0%A4%B0%E0%A4%BE%E0%A4%81%E0%A4%B8%E0%A5%80%E0%A4%B8%E0%A5%80
This bug also appears with Firefox and IE on Windows, so it's independent of the browser.
Same bug as http://bugzilla.wikipedia.org/show_bug.cgi?id=1375
Here is a way to get out of it, thanks to Muke. Yann
<MukeUTF-8> I have run into the same bug
<MukeUTF-8> It is because of Unicode normalization
<MukeUTF-8> the same happened with old articles using the Greek acute accent
<MukeUTF-8> I think it is the same problem. i am looking into ti
<MukeUTF-8> *it
<yannf> MukeUTF-8, oh interesting
<MukeUTF-8> the reason is because, say.
<MukeUTF-8> the mediawiki software takes the "ज़" that you type in
<MukeUTF-8> and it converts it into the "ज" plus the dot
<MukeUTF-8> as two separate characters, because the Unicode standard defines them as identical.
<MukeUTF-8> the problem is that your article with "ज़" in the title was created first... and it was never converted
<yannf> yes, it appears on URLs with letters with a dot
<yannf> what do you mean by "converted" ?
<MukeUTF-8> I mean that it converts the one character "ज़" into the two characters "ज" and "़"
<yannf> how can we solve this ?
<MukeUTF-8> Someone has to go into the database and convert the old article titles.
<yannf> there are also articles which are accessible, but the link remains red
<MukeUTF-8> or at least convert whatever points to the articles
<yannf> also with a dot in the URL
<yannf> "at least convert whatever points to the articles" <- but the links seem to be ok
<MukeUTF-8> I mean in the database
<MukeUTF-8> I don't really know the details of how it could be fixed.
<yannf> why it appears only in 1.4 ?
<MukeUTF-8> Because Unicode normalization was implemented
<MukeUTF-8> which means, for convenience of storage and searching and whatnot, characters that are defined as identical are stored in a canonical form, which may not be the same form as was typed in
<MukeUTF-8> another example was the Greek characters I mentioned... where "ά" (greek alpha with old acute accent) was typed in before, it is now converted to "ά" (greek alpha with modern tonos)
<MukeUTF-8> So old article titles with "ά" with the old accent can't be reached anymore, because it will always be turned into the letter with the modern accent by the software
<yannf> what if i copy the articles by hand ?
<MukeUTF-8> if you can get to the article
<MukeUTF-8> New articles shouldn't have any trouble
<MukeUTF-8> only ones from before the conversion
<yannf> yes, but there are also articles which are accessible, but the link remains red
<MukeUTF-8> that i'm not sure about
<MukeUTF-8> oh wait
<MukeUTF-8> When was the last time the page with the link was edited?
<yannf> http://hi.wiktionary.org/w/index.php?title=Template:-fr-&action=history
<yannf> Dec 30, 2004
<MukeUTF-8> because not only old article titles, but old article text was not converted. So if the link contains an "old" character, it will consider it a red link, even though the target page with the "new" character exists. But the conversion is in place now, so if you edit the page, it should convert it to a "new" character and work properly. Try it now, edit the page and hit "preview"
<MukeUTF-8> (the page is not loading for me atm, or i would check this myself)
<yannf> if i edit the page, the link becomes red on http://hi.wiktionary.org/wiki/Template:-fr-
<MukeUTF-8> ah...
<MukeUTF-8> that's because the page with the "old" character exists, but not the page with the "new" character
<yannf> yes, i think i understood
<MukeUTF-8> http://bugzilla.wikipedia.org/show_bug.cgi?id=1375
<yannf> on this page, the link was red, i edited, and it's now blue, http://hi.wiktionary.org/wiki/Template:kk
<MukeUTF-8> *nod*
<MukeUTF-8> the articles can be updated to the new characters by editing them... but the titles need to be edited by someone with access to the database, because we can't reach them from here
<MukeUTF-8> I posted on the wiktionary mailing list for them to do it for the Greek words involved but it never happened :\
<yannf> well, there are only a handful of them, so i could even create them again, if it solves the pb
<MukeUTF-8> but then, the things i ask for never seem to happen...
<yannf> i have a dump of the old database
<MukeUTF-8> true, you could make them again, though you lose the history
<MukeUTF-8> and attributions
<yannf> yes, i am the only editor on the indi wiktionary ;)
<yannf> *hindi
<MukeUTF-8> ah, well, then that is probably ok :x)
<MukeUTF-8> i'm just about the only editor on the latin one, so I know how it is ;)
<yannf> ;)
<MukeUTF-8> there is like... one other regular user. but he only speaks Japanese, and only adds proper names...
<MukeUTF-8> so I don't generally count him o-o
<yannf> there will be a few lost articles in the database, that the only remaining pb
<MukeUTF-8> hmm, i suppose i could pull those greek articles out of the old db dumps...
<yannf> may i copy the log of this chat to the bug report ?
<yannf> it would be others
<yannf> it would help others
<MukeUTF-8> ok
<MukeUTF-8> I have to go to work now. ttyl.
<yannf> ok thanks
<MukeUTF-8> no problem :)
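The conversion described in the chat can be reproduced with a minimal sketch. It assumes Python's standard `unicodedata` module and NFC as the canonical form; the chat does not say which normalization form MediaWiki applies, so NFC is an assumption here, and the `codepoints` helper is purely illustrative.

```python
import unicodedata

def codepoints(s):
    """Return the code points of s as a list of 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]

old_za = "\u095b"     # precomposed DEVANAGARI LETTER ZA (one character)
old_alpha = "\u1f71"  # GREEK SMALL LETTER ALPHA WITH OXIA (old acute accent)

# Assumption: NFC is the canonical form used for storage.
new_za = unicodedata.normalize("NFC", old_za)
new_alpha = unicodedata.normalize("NFC", old_alpha)

print(codepoints(old_za), "->", codepoints(new_za))
# ['U+095B'] -> ['U+091C', 'U+093C']  (letter JA plus the nukta dot)
print(codepoints(old_alpha), "->", codepoints(new_alpha))
# ['U+1F71'] -> ['U+03AC']  (alpha with tonos)

# The old and new forms are different strings, so an exact-match
# title lookup with one form cannot find a page stored under the other.
print(old_za == new_za)  # False
```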
So I created the inaccessible articles again. Now the old ones need to be deleted: all articles created before the conversion with ड़ (ड़), ज़ (ज़) or फ़ (फ़) in the URL.
Hello! Please see
- http://hi.wiktionary.org/w/index.php?diff=10894&oldid=5133
- http://hi.wiktionary.org/w/index.php?diff=10895&oldid=5009
This fixed the problem for the section, for the category and also for [[wiktionary:hi:अंग्रेज़ी]]. (All links are blue now; some are black at [[wiktionary:hi:अंग्रेज़ी]].)
http://hi.wiktionary.org/w/index.php?title=%E0%A4%85%E0%A4%82%E0%A4%97%E0%A5%8D%E0%A4%B0%E0%A5%87%E0%A4%9C%E0%A4%BC%E0%A5%80&action=purge
A duplicate of this is Bug 3860: links generated with precombined characters show red despite the fact that the normalised links exist
best regards reinhardt [[user:gangleri]]
*** Bug 3860 has been marked as a duplicate of this bug. ***
Making readjustments for component and dependencies.
There are some plans to make this easier in Bugzilla:
Bug [Bugzilla] 102161 == Resolving as duplicate should display field differences
Bug [Bugzilla] 319803 == feature request: when changing product, component etc. display old product, old component, other fields in all required steps
Bug [Bugzilla] 65382 == Let people know when deps exist as resolving duplicate.
Bug 3860 depends on
Bug 2399: Unicode normalization interferes with Hebrew and Arabic with vowels
blocks
Bug 3985: character conversion (tracking)
"Component" will be changed to "Internationalization" in a next "edit".
*** Bug 1375 has been marked as a duplicate of this bug. ***
Changing summary from
problem on URL with Devanagari characters
to
*first* perform Unicode normalisation and check for existence of pages *after* the normalisation
Hope that this would be easy to fix. Unicode normalisation should always be performed *first*.
Changing Severity from "normal" to "major".
Bug 1375: Unicode normalization leaves red links
mentions that special:Whatlinkshere might be affected as well. Please verify if this will be fixed as well. Hopefully there are no other places in the code where the Unicode normalisation is *not* performed first.
best regards reinhardt [[user:gangleri]]
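The order of operations requested in the summary can be sketched as follows. This is not MediaWiki code: `normalize_title`, `page_exists` and the `stored_titles` set are hypothetical names standing in for the real title handling and page table, and NFC is assumed as the canonical form.

```python
import unicodedata

def normalize_title(title):
    """Bring a title into one canonical form (NFC is assumed here)."""
    return unicodedata.normalize("NFC", title)

def page_exists(title, stored_titles):
    """Normalize the requested title *first*, then check for existence."""
    return normalize_title(title) in stored_titles

# Hypothetical page table whose titles were normalized when stored.
stored_titles = {normalize_title("\u0939\u095e\u094d\u0924\u093e")}

precomposed = "\u0939\u095e\u094d\u0924\u093e"       # title typed with U+095E
decomposed = "\u0939\u092b\u093c\u094d\u0924\u093e"  # title typed with U+092B U+093C

# Both spellings resolve to the same stored page.
print(page_exists(precomposed, stored_titles))  # True
print(page_exists(decomposed, stored_titles))   # True
```

The lookup only works if the stored titles are normalized as well, which is why titles created before the conversion remain a separate problem.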
(In reply to comment #11)
> Bug 1375: Unicode normalization leaves red links
> mentions that special:Whatlinkshere might be affected as well. Please verify if
> this will be fixed as well.
http://la.wiktionary.org/wiki/Special:Whatlinkshere/%E1%BD%88%CE%BE%CF%8D%CF%82 does *not* show "[[wiktionary:la:Ὀξύς]]", *but* *every* [[Special:Whatlinkshere/foo]] shows [[foo]] in the list. This is easier to see at [[Special:Whatlinkshere/Tofu]]. Why is this *not* the case at [[wiktionary:la:Special:Whatlinkshere/Ὀξύς]]?
Removed bogus dependency.
(In reply to comment #6)
> So I created again the inaccessible articles. Now the old ones need to be deleted: all articles with ड़
> (ड़), ज़ (ज़) or फ़ (फ़) in the URL created before the conversion have to be deleted.
Please also read the discussion from comment #5.
Yann, I understand that there was / there is also *another* problem related to page titles which you cannot access and which should be deleted. Please go to [[wiktionary:hi:special:Allpages]]. Try to identify whether you see titles which would not open or which appear to be there twice. Please do all of the following:
a) make a screen dump and mark some of the titles which have / create problems
b) please provide the links
c) please describe the problem from *your* point of view (what you expect, what you can do, what does not work)
d) How many namespaces are affected?
Thanks in advance! best regards reinhardt [[user:gangleri]]
(In reply to comment #12)
> (In reply to comment #11)
> > Bug 1375: Unicode normalization leaves red links
> > mentions that special:Whatlinkshere might be affected as well. Please verify if
> > this will be fixed as well.
"special:Whatlinkshere might be affected as well": see also [[user:Gangleri/tests/bugzilla/03860]]
(In reply to comment #15)
> "special:Whatlinkshere might be affected as well" see also
> [[user:Gangleri/tests/bugzilla/03860]]
indeed
%EF%AC%AE -> %D7%90%D6%B7
example
http://test.wikipedia.org/wiki/Special:Wantedpages
lists
http://test.wikipedia.org/w/index.php?title=User:Gangleri/tests/bugzilla/test/%EF%AC%AE&action=edit
which should be
http://test.wikipedia.org/wiki/User:Gangleri/tests/bugzilla/04917/%D7%90%D6%B7
and
http://test.wikipedia.org/w/index.php?title=User:Gangleri/tests/bugzilla/04917/example_using_%22%EF%AC%AE%22&action=edit
which should be
http://test.wikipedia.org/wiki/User:Gangleri/tests/bugzilla/04917/example_using_%22%D7%90%D6%B7%22
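The mapping %EF%AC%AE -> %D7%90%D6%B7 quoted above can be checked with a small sketch. It assumes UTF-8 percent-encoding and NFC as the canonical form; `normalize_encoded` is a hypothetical helper, not part of any wiki software.

```python
import unicodedata
from urllib.parse import quote, unquote

def normalize_encoded(segment):
    """Percent-decode a UTF-8 title segment, apply NFC, and re-encode it."""
    decoded = unquote(segment)                    # UTF-8 is the default
    normalized = unicodedata.normalize("NFC", decoded)
    return quote(normalized, safe="")

# U+FB2E (Hebrew alef with patah, one character) becomes U+05D0 U+05B7.
print(normalize_encoded("%EF%AC%AE"))  # %D7%90%D6%B7
```

A segment that is already in normalized form, such as the encoded हफ़्ता title from the first comment, comes back unchanged.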
As far as I can see the problem only affects very old titles, and I think your script that checks invalid titles should catch them.
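A check of that kind might look like the sketch below. It is not the script referred to in the comment; `unnormalized_titles` and the sample title list are hypothetical, and NFC is assumed as the target form.

```python
import unicodedata

def unnormalized_titles(titles):
    """Yield (title, normalized) pairs for titles that are not in NFC."""
    for title in titles:
        normalized = unicodedata.normalize("NFC", title)
        if normalized != title:
            yield title, normalized

# Hypothetical sample: one title already normalized, one old title.
titles = [
    "\u0936\u0928\u093f\u0935\u093e\u0930",  # शनिवार, unaffected
    "\u0939\u095e\u094d\u0924\u093e",        # old title using U+095E
]

for old, new in unnormalized_titles(titles):
    # ascii() prints the escapes, so the two forms can be told apart.
    print(ascii(old), "->", ascii(new))
```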