Last modified: 2010-05-15 15:32:52 UTC
There is an article called [[Beneš decrees]] on the English Wikipedia. Its URL is http://en.wikipedia.org/wiki/Bene%9A_decrees. The Czech Wikipedia has the corresponding article [[Benešovy dekrety]], and I tried to add an interwiki link to en: using [[en:Beneš decrees]]. That link pointed to http://en.wikipedia.org/wiki/Bene%C5%A1_decrees, which is IMHO the proper UTF-8 encoding of the title and should be decoded on the en: side. But it is not; instead, the link goes to [[Bene]]! Afterwards I checked de:, which uses [[en:Bene decrees]] with the control character U+009A (SINGLE CHARACTER INTRODUCER) literally (!) included; this gets encoded to http://en.wikipedia.org/wiki/Bene%C2%9A_decrees, which is "correctly" decoded to [[Bene%9A_decrees]]. So I tried [[en:Beneš decrees]] on cs:, which seems to work correctly for now. But I don't think this is correct behavior. A probable cause is that, IIANM, the %9A character is not defined in proper ISO 8859-1, only in the windows-1252 extensions, so maybe the article on en: should not have that name at all. In that case, though, MediaWiki should probably guard against it.
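For reference, the three URL forms mentioned above can be reproduced with a few lines of Python. This is only an illustrative sketch, not MediaWiki code: `wiki_path` is a hypothetical helper, and Python's codec names `cp1252` and `iso-8859-1` stand in for the charsets discussed in this report.

```python
from urllib.parse import quote

# "Beneš decrees"; the š is U+0161
title = "Bene\u0161 decrees"

def wiki_path(text, encoding):
    """Hypothetical helper: percent-encode a title in the given charset."""
    return quote(text.replace(" ", "_").encode(encoding))

# UTF-8 form, as generated by the link on cs:
print(wiki_path(title, "utf-8"))    # Bene%C5%A1_decrees

# windows-1252 form, which is the URL the article actually has on en:
print(wiki_path(title, "cp1252"))   # Bene%9A_decrees

# Strict ISO 8859-1 has no š at all, so this encoding step fails.
try:
    title.encode("iso-8859-1")
except UnicodeEncodeError:
    print("U+0161 cannot be encoded in ISO 8859-1")

# Reading the byte 0x9A as ISO 8859-1 yields the control character U+009A,
# and encoding that as UTF-8 gives the form seen in the link on de:.
ctrl_title = b"Bene\x9a decrees".decode("iso-8859-1")
print(wiki_path(ctrl_title, "utf-8"))  # Bene%C2%9A_decrees
```

The last step shows why the de: link "works": %C2%9A decodes back to the single byte 0x9A on the ISO 8859-1 side, which matches the stored title.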
As you say, that page has a character in its title that is not in ISO-8859-1 but is in windows-1252, and most browsers treat ISO-8859-1 as windows-1252. However, MediaWiki can't handle your inbound interwiki link because it can't convert U+0161 to ISO-8859-1. There are three possible fixes:
1: convert those incoming interwiki links to windows-1252
2: eliminate windows-1252 characters from en: (they shouldn't really be there anyway, especially not in article titles)
3: convert en: to Unicode, taking the windows-1252 characters into account
I suspect the reason there was a literal control code on de: is a conversion from ISO-8859-1 to UTF-8 that did not take into account that windows-1252 characters may be present.
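A rough sketch of what fixes 1 and 3 could look like, in Python rather than MediaWiki's own code. Both function names are hypothetical, and the fallback behaviour is an assumption made for illustration, not something MediaWiki is known to do.

```python
from urllib.parse import quote, unquote_to_bytes

def utf8_path_to_cp1252(path):
    """Fix 1 (sketch): re-encode an incoming UTF-8 URL path as windows-1252.

    Returns the path unchanged if it is not valid UTF-8, and None if the
    title contains characters that windows-1252 cannot represent.
    """
    raw = unquote_to_bytes(path)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return path  # already in a legacy encoding, leave it alone
    try:
        return quote(text.encode("cp1252"))
    except UnicodeEncodeError:
        return None  # no windows-1252 form exists for this title

def legacy_title_to_unicode(raw):
    """Fix 3 (sketch): decode stored title bytes as windows-1252.

    Python's cp1252 codec leaves a few bytes (such as 0x81) undefined,
    so this falls back to ISO-8859-1 for the whole title in that case.
    """
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")

print(utf8_path_to_cp1252("Bene%C5%A1_decrees"))  # Bene%9A_decrees
print(legacy_title_to_unicode(b"Bene\x9a decrees") == "Bene\u0161 decrees")  # True
```

Decoding the same bytes as plain ISO-8859-1 instead would give U+009A rather than U+0161, which is the kind of conversion suspected on de:.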
*** Bug 2472 has been marked as a duplicate of this bug. ***
Non-ISO-8859-1 character in title, of course it doesn't work. Non-issue with 1.5 and utf-8 conversion.
1.5 is not in use yet. The real issue is that people were allowed to create articles on ISO-8859-1 wikis with titles using characters whose byte values ISO-8859-1 reserves for control codes in the first place, and that browsers interpret ISO-8859-1 as windows-1252. Also, you say moving to UTF-8 is a solution, but see bug 1881 for why this is not really the case!