Last modified: 2010-05-15 15:32:52 UTC
There is an article called [[Beneš decrees]] on the English Wikipedia. Its URL is
http://en.wikipedia.org/wiki/Bene%9A_decrees. The Czech Wikipedia contains the
article called [[Benešovy dekrety]]; I have tried to insert an interwiki link to
en: using [[en:Beneš decrees]]. The link pointed to a URL that is IMHO a proper
UTF-8 encoding of the title, which should be decoded on the en: side. But it is
not; instead, the link goes to [[Bene]]!
Afterwards, I checked de:, whose [[en:...]] link has the control character
U+009A (SINGLE CHARACTER INTRODUCER) literally (!) included in place of the š; this
gets encoded to http://en.wikipedia.org/wiki/Bene%C2%9A_decrees, which is
"correctly" decoded to [[Bene%9A_decrees]]. So I tried the same U+009A form on
cs:, and it seems to work for now. But I don't think this is correct behavior.
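To make the byte-level difference concrete, here is a small Python sketch that reproduces the URLs above. It assumes en: is served as ISO-8859-1 but effectively read as windows-1252 (Python's "cp1252" codec), and that the UTF-8 wikis percent-encode link targets as UTF-8, as the de: URL suggests. The report does not show the actual cs: URL, so cs_url is only what UTF-8 percent-encoding would produce.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

TITLE = "Bene\u0161_decrees"    # "Beneš_decrees"; U+0161 is š
CONTROL = "Bene\u009a_decrees"  # the de: form: U+009A in place of š

# en: stores the title as single bytes; in windows-1252, š is byte 0x9A.
en_url = quote(TITLE, encoding="cp1252")
# UTF-8 wikis percent-encode the link target as UTF-8 (assumption, see above).
cs_url = quote(TITLE)    # š      -> bytes C5 A1
de_url = quote(CONTROL)  # U+009A -> bytes C2 9A

print(en_url)  # Bene%9A_decrees
print(cs_url)  # Bene%C5%A1_decrees
print(de_url)  # Bene%C2%9A_decrees

# Decoding the de: URL as UTF-8 yields U+009A, and writing that code point
# as ISO-8859-1 gives exactly the byte 0x9A used in the en: title.
print(unquote(de_url) == CONTROL)                              # True
print(CONTROL.encode("latin-1") == unquote_to_bytes(en_url))   # True
```

The control-character form happens to land on the right page only because U+009A and windows-1252 š share the byte value 0x9A.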
A probable cause could be that, IIANM, the %9A character is not defined in proper
ISO-8859-1, only in the windows-1252 extensions, so maybe the article on en: should
not have that name at all. But in that case, MediaWiki should probably guard against that.
As you say, that page has a char in its title that is not in ISO-8859-1 but is in
windows-1252, and most browsers treat ISO-8859-1 as windows-1252.
However, MediaWiki can't handle your inbound interwiki link because it can't convert
U+0161 to ISO-8859-1.
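A quick Python check illustrates the point: U+0161 has no ISO-8859-1 encoding but maps to 0x9A in windows-1252, while the same byte read as ISO-8859-1 is a C1 control code. The helper name encodable is hypothetical, introduced only for this illustration.

```python
import unicodedata

def encodable(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

s = "\u0161"  # š
print(unicodedata.name(s))           # LATIN SMALL LETTER S WITH CARON
print(encodable(s, "iso-8859-1"))    # False
print(encodable(s, "cp1252"))        # True
print(s.encode("cp1252"))            # b'\x9a'

# Byte 0x9A interpreted as ISO-8859-1 is a control character (category Cc).
c = b"\x9a".decode("iso-8859-1")
print(hex(ord(c)), unicodedata.category(c))  # 0x9a Cc
```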
There are three possible fixes:
1: convert those incoming interwiki links to windows-1252
2: eliminate windows-1252 chars from en: (they shouldn't really be there anyway,
especially not in article titles)
3: convert en: to Unicode, taking account of the windows-1252 chars
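Fix 1 could look roughly like the following Python sketch. The function name to_legacy_title is hypothetical, and it assumes titles reach en: already decoded from UTF-8. It tries ISO-8859-1 first, so existing links such as the de: U+009A form keep working, and falls back to windows-1252 so that š becomes byte 0x9A instead of being rejected.

```python
from urllib.parse import quote, unquote

def to_legacy_title(utf8_title: str) -> bytes:
    """Hypothetical fix-1 helper: map a decoded interwiki title to the bytes
    an ISO-8859-1 wiki stores. Raises UnicodeEncodeError if the title fits
    neither ISO-8859-1 nor windows-1252 (e.g. it mixes U+009A and š)."""
    try:
        # Plain ISO-8859-1 titles (including the de: U+009A form) are unchanged.
        return utf8_title.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Windows-1252 covers the extra letters browsers use, e.g. š -> 0x9A.
        return utf8_title.encode("cp1252")

incoming = unquote("Bene%C5%A1_decrees")  # UTF-8 link target -> "Beneš_decrees"
stored = to_legacy_title(incoming)
print(stored)         # b'Bene\x9a_decrees'
print(quote(stored))  # Bene%9A_decrees
```

Both the š form and the U+009A form would then resolve to the same stored title.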
I suspect the literal control code in de: came from a conversion from
ISO-8859-1 to UTF-8 that did not take into account the possibility that
windows-1252 chars might be present.
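Under that hypothesis, the difference between a naive conversion and one that takes windows-1252 into account (fix 3) can be sketched in Python. Both function names are hypothetical. Note that Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so the sketch falls back to ISO-8859-1 for titles containing those bytes.

```python
def naive_convert(raw: bytes) -> str:
    # Treats every byte as ISO-8859-1, so 0x9A becomes the control U+009A.
    return raw.decode("iso-8859-1")

def cp1252_aware_convert(raw: bytes) -> str:
    # Reads the bytes the way browsers did (as windows-1252); falls back to
    # ISO-8859-1 when a byte is undefined in windows-1252.
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")

raw_title = b"Bene\x9a_decrees"  # the title bytes as stored on en:
print(repr(naive_convert(raw_title)))                  # 'Bene\x9a_decrees'
print(naive_convert(raw_title).encode("utf-8"))        # b'Bene\xc2\x9a_decrees'
print(cp1252_aware_convert(raw_title).encode("utf-8")) # b'Bene\xc5\xa1_decrees'
```

The naive result encodes to C2 9A, the same bytes seen in the de: URL, whereas the windows-1252-aware result encodes to C5 A1, the UTF-8 form of š.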
*** Bug 2472 has been marked as a duplicate of this bug. ***
The title contains a non-ISO-8859-1 character, so of course it doesn't work.
This is a non-issue with 1.5 and the UTF-8 conversion.
1.5 is not in use yet. The real issue is that people were allowed to create
articles on ISO-8859-1 wikis with titles using chars that ISO-8859-1 allocates
to reserved control codes in the first place, and that browsers interpret
ISO-8859-1 as windows-1252.
Also, you say moving to UTF-8 is a solution, but see bug 1881 for why this is not
really the case!