Bug 1679 - Interwiki from UTF-8 to Latin-1 wiki is broken for U+0161
Description Mormegil 2005-03-10 14:37:08 UTC
There is an article called [[Beneš decrees]] on the English Wikipedia. Its URL is …
The Czech Wikipedia contains the article called [[Benešovy dekrety]]; I tried to
insert an interwiki link to en: using [[en:Beneš decrees]]. The link pointed to …,
which is IMHO a proper UTF-8 encoding of the title and should be decoded on the en:
side. But it is not; instead, the link goes to [[Bene]]!
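For illustration, here is a minimal Python sketch of what a UTF-8 percent-encoded form of this title looks like: š (U+0161) becomes the two bytes C5 A1. It also checks that U+0161 has no ISO-8859-1 encoding at all. Replacing spaces with underscores mimics MediaWiki's URL form and is an assumption of the sketch.

```python
from urllib.parse import quote

title = "Beneš decrees"

# Assumption: spaces become underscores, as in MediaWiki page URLs.
path = quote(title.replace(" ", "_"))  # quote() encodes str input as UTF-8 by default
print(path)  # Bene%C5%A1_decrees

# U+0161 (š) has no code point in ISO-8859-1, so a Latin-1 wiki cannot store it.
try:
    title.encode("latin-1")
except UnicodeEncodeError as exc:
    print("not representable in ISO-8859-1:", exc.object[exc.start:exc.end])
```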

Afterwards, I checked de:, which uses [[en:Bene decrees]] with a control character
U+009A (SINGLE CHARACTER INTRODUCER) literally (!) included; this gets encoded to …,
which is "correctly" decoded to [[Bene%9A_decrees]]. So I tried the same approach,
[[en:Beneš decrees]], on cs:, which seems to work correctly for now. But I don't
think this is correct behavior.
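The de: workaround can be traced the same way in a short sketch (variable names are illustrative). The literal U+009A is UTF-8 encoded as C2 9A. Once decoded back to a code point and re-encoded for a Latin-1 wiki, it becomes the single byte 0x9A, which windows-1252 displays as š.

```python
from urllib.parse import quote, unquote

# Title as linked from de:, with a literal U+009A control character instead of š.
de_link_title = "Bene\u009a_decrees"

# Percent-encoded as UTF-8, U+009A becomes the two bytes C2 9A.
utf8_path = quote(de_link_title)
print(utf8_path)  # Bene%C2%9A_decrees

# Decoding gives U+009A back; ISO-8859-1 maps that code point to the byte 0x9A.
decoded = unquote(utf8_path)
latin1_path = quote(decoded, encoding="latin-1")
print(latin1_path)  # Bene%9A_decrees

# A browser treating ISO-8859-1 as windows-1252 renders byte 0x9A as š.
print(bytes([0x9A]).decode("cp1252"))  # š
```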

A probable cause could be that, IIANM, the %9A character is not defined in proper
ISO 8859-1, only in the windows-1252 extensions, so maybe the article on en: should
not have that name at all. But in that case, MediaWiki should probably guard against that.
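One way such a guard could work is a title check that rejects control characters. The function name below is hypothetical, not a MediaWiki API. The sketch assumes the raw title bytes on a Latin-1 wiki have already been decoded as ISO-8859-1, so a windows-1252 byte like 0x9A shows up as the C1 control U+009A. It also rejects anything above U+00FF in case the title arrives as Unicode.

```python
import unicodedata


def is_storable_latin1_title(title: str) -> bool:
    """Hypothetical check: True only if every character is printable ISO-8859-1."""
    for ch in title:
        if ord(ch) > 0xFF:
            return False  # outside ISO-8859-1 entirely, e.g. U+0161
        if unicodedata.category(ch) == "Cc":
            return False  # C0/C1 control codes, including U+009A
    return True


# Byte 0x9A read as ISO-8859-1 yields U+009A, which the check rejects.
print(is_storable_latin1_title(b"Bene\x9a decrees".decode("latin-1")))  # False
print(is_storable_latin1_title("Benes decrees"))  # True
```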
Comment 1 peter green 2005-04-13 00:41:43 UTC
As you say, that page has a char in its title that is not in iso-8859-1 but is in
windows-1252, and most browsers treat iso-8859-1 as windows-1252.

However, MediaWiki can't handle your inbound interwiki because it can't convert
U+0161 to iso-8859-1.

There are three possible fixes for this:

1: Convert those incoming interwikis to windows-1252.
2: Eliminate windows-1252 chars from en (they shouldn't really be there anyway,
especially not in article titles).
3: Convert en to Unicode, taking account of the windows-1252 chars.
I suspect the reason there was a literal control code in de was a conversion
from iso-8859-1 to utf-8 that did not take account of the possibility that
windows-1252 chars may be present.
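The suspected mis-conversion can be compared with the windows-1252-aware alternative behind fix 3 in a short sketch. `legacy_title_to_utf8` is a hypothetical name. It decodes each byte as windows-1252. For the five bytes that Python's cp1252 codec leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), it falls back to the ISO-8859-1 reading.

```python
# Bytes that Python's cp1252 codec leaves undefined.
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def legacy_title_to_utf8(raw: bytes) -> bytes:
    """Hypothetical conversion: read bytes as windows-1252 where defined, else as Latin-1."""
    chars = [
        chr(b) if b in CP1252_UNDEFINED else bytes([b]).decode("cp1252")
        for b in raw
    ]
    return "".join(chars).encode("utf-8")


stored = b"Bene\x9a decrees"
naive = stored.decode("latin-1").encode("utf-8")  # iso-8859-1 -> utf-8, ignoring windows-1252
print(naive)                         # b'Bene\xc2\x9a decrees' (literal U+009A, as on de:)
print(legacy_title_to_utf8(stored))  # b'Bene\xc5\xa1 decrees' (U+0161, š)
```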
Comment 2 Zigger 2005-06-21 13:12:12 UTC
*** Bug 2472 has been marked as a duplicate of this bug. ***
Comment 3 Brion Vibber 2005-06-21 21:10:26 UTC
Non-ISO-8859-1 character in title, of course it doesn't work.

Non-issue with 1.5 and utf-8 conversion.
Comment 4 peter green 2005-06-21 21:30:53 UTC
1.5 is not in use yet. The real issue is that people were allowed to create
articles on iso-8859-1 wikis with titles using chars that iso-8859-1 allocates
to reserved control codes in the first place, and that browsers interpret
iso-8859-1 as windows-1252.

Also, you say moving to utf-8 is a solution, but see bug 1881 for why this is not
really the case!
