Last modified: 2010-05-15 15:32:52 UTC

Wikimedia migrated from Bugzilla to Phabricator; this page is a read-only archive kept for historical purposes, and links might be broken. See T3679, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 1679 - Interwiki from UTF-8 to Latin-1 wiki is broken for U+0161
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Hardware: All All
Importance: Normal normal with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 2472
Depends on: 1881
Blocks: 3985
Reported: 2005-03-10 14:37 UTC by Mormegil
Modified: 2010-05-15 15:32 UTC

Description Mormegil 2005-03-10 14:37:08 UTC
There is an article called [[Beneš decrees]] on the English Wikipedia. Its URL is
[URL not preserved]. The Czech Wikipedia contains the
article called [[Benešovy dekrety]]; I have tried to insert an interwiki link to
en: using [[en:Beneš decrees]]. The link pointed to [URL not preserved],
which is IMHO a proper UTF-8 encoding of the title, which should be decoded on
the en: side. But it is not -- instead, the link goes to [[Bene]]!

Afterwards, I checked de:, which uses [[en:Bene decrees]], where a control
character, U+009A (SINGLE CHARACTER INTRODUCER), is literally (!) included; this
gets encoded to [URL not preserved], which is "correctly" decoded
to [[Bene%9A_decrees]]. So I tried [[en:Beneš decrees]] on cs:, which seems
to work correctly for now. But I don't think it is correct behavior.

A probable cause could be that, IIANM, the %9A character is not defined in proper
ISO 8859-1, only in the windows-1252 enhancements, so maybe the article on en:
should not have that name at all. But in that case, MediaWiki should probably
guard against that.
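
The mismatch described above can be reproduced outside MediaWiki. The following is a minimal Python sketch, not MediaWiki code; the helper name `percent_encode_title` is made up for illustration. It shows how the same title percent-encodes under UTF-8 and under windows-1252, and that strict ISO 8859-1 cannot represent U+0161 at all.

```python
from urllib.parse import quote

TITLE = "Bene\u0161 decrees"  # U+0161 LATIN SMALL LETTER S WITH CARON


def percent_encode_title(title: str, charset: str) -> str:
    """Hypothetical helper: wiki-style URL form of a title in a given charset."""
    return quote(title.replace(" ", "_"), safe="", encoding=charset)


utf8_form = percent_encode_title(TITLE, "utf-8")     # what a UTF-8 wiki emits
cp1252_form = percent_encode_title(TITLE, "cp1252")  # the %9A form from this report

try:
    percent_encode_title(TITLE, "iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Strict ISO 8859-1 has no code point for U+0161.
    latin1_ok = False

print(utf8_form)    # Bene%C5%A1_decrees
print(cp1252_form)  # Bene%9A_decrees
print(latin1_ok)    # False
```
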
Comment 1 peter green 2005-04-13 00:41:43 UTC
As you say, that page has a char in its title that is not in iso-8859-1 but is in
windows-1252, and most browsers treat iso-8859-1 as windows-1252.

However, MediaWiki can't handle your inbound interwiki because it can't convert
U+0161 to iso-8859-1.

There are three possible fixes to this:

1: convert those incoming interwikis to windows-1252
2: eliminate windows-1252 chars from en (they shouldn't really be there anyway,
especially not in article titles)
3: convert en to unicode, taking account of the windows-1252 chars
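
Fix 1 could look roughly like the following Python sketch. It is not MediaWiki code: the function name `reencode_incoming_title` is hypothetical, and it assumes the incoming link arrives as a percent-encoded UTF-8 path segment.

```python
from typing import Optional
from urllib.parse import quote, unquote_to_bytes


def reencode_incoming_title(segment: str) -> Optional[str]:
    """Hypothetical: turn a percent-encoded UTF-8 title into its windows-1252 form.

    Returns None when the input is not valid UTF-8 or has no windows-1252 form.
    """
    try:
        title = unquote_to_bytes(segment).decode("utf-8")
        return quote(title, safe="", encoding="cp1252")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


print(reencode_incoming_title("Bene%C5%A1_decrees"))  # Bene%9A_decrees
print(reencode_incoming_title("%E4%B8%AD"))           # None
```

Returning `None` leaves the caller to decide what to do with titles that a Latin-1 wiki cannot hold, which is one possible design and not the only one.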

I suspect the reason there was a literal control code in de was a conversion
from iso-8859-1 to utf-8 that did not take account of the possibility that
windows-1252 chars may be present.
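
The suspected conversion mistake is easy to reproduce. In this Python sketch (illustrative only; `decode_legacy_title` is a hypothetical name, and the fallback for the few bytes that windows-1252 leaves undefined is an assumption), decoding the stored byte 0x9A as strict iso-8859-1 yields the control character U+009A, while decoding it as windows-1252 yields U+0161.

```python
RAW_TITLE = b"Bene\x9a decrees"  # title bytes as stored on a Latin-1 wiki

naive = RAW_TITLE.decode("iso-8859-1")  # keeps 0x9A as the C1 control U+009A


def decode_legacy_title(raw: bytes) -> str:
    """Hypothetical: prefer windows-1252, fall back to strict ISO 8859-1."""
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # Assumption: bytes undefined in windows-1252 (e.g. 0x81) stay C1 controls.
        return raw.decode("iso-8859-1")


aware = decode_legacy_title(RAW_TITLE)

print(naive.encode("utf-8"))  # b'Bene\xc2\x9a decrees'
print(aware.encode("utf-8"))  # b'Bene\xc5\xa1 decrees'
```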
Comment 2 Zigger 2005-06-21 13:12:12 UTC
*** Bug 2472 has been marked as a duplicate of this bug. ***
Comment 3 Brion Vibber 2005-06-21 21:10:26 UTC
Non-ISO-8859-1 character in title, of course it doesn't work.

Non-issue with 1.5 and utf-8 conversion.
Comment 4 peter green 2005-06-21 21:30:53 UTC
1.5 is not in use yet. The real issue is that people were allowed to create
articles on iso-8859-1 wikis with titles using chars that iso-8859-1 allocates
to reserved control codes in the first place, and that browsers interpret
iso-8859-1 as windows-1252.

Also, you say moving to utf-8 is a solution, but see bug 1881 for why this is not
really the case!
