Last modified: 2010-05-15 15:32:52 UTC

Bug 1679 - Interwiki from UTF-8 to Latin-1 wiki is broken for U+0161
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: 1.4.x
Hardware: All All
Importance: Normal normal with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/wiki/Bene%C5%...
Duplicates: 2472
Depends on: 1881
Blocks: 3985
Reported: 2005-03-10 14:37 UTC by Mormegil
Modified: 2010-05-15 15:32 UTC

Description Mormegil 2005-03-10 14:37:08 UTC
There is an article called [[Beneš decrees]] on the English Wikipedia. Its URL is
http://en.wikipedia.org/wiki/Bene%9A_decrees. The Czech Wikipedia contains the
article called [[Benešovy dekrety]]; I have tried to insert an interwiki link to
en: using [[en:Beneš decrees]]. The link pointed to
http://en.wikipedia.org/wiki/Bene%C5%A1_decrees,
which is IMHO a proper UTF-8 encoding of the title, which should be decoded on
the en: side. But, it is not -- instead, the link goes to [[Bene]]!

Afterwards, I checked de:, which uses [[en:Bene decrees]], where a control
character U+009A (SINGLE CHARACTER INTRODUCER) is literally (!) included; this
gets encoded to http://en.wikipedia.org/wiki/Bene%C2%9A_decrees, which is
"correctly" decoded to [[Bene%9A_decrees]]. So I tried [[en:Beneš decrees]] on
cs:, which seems to work correctly for now. But -- I don't think that is
correct behavior.

A probable cause could be that IIANM the %9A character is not defined in proper
ISO 8859-1, only in the windows-1252 enhancements, so that maybe the article on
en: should not have that name at all. But, in that case, MediaWiki should
probably guard against that.
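
The three URLs mentioned in the description can be reproduced with a short Python sketch. Python is used purely for illustration (MediaWiki itself is written in PHP); the variable names and the fits_latin1 helper are hypothetical, and only the titles and code points come from the report.

```python
from urllib.parse import quote

# Title with U+0161 LATIN SMALL LETTER S WITH CARON, as linked from cs:
title = "Bene\u0161_decrees"

# What a UTF-8 wiki emits for the interwiki link.
utf8_url = quote(title, encoding="utf-8")
# What the en: article URL looks like when the title is stored as windows-1252.
cp1252_url = quote(title, encoding="cp1252")
# What de: emits when the title literally contains the control character U+009A.
sci_url = quote("Bene\u009a_decrees", encoding="utf-8")

print(utf8_url)    # Bene%C5%A1_decrees
print(cp1252_url)  # Bene%9A_decrees
print(sci_url)     # Bene%C2%9A_decrees


def fits_latin1(text):
    """Return True if every character of text exists in ISO 8859-1 (hypothetical helper)."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1(title))                  # False: U+0161 is outside Latin-1
print(fits_latin1("Bene\u009a_decrees"))   # True: U+009A is a C1 control within Latin-1
```

The last two lines show why only the link containing U+009A survives a strict UTF-8 to ISO 8859-1 conversion on the receiving side: U+0161 has no Latin-1 byte at all, while U+009A maps straight to byte 0x9A.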
Comment 1 peter green 2005-04-13 00:41:43 UTC
As you say, that page has a char in its title that is not in ISO 8859-1 but is
in windows-1252, and most browsers treat ISO 8859-1 as windows-1252.

However, MediaWiki can't handle your inbound interwiki because it can't convert
U+0161 to ISO 8859-1.

There are three possible fixes to this:

1: Convert those incoming interwikis to windows-1252.
2: Eliminate windows-1252 chars from en: (they shouldn't really be there
anyway, especially not in article titles).
3: Convert en: to Unicode, taking account of the windows-1252 chars.
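
Fix 1 could look roughly like the following Python sketch. It is not MediaWiki code: the function name utf8_path_to_cp1252 and its fallback behaviour are assumptions made for illustration, and a real wiki would implement this in PHP inside its own title handling.

```python
from urllib.parse import quote, unquote_to_bytes


def utf8_path_to_cp1252(path):
    """Re-encode a percent-encoded UTF-8 title as percent-encoded windows-1252.

    Hypothetical helper. Returns the path unchanged if it is not valid UTF-8
    (assumed to be in the local encoding already), and None if the title
    contains characters that windows-1252 cannot represent.
    """
    raw = unquote_to_bytes(path)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return path
    try:
        local = text.encode("cp1252")
    except UnicodeEncodeError:
        return None
    return quote(local)


print(utf8_path_to_cp1252("Bene%C5%A1_decrees"))  # Bene%9A_decrees
print(utf8_path_to_cp1252("Bene%9A_decrees"))     # Bene%9A_decrees (left alone)
print(utf8_path_to_cp1252("%E4%B8%AD"))           # None (not representable)
```

One caveat with this approach: some windows-1252 byte sequences also happen to be valid UTF-8, so guessing the encoding of an incoming URL is heuristic at best.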

I suspect the reason there was a literal control code in de: was a conversion
from ISO 8859-1 to UTF-8 that did not take account of the possibility that
windows-1252 chars may be present.
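
The suspected conversion mistake is easy to illustrate. The sketch below (Python, illustrative only; the byte string is an assumed stored form of the link text) decodes the same bytes once as strict ISO 8859-1 and once as windows-1252, then re-encodes each result as UTF-8.

```python
# Assumed stored bytes of the link text on a wiki that nominally uses ISO 8859-1.
raw = b"Bene\x9a decrees"

# Strict ISO 8859-1 maps byte 0x9A to the C1 control character U+009A.
as_latin1 = raw.decode("iso-8859-1")
# windows-1252 maps the same byte to U+0161 (s with caron).
as_cp1252 = raw.decode("cp1252")

print(hex(ord(as_latin1[4])))    # 0x9a
print(hex(ord(as_cp1252[4])))    # 0x161

# Converting to UTF-8 after the wrong decode bakes the control character in.
print(as_latin1.encode("utf-8"))  # b'Bene\xc2\x9a decrees'
print(as_cp1252.encode("utf-8"))  # b'Bene\xc5\xa1 decrees'
```

The first UTF-8 result corresponds to the %C2%9A form seen in the de: link, the second to the %C5%A1 form produced by cs:.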
Comment 2 Zigger 2005-06-21 13:12:12 UTC
*** Bug 2472 has been marked as a duplicate of this bug. ***
Comment 3 Brion Vibber 2005-06-21 21:10:26 UTC
Non-ISO-8859-1 character in title, of course it doesn't work.

Non-issue with 1.5 and utf-8 conversion.
Comment 4 peter green 2005-06-21 21:30:53 UTC
1.5 is not in use yet. The real issue is that people were allowed to create
articles on ISO 8859-1 wikis with titles using chars that ISO 8859-1 allocates
to reserved control codes in the first place, and that browsers interpret
ISO 8859-1 as windows-1252.

Also, you say moving to UTF-8 is a solution, but see bug 1881 for why this is
not really the case!
