Last modified: 2010-05-15 14:36:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2168, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 168 - Strange behaviour for a specific unicode character
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All Linux
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: parser, utf8
Depends on:
Blocks: unicode
Reported: 2004-08-18 19:37 UTC by dg
Modified: 2010-05-15 14:36 UTC (History)
CC List: 3 users

Description dg 2004-08-18 19:37:38 UTC
On the Hymn_of_the_Russian_Federation page, look under "Other languages" for
the Russian link (it looks like Pycckuu). If you hover over the link (or click
it, for that matter), you'll see two "broken" characters for each Russian one,
presumably because the Unicode is not being converted properly. Now edit the
page, go to the [[ru]] link, and remove or change the #1056 character (the
first one after the space): the link now renders properly, and clicking it goes
to the right place in the Russian wiki (except, of course, that the target is
wrong, since you've changed one character). The character in question is the
Russian uppercase R, which looks like an English P. It is used many times in
the text of the article itself and is rendered properly there.

I am using Mozilla 1.6 on Linux, but I also get the same effect in Safari 1.2.3
on Mac OS X.
Comment 1 Brion Vibber 2004-08-18 21:43:34 UTC
I don't see anything out of the ordinary in Safari 1.2.3. Could you provide
some screen shots of the bug in action, and reference the exact revisions of
the page that do and don't work?
Comment 2 Timwi 2004-08-19 10:24:10 UTC
I can confirm that I see this bug exactly as described.  Something weird is
happening there.

The original inter-wiki link was:
[[ru:Гимн России]]

It produced this link:

Notice that, in the middle, there is "_%D0_". This should instead be "_%D0%A0",
because the Cyrillic capital letter Er is %D0%A0 in UTF-8.

This means the bug is caused by the "%A0", which is a nonbreaking space in
Latin-1 (but not in UTF-8), being replaced by a simple space (and hence, an
underscore).  Browsers (in my case, Firefox) then no longer recognise the link
as being in UTF-8, interpret it as Latin-1, and so it comes out jumbled. 
Similarly, when you actually follow the link, the wiki software will notice that
the link is not proper UTF-8, pretend it was Latin-1, convert it to UTF-8, and
forward the user to a page that obviously doesn't exist.
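
The diagnosis in this comment can be reproduced with a short Python sketch. It assumes, as the comment suggests, that the faulty step replaced every 0xA0 byte with a space before the title was turned into a URL; the variable names are illustrative and nothing here is taken from the MediaWiki source.

```python
from urllib.parse import quote

# Link target quoted in this comment.
title = "Гимн России"

# In UTF-8 the Cyrillic capital Er (U+0420) is the two bytes D0 A0.
raw = title.encode("utf-8")

# Assumed faulty step: a byte-wise replacement treats each 0xA0 byte as a
# Latin-1 non-breaking space and turns it into an ordinary space.
damaged = raw.replace(b"\xa0", b" ")

# Spaces become underscores in the URL; other bytes are percent-encoded.
url_part = quote(damaged.replace(b" ", b"_"))
print(url_part)

# The damaged byte string is no longer valid UTF-8 ...
try:
    damaged.decode("utf-8")
    still_utf8 = True
except UnicodeDecodeError:
    still_utf8 = False

# ... so a Latin-1 fallback maps every byte to its own character,
# giving two junk characters for each remaining Cyrillic letter.
mojibake = damaged.decode("latin-1")
print(still_utf8, len(title), len(mojibake))
```

The printed URL fragment contains "_%D0_" where "_%D0%A0" was expected, and the Latin-1 fallback has one character per byte, which matches the doubled "broken" characters in the original report.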
Comment 3 Timwi 2004-08-19 10:26:13 UTC
Brion, you might find my above comment interesting, so I am adding you to the CC
list. Please let me know if you do not want me to do this.
Comment 4 Timwi 2004-08-19 10:28:19 UTC
Replacing the inter-wiki link with
didn't help; the same bug occurs.
Comment 5 JeLuF 2004-08-19 10:29:52 UTC
There was a patch recently to convert NBSPs to "_". Some vandal created
accounts whose nicks had trailing NBSPs, and articles with NBSPs in their
names, which were hard to block/delete. That patch has probably broken these
links.
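
For contrast, here is a minimal Python sketch of the same normalization done on decoded characters instead of raw bytes. The helper name normalize_title is hypothetical and this is not the actual patch (which is not shown in this report); it only illustrates that replacing U+00A0 at the character level leaves the D0 A0 byte pair of the Cyrillic Er untouched.

```python
NBSP = "\u00a0"  # the real non-breaking space character

def normalize_title(title: str) -> str:
    """Hypothetical helper: treat NBSP like a space, drop leading and
    trailing spaces, then use underscores as in wiki page names."""
    return title.replace(NBSP, " ").strip().replace(" ", "_")

# A nick with a trailing NBSP collapses to the plain nick.
print(normalize_title("Vandal" + NBSP))

# The Russian title keeps its capital Er, so the UTF-8 bytes D0 A0 survive.
fixed = normalize_title("Гимн России")
print(fixed, b"\xd0\xa0" in fixed.encode("utf-8"))
```

Working on decoded text means a 0xA0 byte that is only the second half of a multi-byte sequence is never mistaken for a non-breaking space.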
Comment 6 Brion Vibber 2004-08-19 17:22:51 UTC
Tim's hacked it to avoid the non-breaking space check for interwiki links so it
may be working now; please check. I'm not sure this is fully in place as it's
changed only in REL1_3, so I'm not marking as FIXED yet.

Timwi, don't bother CC'ing me, as I get and read *all* bugmail via wikibugs-l. :)
Comment 7 Brion Vibber 2004-10-12 18:16:26 UTC
Has this been fixed in CVS or is it still there?
Comment 8 Wil Mahan 2004-10-13 03:17:11 UTC
(In reply to comment #7)
> Has this been fixed in CVS or is it still there?

It appears to be fixed. The example given in comment #2 gives the correct
output, as I understand it; see
