Last modified: 2005-11-20 13:49:26 UTC

Wikimedia migrated from Bugzilla to Phabricator. See T2821, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 821 - Illegal/unusual UTF-8 characters can make 2 pages appear to have same name
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Hardware: Macintosh Mac OS X 10.3
Importance: Normal normal with 1 vote
Assigned To: Nobody
Reported: 2004-11-03 13:07 UTC by Sébastien Thébault
Modified: 2005-11-20 13:49 UTC (History)

Description Sébastien Thébault 2004-11-03 13:07:18 UTC
There is a scary bug: two pages can have the same name, one with some text, the other without text. See this link on French
Wikipedia: édia:Le_Bistro#Disparition sans trace enregistrée à nouveau (semble-t-il).
Comment 1 Rowan Collins [IMSoP] 2004-11-03 16:31:32 UTC
My French is not brilliant, so I'm not quite sure what the behaviour is that is
being observed. But as far as I can make out the "two articles with the same
name" are "Trois Royaumes de Cor�ée" and "Trois Royaumes de Corée", where the
first contains an extra character between the r and the e-acute. In
URL-encoding, it is %EF%BF%BD [note that at some point it has become corrupted
in the example into a "?", I dug the original out of the page history]. 

So obviously these *aren't* pages with identical names, but before we can
consider the case closed, somebody needs to work out:
* what is the character %EF%BF%BD supposed to be?
* how did it get there? (what did the user *think* they were inputting?)
* why did the affected user think the two links were identical? (they seem to be
saying the extra character was invisible in their browser - why should that be?
perhaps it's some kind of combining or zero-width character?)
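
The first question above can be checked mechanically. The following is a minimal sketch in Python, assuming the two titles are available in percent-encoded form; the exact encoded strings and the helper name `describe` are illustrative assumptions, not taken from the wiki itself. It decodes both titles and lists the code points that appear only in the suspect one.

```python
import unicodedata
from urllib.parse import unquote


def describe(title):
    """Return one 'U+XXXX NAME' entry per character in the title."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in title]


# Assumed percent-encoded forms; the first carries the extra %EF%BF%BD sequence
# between the "r" and the e-acute (%C3%A9).
suspect = unquote("Trois Royaumes de Cor%EF%BF%BD%C3%A9e")
clean = unquote("Trois Royaumes de Cor%C3%A9e")

# The titles differ even if a browser renders them identically.
print(suspect == clean)

# Entries present in the suspect title but absent from the clean one.
extra = [entry for entry in describe(suspect) if entry not in describe(clean)]
print(extra)
```

Comparing code points rather than rendered text sidesteps the browser and font differences described in this report.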
Comment 2 Brion Vibber 2004-11-03 18:48:20 UTC
The UTF-8 sequence EF BF BD represents character U+FFFD REPLACEMENT CHARACTER. This is used by various software as a placeholder/replacement for illegal/corrupt characters; typically it's displayed as a question mark, or a black diamond with a white question mark in it, but sometimes it is blank (depending on the font, the software, etc.).

One way it might show up in a wiki page is by editing with a browser that doesn't do UTF-8 correctly.
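
As a rough illustration of that mechanism (a sketch under assumptions, not a reconstruction of what actually happened on this page): if a client sends Latin-1 bytes and they are decoded leniently as UTF-8, the invalid byte is replaced by U+FFFD, which is then stored as EF BF BD. The variable names are illustrative only.

```python
# Bytes a Latin-1 client might send for "Corée": e-acute is the single byte 0xE9.
latin1_bytes = "Corée".encode("latin-1")

# 0xE9 followed by "e" is not valid UTF-8, so a lenient decoder
# substitutes U+FFFD REPLACEMENT CHARACTER for the bad byte.
decoded = latin1_bytes.decode("utf-8", errors="replace")

# Saving the text back as UTF-8 stores the replacement character as EF BF BD.
stored = decoded.encode("utf-8")

print(repr(decoded))
print(stored.hex(" "))
```

In this simplified case the e-acute itself is lost, whereas the title in this report kept its e-acute next to the extra character, so the real cause may have differed in detail.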

However, the link I see listed at present has a _literal_ question mark (3F). It's possible it was replaced by some browser during subsequent editing of the page.
Comment 3 Rowan Collins [IMSoP] 2004-11-03 19:22:57 UTC
[The following is additional info from the reporter, received by e-mail; thanks
for responding - for future reference, you need to add additional comments using
the web interface rather than replying to e-mails]

Hello. You read it correctly: it is a bug with Safari and Opera. With them,
I can't see the extra character in the page title, in the URL,
or in the page text.
* I don't know what this character is.
* I think I typed the text into the URL bar (Trois Royaumes de Corée), or it is a
bug in Safari (a cached URL that was stored incorrectly).
* When I paste them into the page, my
two main browsers (Opera and Safari, on Mac OS X) display the same two
titles to me, but Explorer shows them correctly. It was the same with the two
URLs and the two ''articles'': one empty, one not.

And sorry, I didn't understand the last question.

Comment 4 Rowan Collins [IMSoP] 2005-04-24 16:39:49 UTC
[updating summary: the 2 pages *don't* have the same name, and we now know why
they seemed to]

Meanwhile, can anyone think of a way of verifying that this bug is still
present, and/or does anyone know of any code changes that should have fixed it?
Comment 5 Sébastien Thébault 2005-04-24 17:01:20 UTC
I haven't seen this bug since November.
Comment 6 Sébastien Thébault 2005-04-24 17:02:14 UTC
I no longer see this bug; it has not appeared since November.
Comment 7 SJ 2005-05-13 05:37:11 UTC
Seems to be fine since Nov 2004... if it shows up again, this can be reopened.
Comment 8 lɛʁi לערי ריינהארט 2005-11-20 13:49:26 UTC
links with

including "%EF%BF%BD" are generating "Bad titles"

This should be OK for all.

regards reinhardt [[user:gangleri]]

P.S. Is this a solved issue for
bug 3985: character conversion (tracking) ?
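
The "Bad title" behaviour described in this comment can be sketched as a simple check. This is a hypothetical illustration in Python, not MediaWiki's actual code; the function name `is_bad_title` and the sample titles are assumptions.

```python
from urllib.parse import unquote

REPLACEMENT_CHARACTER = "\ufffd"


def is_bad_title(encoded_title):
    """Hypothetical check: True if the decoded title contains U+FFFD."""
    # unquote() decodes as UTF-8 and substitutes U+FFFD for invalid bytes,
    # so this also flags titles whose percent-encoded bytes are not valid UTF-8.
    return REPLACEMENT_CHARACTER in unquote(encoded_title)


print(is_bad_title("Trois_Royaumes_de_Cor%EF%BF%BD%C3%A9e"))
print(is_bad_title("Trois_Royaumes_de_Cor%C3%A9e"))
```

Rejecting such titles outright means two pages can no longer differ only by a character that some browsers fail to display.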
