Last modified: 2005-11-20 13:49:26 UTC
There is a nasty bug: two pages can have the same name, one with some text, the other without text; see this link on the French
Wikipedia: http://fr.wikipedia.org/wiki/Wikipédia:Le_Bistro#Disparition sans trace enregistrée à nouveau (semble-t-il). (The section title means "disappearance with no recorded trace, again (it seems)".)
My French is not brilliant, so I'm not quite sure what behaviour is
being observed. But as far as I can make out, the "two articles with the same
name" are "Trois Royaumes de Cor�ée" and "Trois Royaumes de Corée", where the
first contains an extra character between the r and the e-acute. In
URL-encoding, it is %EF%BF%BD [note that at some point it was corrupted into
a "?" in the example; I dug the original out of the page history].
So obviously these *aren't* pages with identical names, but before we can
consider the case closed, somebody needs to work out:
* what is the character %EF%BF%BD supposed to be?
* how did it get there? (what did the user *think* they were inputting?)
* why did the affected user think the two links were identical? (they seem to be
saying the extra character was invisible in their browser; why should that be?
Perhaps it's some kind of combining, zero-width character?)
The UTF-8 sequence EF BF BD represents the character U+FFFD REPLACEMENT CHARACTER. Various software uses it as a placeholder/replacement
for illegal or corrupt characters; typically it's displayed as a question mark, or as a black diamond with a white question mark in it, but
sometimes it is blank (depending on the font, the software, etc.).
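A quick way to confirm this, sketched in Python (3.8 or later, for bytes.hex with a separator; the variable names are only for illustration): percent-decode the bytes, look the character up, and compare the two titles from the report. It also suggests the character is an ordinary graphic symbol rather than a combining or zero-width one, so its invisibility was presumably down to the browser or font.

```python
import unicodedata
from urllib.parse import quote, unquote

# Percent-decode the sequence from the link; unquote() decodes as UTF-8 by default.
ch = unquote("%EF%BF%BD")

print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+FFFD REPLACEMENT CHARACTER
print(ch.encode("utf-8").hex(" ").upper())        # EF BF BD

# Not a combining mark: general category "So" (other symbol), combining class 0.
print(unicodedata.category(ch), unicodedata.combining(ch))  # So 0

# The two titles from the report differ by exactly this one code point.
with_extra = "Trois Royaumes de Cor\ufffd\u00e9e"
without = "Trois Royaumes de Cor\u00e9e"
print(with_extra == without)                      # False
print(quote(with_extra.replace(" ", "_")))        # Trois_Royaumes_de_Cor%EF%BF%BD%C3%A9e
```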
One way it might show up in a wiki page is by editing with a browser that doesn't do UTF-8 correctly.
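As an illustration only (a hypothetical scenario, not a reconstruction of what actually happened here), this Python sketch shows a browser sending a title as ISO-8859-1 bytes to a server that decodes them as UTF-8. Note that in this scenario the é itself is replaced, whereas the reported title has an extra character in front of the é, so it shows the mechanism rather than the exact result.

```python
# Hypothetical scenario: a browser submits the title as ISO-8859-1 (Latin-1)
# bytes, while the wiki decodes the request as UTF-8.
sent = "Cor\u00e9e".encode("latin-1")   # b'Cor\xe9e': é is the single byte E9
received = sent.decode("utf-8", errors="replace")

# E9 starts a 3-byte UTF-8 sequence, but "e" is not a continuation byte,
# so the decoder substitutes U+FFFD for the lone E9.
print(ascii(received))                    # 'Cor\ufffde'
print(received.encode("utf-8").hex(" "))  # 43 6f 72 ef bf bd 65
```

Once stored this way, the original byte is gone; only EF BF BD remains in the UTF-8 text.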
However, the link I currently see listed has a _literal_ question mark (3F). It's possible that some browser replaced it during subsequent
editing of the page.
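A minimal Python sketch of how the literal 3F could arise, assuming (hypothetically) that the page was later edited through software working in ISO-8859-1. That charset can encode é but has no byte for U+FFFD; Python's errors="replace" stands in here for whatever substitution the browser made (real browsers differ in how they handle unencodable characters).

```python
title = "Trois Royaumes de Cor\ufffd\u00e9e"

# Hypothetical: an editor working in ISO-8859-1 can encode é (E9) but not
# U+FFFD; errors="replace" substitutes "?" (3F) for the unencodable character.
legacy = title.encode("latin-1", errors="replace")

# Latin-1 is one byte per character, so the string index is also the byte index.
pos = title.index("\ufffd")
print(hex(legacy[pos]))                  # 0x3f
print(ascii(legacy.decode("latin-1")))   # 'Trois Royaumes de Cor?\xe9e'
```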
[The following is additional info from the reporter, received by e-mail. Thanks
for responding; for future reference, you need to add further comments using
the web interface rather than replying to e-mails.]
Hello: you are right, it is a bug with Safari and Opera: with them,
I can't see the extra character in the page title, in the URL,
or in the page text.
* I don't know what this character is.
* I think I typed the text into the URL bar (Trois Royaumes de Corée), or it is a
Safari bug (sorry for the French: a cached URL stored incorrectly).
* When I paste them on the page http://fr.wikipedia.org/wiki/Bistro, my
two main browsers (Opera and Safari, on Mac OS X) display the two
titles identically, but Internet Explorer shows them correctly. It was the same
with the two URLs and the two ''articles'': one empty, one not.
And sorry, I didn't understand the last question.
[updating summary: the 2 pages *don't* have the same name, and we now know why
they seemed to]
Meanwhile, can anyone think of a way of verifying that this bug is still
present, and/or know of any code changes that should have fixed it?
I haven't seen this bug since November.
Seems to have been fine since Nov 2004; if it shows up again, this can be reopened.
Unicode Character 'REPLACEMENT CHARACTER' (U+FFFD)
Titles including "%EF%BF%BD" are generating "Bad title" errors.
This should be OK for all.
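A minimal Python sketch of the kind of check reported in the comment above (titles containing %EF%BF%BD are rejected as "Bad title"). check_title is a hypothetical helper, not MediaWiki's actual (PHP) title validation, and the underscore-to-space normalization is an assumption. Because Python's unquote() replaces invalid UTF-8 with U+FFFD by default, a Latin-1 percent-encoding such as %E9 is rejected as well.

```python
from urllib.parse import unquote

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def check_title(raw: str) -> str:
    """Hypothetical validator: percent-decode a title and reject U+FFFD.

    Returns the normalized title, or raises ValueError("Bad title").
    """
    # unquote() decodes as UTF-8 with errors="replace", so malformed
    # byte sequences also turn into U+FFFD and get rejected below.
    title = unquote(raw).replace("_", " ")
    if REPLACEMENT in title:
        raise ValueError("Bad title")
    return title


for raw in ["Trois_Royaumes_de_Cor%C3%A9e",           # OK : Trois Royaumes de Corée
            "Trois_Royaumes_de_Cor%EF%BF%BD%C3%A9e",  # ERR: Bad title
            "Trois_Royaumes_de_Cor%E9e"]:              # ERR: Bad title
    try:
        print("OK :", check_title(raw))
    except ValueError as err:
        print("ERR:", err)
```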
regards reinhardt [[user:gangleri]]
P.S. Is this a solved issue for
bug 3985: character conversion (tracking)?