Last modified: 2005-11-20 13:49:26 UTC
There is a scary bug: two pages can have the same name, one with some text, the other without text; see this link on French Wikipedia: http://fr.wikipedia.org/wiki/Wikipédia:Le_Bistro#Disparition sans trace enregistrée à nouveau (semble-t-il).
My French is not brilliant, so I'm not quite sure what behaviour is being observed. But as far as I can make out, the "two articles with the same name" are "Trois Royaumes de Cor�ée" and "Trois Royaumes de Corée", where the first contains an extra character between the r and the e-acute. In URL-encoding, it is %EF%BF%BD [note that at some point it has become corrupted in the example into a "?"; I dug the original out of the page history]. So obviously these *aren't* pages with identical names, but before we can consider the case closed, somebody needs to work out:
* what is the character %EF%BF%BD supposed to be?
* how did it get there? (what did the user *think* they were inputting?)
* why did the affected user think the two links were identical? (they seem to be saying the extra character was invisible in their browser. Why should that be? Perhaps it's some kind of combining, zero-width character?)
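For anyone who wants to confirm that the two titles differ, here is a minimal sketch using Python's standard library. The variable names are illustrative, and replacing spaces with underscores is an assumption that mirrors how the titles appear in wiki URLs.

```python
from urllib.parse import quote

# The two titles from the report; the first has U+FFFD between "r" and "é".
title_with_extra = "Trois Royaumes de Cor\ufffd\u00e9e"
title_clean = "Trois Royaumes de Cor\u00e9e"

# Percent-encode as UTF-8 (assumption: spaces become underscores, as in wiki URLs).
encoded_extra = quote(title_with_extra.replace(" ", "_"))
encoded_clean = quote(title_clean.replace(" ", "_"))

print(encoded_extra)  # Trois_Royaumes_de_Cor%EF%BF%BD%C3%A9e
print(encoded_clean)  # Trois_Royaumes_de_Cor%C3%A9e
print(title_with_extra == title_clean)  # False
```

The extra %EF%BF%BD in the first encoded form is the only difference between the two names.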
The UTF-8 sequence EF BF BD represents character U+FFFD REPLACEMENT CHARACTER. This is used by various software as a placeholder/replacement for illegal or corrupt characters; typically it's displayed as a question mark, or a black diamond with a white question mark in it, but sometimes it is blank (depending on the font, the software, etc.). One way it might show up in a wiki page is by editing with a browser that doesn't do UTF-8 correctly. However, the link I see listed presently has a _literal_ question mark (3F). It's possible it's been replaced by some browser during subsequent editing of the page.
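As a quick check of the above, this short Python sketch decodes the three bytes and then shows one general way U+FFFD can arise: Latin-1 bytes being read as UTF-8 with replacement of invalid input. The second half is only an illustration of the mechanism; it is an assumption, not a reconstruction of what the reporter's browser actually did.

```python
import unicodedata

# The three bytes from the URL, decoded as UTF-8.
raw = bytes([0xEF, 0xBF, 0xBD])
char = raw.decode("utf-8")
print(f"U+{ord(char):04X}", unicodedata.name(char))  # U+FFFD REPLACEMENT CHARACTER

# Illustration only: "Corée" stored as Latin-1, then misread as UTF-8.
# The lone byte 0xE9 is not valid UTF-8 here, so the decoder substitutes U+FFFD.
latin1_bytes = "Cor\u00e9e".encode("latin-1")
misread = latin1_bytes.decode("utf-8", errors="replace")
print(misread == "Cor\ufffde")  # True
```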
[The following is additional info from the reporter, received by e-mail; thanks for responding. For future reference, you need to add additional comments using the web interface rather than replying to e-mails.]
Hello: you see right, it is a bug with Safari and Opera: with them, I can't see the extra character in the title of the page, in the URL or in the page text.
* I don't know what this character is.
* I think I typed the text in the URL bar (Trois Royaumes de Corée), or it is a bug in Safari (a cached URL that was stored incorrectly).
* When I paste them on the page http://fr.wikipedia.org/wiki/Bistro, my two main browsers (Opera and Safari, on Mac OS X) display the same two titles to me. But Explorer is fine. It was the same with the two URLs and the two ''articles'': one empty, one not.
And sorry, I didn't understand the last question.
Archeos
[updating summary: the 2 pages *don't* have the same name, and we now know why they seemed to] Meanwhile, can anyone think of a way of verifying that this bug is still present, and/or does anyone know of any code changes that should have fixed it?
I haven't seen this bug since November.
I haven't seen this bug any more since November.
Seems to be fine since Nov 2004... if it shows up again, this can be reopened.
Links containing the Unicode character U+FFFD REPLACEMENT CHARACTER (http://www.fileformat.info/info/unicode/char/fffd/index.htm), i.e. those including "%EF%BF%BD", are generating "Bad title" errors: http://yi.wiktionary.org/w/index.php?title=project:bugzilla/00821/%EF%BF%BD&action=edit This should be OK for all. Regards, reinhardt [[user:gangleri]] P.S. Is this a solved issue for bug 3985: character conversion (tracking)?
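The kind of check that would produce such a "Bad title" result can be sketched in Python as follows. This is not the wiki software's actual code; the function name is_bad_title is hypothetical, and it assumes titles arrive percent-encoded as UTF-8.

```python
from urllib.parse import unquote

REPLACEMENT_CHARACTER = "\ufffd"

def is_bad_title(url_title: str) -> bool:
    """Hypothetical check: reject titles containing U+FFFD after decoding.

    Invalid UTF-8 byte sequences are decoded with errors="replace", so they
    also turn into U+FFFD and are rejected by the same test.
    """
    decoded = unquote(url_title, encoding="utf-8", errors="replace")
    return REPLACEMENT_CHARACTER in decoded

print(is_bad_title("project:bugzilla/00821/%EF%BF%BD"))  # True
print(is_bad_title("Trois_Royaumes_de_Cor%C3%A9e"))      # False
```

Rejecting the character at title-parsing time means a page like the first "Trois Royaumes" variant could not be created or linked to in the first place.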