Last modified: 2010-04-16 14:57:17 UTC
Recently vandals have been creating many articles that seem to have exactly the same title. Unfortunately they insert invisible UTF-8 characters into the title, so we get many different articles whose titles all look like, say, "Karin Stoiber". The software should refuse to create an article whose title includes an invisible character; of course the blank (space) must be an exception. The vandal problem occurred on de-WP. My nickname is tsor, and I am an administrator there. tsor
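A minimal sketch of the check requested above, assuming "invisible" means format characters, control characters, and any whitespace other than the plain space. The function name `has_invisible_chars` and the choice of Unicode categories are illustrative assumptions, not anything MediaWiki is known to implement.

```python
import unicodedata

# Unicode general categories treated as "invisible" in this sketch (an assumption):
#   Cf = format (zero-width space, soft hyphen, ...), Cc = control,
#   Zs/Zl/Zp = space, line and paragraph separators.
INVISIBLE_CATEGORIES = {"Cf", "Cc", "Zs", "Zl", "Zp"}


def has_invisible_chars(title: str) -> bool:
    """Return True if the title contains an invisible character other than U+0020."""
    for ch in title:
        if ch == " ":
            # The ordinary blank is the one permitted exception.
            continue
        if unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            return True
    return False
```

With this check, "Karin Stoiber" passes, while the same title with a zero-width space (U+200B) or a no-break space (U+00A0) inserted would be refused.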
*** This bug has been marked as a duplicate of 1414 ***
When you said "invisible UTF-8 characters" I thought you meant whitespace characters, but (as you explained in bug 1414) you are talking about characters that look like Latin letters (e.g. 'greek kappa' looks like 'K', 'cyrillic small dze' looks like 's', etc. -> [[w:en:Homoglyph]]).
Using non-Latin UTF-8 characters in titles is not a bug, it's a feature. Some wikis, like fr:, use a lot of non-Latin characters in titles (usually such a title redirects to a romanized, normalised title). Moreover, the homoglyph problem already exists within the Latin alphabet, with l (lowercase L) and I (uppercase i): look at [[w:de:Ill (Elsass)]]. A vandal can create a page "Johannes Paul ll" (imitating "Johannes Paul II") and most users won't notice.
As this is somewhat related to the punycode/IDN problem in Firefox 1.0.1, look at the Mozilla discussions:
- http://weblogs.mozillazine.org/gerv/archives/007562.html
- http://www.gerv.net/security/phishing-browser-defences.html
We could try the measures suggested there:
- "Measurements of lexical proximity" against older article titles (helped by a list of UTF-8 homograph pairs)
- "Domain letter colouring": highlighting, or tooltips above characters showing which Unicode block they belong to.
Or we could highlight/warn only on unusual UTF-8 characters, but this would require defining the list of frequently used characters per wiki.
I have changed the summary of the bug to "utf8 Homoglyph in titles".
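The two suggestions above could be sketched roughly as follows. This is only an illustration under stated assumptions: `CONFUSABLES` is a hypothetical, deliberately tiny homograph table covering just the examples from this comment, and the names `skeleton`, `find_lookalike` and `scripts_used` are made up for the sketch. A real deployment would need a much larger table and a proper script lookup.

```python
import unicodedata

# Hypothetical homograph table: each look-alike maps to one canonical form.
CONFUSABLES = {
    "\u039a": "K",  # GREEK CAPITAL LETTER KAPPA looks like Latin K
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE looks like Latin s
    "I": "l",       # uppercase i and lowercase L look alike in many fonts
}


def skeleton(title: str) -> str:
    """Replace every known look-alike by its canonical form."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in title)


def find_lookalike(new_title: str, existing_titles):
    """Return an existing title that differs from new_title but looks the same, else None."""
    target = skeleton(new_title)
    for old in existing_titles:
        if old != new_title and skeleton(old) == target:
            return old
    return None


def scripts_used(title: str) -> set:
    """Rough script label per letter, taken from the first word of its Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in title if ch.isalpha()}
```

A new title whose `find_lookalike` result is not None could be refused or flagged, and a title for which `scripts_used` returns more than one label (for example both LATIN and GREEK) could be highlighted. Note that the second check alone would not catch the "Johannes Paul ll" case, since both letters are Latin.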
Moved to the general/Unknown component and changed the severity from major to trivial, since there is an easy workaround available.
(In reply to comment #2)
> Well using non latin utf8 characters in titles is not a bug .. it's a feature.
Yes, and on those grounds I would originally have suggested a WONTFIX.
(In reply to comment #3)
> Moved to the general/Unknown component and changed the severity from major to
> trivial, there is an easy workaround available.
Yes, you can use AbuseFilter to prevent these sorts of things if vandalism is indeed an issue for your wiki (and I believe en.wikipedia already does some things to this effect). For that reason, I'm going to resolve this FIXED.