Last modified: 2010-04-16 14:57:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T4042, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 2042 - UTF8 homoglyph in titles
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal trivial with 1 vote
Target Milestone: ---
Assigned To: Nobody

Reported: 2005-05-01 16:47 UTC by tsor
Modified: 2010-04-16 14:57 UTC (History)

Description tsor 2005-05-01 16:47:01 UTC
Recently, vandals have been creating many articles that appear to have exactly
the same title. They do this by inserting invisible UTF-8 characters into the
title, so we end up with many different articles whose titles all look like,
say, "Karin Stoiber".

The software should refuse to create an article whose title includes an
invisible character. The ordinary blank (space) must of course be an exception.
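
A minimal sketch of the check requested above, in Python rather than in MediaWiki's own code. The choice of Unicode categories and the function names are assumptions made for illustration; this is not how MediaWiki validates titles. Note also that some scripts legitimately need zero-width characters, so a real rule would need exceptions.

```python
import unicodedata

# Categories treated as "invisible" in this sketch: control, format and
# separator characters. This set is an assumption, not MediaWiki's rule.
INVISIBLE_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def invisible_characters(title):
    """Return the characters in `title` that render as nothing or as blank space.

    The ordinary space (U+0020) is exempt, as the report asks.
    """
    return [
        ch for ch in title
        if ch != " " and unicodedata.category(ch) in INVISIBLE_CATEGORIES
    ]


def is_title_acceptable(title):
    """Hypothetical policy: accept a title only if it has no invisible characters."""
    return not invisible_characters(title)
```

For example, `is_title_acceptable("Karin\u200bStoiber")` rejects a title containing a zero-width space, while the plain "Karin Stoiber" passes.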

The vandalism problem occurred on de-WP. My nickname is tsor, and I am an administrator there.

tsor
Comment 1 FoeNyx 2005-05-01 16:52:10 UTC

*** This bug has been marked as a duplicate of 1414 ***
Comment 2 FoeNyx 2005-05-02 10:53:48 UTC
When you mentioned invisible UTF-8 characters, I thought you were talking about
whitespace characters, but (as you explained in bug 1414) you mean characters
that look like Latin characters (e.g. 'Greek kappa' looks like 'K', 'Cyrillic
small dze' looks like 's', etc. -> [[w:en:Homoglyph]]).

Well using non latin utf8 characters in titles is not a bug .. it's a feature. 

Some wikis, like fr:, use a lot of non-Latin characters in titles (usually such
a title redirects to a romanized, normalized title). Moreover, the homoglyph
problem already existed with l (lowercase L) and I (uppercase i); look at
[[w:de:Ill (Elsass)]]. A vandal can create a page "Johannes Paul ll" (imitating
"Johannes Paul II") and most users won't notice.

As this is somewhat related to the punycode/IDN problem in Firefox 1.0.1, look
at the Mozilla discussions:
- http://weblogs.mozillazine.org/gerv/archives/007562.html
- http://www.gerv.net/security/phishing-browser-defences.html
We could try the approaches suggested there:
- "Measurements of lexical proximity" against older article titles (helped by a
list of UTF-8 homograph pairs)
- "Domain letter colouring": highlighting, or tooltips above characters showing
which Unicode block they belong to. Or we could highlight/warn only on unusual
characters, but this would require defining the list of frequently used
characters per wiki.
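
A rough Python sketch of the two suggestions above: comparing a new title against existing ones through a list of homograph pairs, and reporting which scripts a title mixes. The `HOMOGLYPHS` table is a tiny hand-written assumption for illustration (a real list would be far larger), all function names are hypothetical, and the script name is only approximated from the first word of each character's Unicode name.

```python
import unicodedata

# Tiny illustrative homoglyph table: look-alike -> representative character.
HOMOGLYPHS = {
    "\u039a": "K",  # GREEK CAPITAL LETTER KAPPA
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "l": "I",       # lowercase L resembles uppercase i in many fonts
}


def skeleton(title):
    """Replace each character by its look-alike representative."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in title)


def looks_like(new_title, existing_titles):
    """Return existing titles that differ from `new_title` but share its skeleton."""
    target = skeleton(new_title)
    return [t for t in existing_titles if t != new_title and skeleton(t) == target]


def scripts_used(title):
    """Approximate the scripts of a title's letters from their Unicode names."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in title
        if ch.isalpha()
    }
```

With this table, `looks_like("Johannes Paul ll", ["Johannes Paul II"])` flags the existing page, and `scripts_used("\u039aarin")` reports both GREEK and LATIN, which could drive a warning or highlight for mixed-script titles.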

I change the summary of the bug to "utf8 Homoglyph in titles"
Comment 3 Ævar Arnfjörð Bjarmason 2005-05-03 09:35:13 UTC
Moved to the General/Unknown component and changed the severity from major to
trivial; there is an easy workaround available.
Comment 4 Chad H. 2010-04-16 14:57:17 UTC
(In reply to comment #2)
> Well using non latin utf8 characters in titles is not a bug .. it's a feature. 
> 

Yes, and on those grounds I would originally have suggested WONTFIX.

(In reply to comment #3)
> Moved to the general/Unknown component and changed the severity from major to
> trivial, there is an easy workaround available.

Yes, you can use AbuseFilter to prevent these sorts of things if vandalism is indeed an issue for your wiki (and I believe en.wikipedia already does some things to this effect). For that reason, I'm going to resolve this FIXED.
