Last modified: 2012-11-06 08:53:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T14896, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 12896 - Spam Blacklist shouldn't be fooled by similar-looking Unicode characters
Status: REOPENED
Product: MediaWiki extensions
Classification: Unclassified
Component: Spam Blacklist
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/w/index.php?t...
Depends on:
Blocks: SWMT
Reported: 2008-02-03 18:36 UTC by Max Semenik
Modified: 2012-11-06 08:53 UTC
CC List: 5 users

Description Max Semenik 2008-02-03 18:36:29 UTC
See the URL above. By inserting a U+0EFF (.) instead of a normal dot, the user managed to link to a blacklisted site, traditio.ru. Even after I attempted to fix this by explicitly adding this character to the blacklist [http://meta.wikimedia.org/w/index.php?diff=863226], it still does not seem to work [http://meta.wikimedia.org/w/index.php?diff=863233].
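
The bypass described in this report can be reproduced with a minimal sketch. It assumes the character involved is U+FF0E FULLWIDTH FULL STOP (U+0EFF as written is an unassigned code point), and it uses a simplified, hypothetical pattern rather than the extension's actual matching code.

```python
import re

# Hypothetical blacklist entry; the real extension builds its regexes differently.
BLACKLIST = re.compile(r"traditio\.ru", re.IGNORECASE)

ascii_url = "http://traditio.ru/"
# Assumption: the look-alike dot is U+FF0E FULLWIDTH FULL STOP.
lookalike_url = "http://traditio\uff0eru/"


def is_blocked(url: str) -> bool:
    """Return True when the URL matches the blacklist pattern."""
    return BLACKLIST.search(url) is not None


print(is_blocked(ascii_url))      # the plain ASCII host is caught
print(is_blocked(lookalike_url))  # the look-alike host slips through
```

The escaped `\.` in the pattern matches only U+002E, so any other dot-like code point in the host name passes the check unchanged.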
Comment 1 Victor Vasiliev 2008-02-03 19:00:48 UTC
Fixed in r30482
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-02-03 22:03:59 UTC
The fix seems a little narrow.  What's the underlying reason that the exploit worked?  U+0EFF can't be the only character that browsers will treat as a period in URLs.
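
One partial answer to the question above: RFC 3490 (IDNA), section 3.1, names three characters besides U+002E that must be recognized as label separators in host names, which would explain why browsers resolve such links. The sketch below folds those separators to an ASCII dot before matching. The helper names and the blacklist entry are hypothetical, and the sketch covers dots only, not look-alike letters.

```python
import re

# Label separators listed in RFC 3490, section 3.1, besides U+002E.
IDNA_DOTS = {
    "\u3002": ".",  # IDEOGRAPHIC FULL STOP
    "\uff0e": ".",  # FULLWIDTH FULL STOP
    "\uff61": ".",  # HALFWIDTH IDEOGRAPHIC FULL STOP
}
DOT_TABLE = str.maketrans(IDNA_DOTS)

# Hypothetical blacklist entry, for illustration only.
BLACKLIST = re.compile(r"traditio\.ru", re.IGNORECASE)


def fold_dots(url: str) -> str:
    """Replace IDNA dot variants with an ASCII full stop."""
    return url.translate(DOT_TABLE)


def is_blocked(url: str) -> bool:
    """Match the blacklist against the dot-folded URL."""
    return BLACKLIST.search(fold_dots(url)) is not None


for dot in [".", "\u3002", "\uff0e", "\uff61"]:
    print(repr(dot), is_blocked("http://traditio" + dot + "ru/"))
```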
Comment 3 Victor Vasiliev 2008-04-24 14:01:08 UTC
(In reply to comment #2)
> The fix seems a little narrow.  What's the underlying reason that the exploit
> worked?  U+0EFF can't be the only character that browsers will treat as a
> period in URLs.
> 

I think we need some form of Unicode normalization.
Comment 4 Max Semenik 2008-12-21 21:11:09 UTC
Indeed, there are far more ways: http://meta.wikimedia.org/w/index.php?oldid=1319535

Unicode normalisation again.
Comment 5 Mike.lifeguard 2009-02-17 22:00:49 UTC
(In reply to comment #4)
> Indeed, there are far more ways:
> http://meta.wikimedia.org/w/index.php?oldid=1319535
> 
> Unicode normalisation again.
> 

I'm not really sure what I'm supposed to be seeing at that oldid.

That said, Unicode normalization is really needed. We already do it in some monitoring tools, but of course it's needed in the blacklist as well.
Comment 6 Mike.lifeguard 2009-02-17 22:01:13 UTC
better summary
Comment 7 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-02-17 23:20:48 UTC
"Unicode normalization" is a poor term to use for the problem involved here, since all the characters involved are already normalized by the definitions of the Unicode standard (they're NFC, to be precise).  Adjusted summary.
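
The distinction can be checked with Python's standard `unicodedata` module, assuming again that the dot in question is U+FF0E FULLWIDTH FULL STOP. NFC leaves the string untouched, while the compatibility form NFKC folds that character to an ASCII dot. NFKC appears here only as an illustration, not as the fix this thread settled on, and it does not fold every dot-like character.

```python
import unicodedata

# Assumption: the look-alike dot is U+FF0E FULLWIDTH FULL STOP.
host = "traditio\uff0eru"

nfc = unicodedata.normalize("NFC", host)
nfkc = unicodedata.normalize("NFKC", host)

print(nfc == host)  # NFC does not change the string
print(nfkc)         # NFKC maps the fullwidth dot to U+002E

# U+3002 IDEOGRAPHIC FULL STOP has no compatibility mapping,
# so NFKC alone leaves it in place.
ideographic = unicodedata.normalize("NFKC", "traditio\u3002ru")
print("\u3002" in ideographic)
```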
