Last modified: 2012-11-06 08:53:54 UTC
See the URL above. By inserting a U+0EFF (.) instead of a normal dot, the user managed to link to the blacklisted site traditio.ru. Even after I tried to fix this by explicitly adding the character to the blacklist [http://meta.wikimedia.org/w/index.php?diff=863226], it still does not seem to work [http://meta.wikimedia.org/w/index.php?diff=863233].
Fixed in r30482
The fix seems a little narrow. What's the underlying reason that the exploit worked? U+0EFF can't be the only character that browsers will treat as a period in URLs.
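To illustrate what a broader fix might look like, here is a minimal Python sketch, not the actual extension code. It assumes the relevant lookalikes are the three extra characters that RFC 3490 (IDNA), section 3.1, requires to be recognized as label separators: U+3002, U+FF0E and U+FF61. The names `fold_dots`, `is_blacklisted` and the `BLACKLIST` entry are hypothetical, and the folding is applied to the whole URL rather than only the host part, which is a simplification.

```python
import re

# Dot equivalents that RFC 3490 section 3.1 says must be treated as label
# separators, in addition to U+002E. Assumed here to be the relevant set.
IDNA_DOTS = "\u3002\uff0e\uff61"
_DOT_TABLE = {ord(ch): "." for ch in IDNA_DOTS}

# Hypothetical blacklist with a single entry for the site from the report.
BLACKLIST = [re.compile(r"traditio\.ru", re.IGNORECASE)]


def fold_dots(url: str) -> str:
    """Replace every IDNA dot equivalent with a plain ASCII full stop."""
    return url.translate(_DOT_TABLE)


def is_blacklisted(url: str) -> bool:
    """Match the blacklist against the dot-folded form of the URL."""
    folded = fold_dots(url)
    return any(pattern.search(folded) for pattern in BLACKLIST)
```

With this, `is_blacklisted("http://traditio\uff0eru/")` matches the same entry as the plain-dot spelling, so the blacklist does not need one rule per lookalike character.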
(In reply to comment #2)
> The fix seems a little narrow. What's the underlying reason that the exploit
> worked? U+0EFF can't be the only character that browsers will treat as a
> period in URLs.
I think we need some form of UTF normalization.
Indeed, there are far more ways: http://meta.wikimedia.org/w/index.php?oldid=1319535 Unicode normalisation again.
(In reply to comment #4)
> Indeed, there are far more ways:
> http://meta.wikimedia.org/w/index.php?oldid=1319535
>
> Unicode normalisation again.
I'm not really sure what I'm supposed to be seeing at that oldid. That said, Unicode normalization is really needed. We're already doing so in some monitoring tools, but of course it's needed in the blacklist as well.
better summary
"Unicode normalization" is a poor term to use for the problem involved here, since all the characters involved are already normalized by the definitions of the Unicode standard (they're NFC, to be precise). Adjusted summary.
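The point about NFC can be checked with Python's standard `unicodedata` module. This is only a sketch: the three characters below are the IDNA dot equivalents from RFC 3490, chosen as an assumption, since the thread does not list which characters were used. The helper name `forms` is hypothetical.

```python
import unicodedata

# Assumed sample of dot lookalikes (the IDNA label separators besides U+002E).
DOT_LOOKALIKES = {
    "IDEOGRAPHIC FULL STOP": "\u3002",
    "FULLWIDTH FULL STOP": "\uff0e",
    "HALFWIDTH IDEOGRAPHIC FULL STOP": "\uff61",
}


def forms(ch: str) -> dict:
    """Return the NFC and NFKC normalizations of a single character."""
    return {form: unicodedata.normalize(form, ch) for form in ("NFC", "NFKC")}


for name, ch in DOT_LOOKALIKES.items():
    result = forms(ch)
    # Print code points so the differences are visible in any terminal.
    print(name, {form: f"U+{ord(s):04X}" for form, s in result.items()})
```

All three characters are unchanged by NFC. Even compatibility normalization (NFKC) only maps the fullwidth dot to an ASCII full stop; the halfwidth ideographic full stop becomes U+3002, and U+3002 stays as it is. So catching these would need an explicit mapping on top of any standard normalization form.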