Last modified: 2011-05-03 20:25:40 UTC
Example: http://ru.wikipedia.org/w/index.php?diff=30178032 http://пример.испытание was blacklisted, but I can still add this URL: http://ru.wikipedia.org/w/index.php?diff=30178038
Presumably the SpamBlacklist extension needs to be modified to use the u flag for the regexes it builds, so that they are interpreted as UTF-8. As a temporary workaround, you can escape Unicode characters using \xHH (replace HH with the hex code of each UTF-8 byte). For example:
\bмакросъемка\.рф becomes \b\xD0\xBC\xD0\xB0\xD0\xBA\xD1\x80\xD0\xBE\xD1\x81\xD1\x8A\xD0\xB5\xD0\xBC\xD0\xBA\xD0\xB0\.\xD1\x80\xD1\x84
\bпример\.испытание becomes \b\xD0\xBF\xD1\x80\xD0\xB8\xD0\xBC\xD0\xB5\xD1\x80\.\xD0\xB8\xD1\x81\xD0\xBF\xD1\x8B\xD1\x82\xD0\xB0\xD0\xBD\xD0\xB8\xD0\xB5
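Doing the \xHH conversion by hand is error-prone, so here is a small Python sketch that does it. The function name escape_non_ascii is made up for illustration and is not part of SpamBlacklist. It only rewrites non-ASCII characters and leaves ASCII regex syntax such as \. untouched.

```python
def escape_non_ascii(pattern: str) -> str:
    """Replace each non-ASCII character with \\xHH escapes of its UTF-8 bytes.

    Hypothetical helper, not part of the extension.
    """
    out = []
    for ch in pattern:
        if ord(ch) < 128:
            # ASCII (including regex syntax like "\." ) is copied unchanged.
            out.append(ch)
        else:
            # One \xHH escape per byte of the character's UTF-8 encoding.
            out.extend("\\x%02X" % byte for byte in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii(r"пример\.испытание"))
```

The printed line can be pasted into the blacklist as-is.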
At first look this workaround does not work: http://ru.wikipedia.org/w/index.php?diff=30229518 http://ru.wikipedia.org/w/index.php?diff=30229527 For now I use AbuseFilter http://ru.wikipedia.org/wiki/Special:AbuseFilter/117 to block such links, but this approach has some drawbacks.
Sorry, the workaround should not have the \b in it (presumably because bytes like \xD0 aren't word characters in non-UTF-8 mode).
\bмакросъемка\.рф becomes \xD0\xBC\xD0\xB0\xD0\xBA\xD1\x80\xD0\xBE\xD1\x81\xD1\x8A\xD0\xB5\xD0\xBC\xD0\xBA\xD0\xB0\.\xD1\x80\xD1\x84
\bпример\.испытание becomes \xD0\xBF\xD1\x80\xD0\xB8\xD0\xBC\xD0\xB5\xD1\x80\.\xD0\xB8\xD1\x81\xD0\xBF\xD1\x8B\xD1\x82\xD0\xB0\xD0\xBD\xD0\xB8\xD0\xB5
-----
Would someone who knows about such things be able to comment on whether adding the /u flag to the generated regexes would have any adverse performance effects?
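To illustrate the \b problem, here is a rough Python sketch. It uses Python's re module on byte strings as a stand-in for PCRE without /u, so it is an analogy and not the extension's actual code path. In byte mode only ASCII letters, digits, and underscore count as word characters, so there is no word boundary between "/" and the byte 0xD0.

```python
import re

url = "http://пример.испытание/page"
raw = url.encode("utf-8")  # the URL as a byte-oriented regex engine sees it

escaped = (rb"\xD0\xBF\xD1\x80\xD0\xB8\xD0\xBC\xD0\xB5\xD1\x80\."
           rb"\xD0\xB8\xD1\x81\xD0\xBF\xD1\x8B\xD1\x82\xD0\xB0\xD0\xBD\xD0\xB8\xD0\xB5")

# Byte mode: "/" and 0xD0 are both non-word bytes, so \b cannot match here.
with_boundary = re.search(rb"\b" + escaped, raw)

# Byte mode without \b: the escaped byte sequence is found.
without_boundary = re.search(escaped, raw)

# Unicode mode (roughly what /u gives PCRE): "п" is a word character,
# so \b matches between "/" and "п".
unicode_match = re.search(r"\bпример\.испытание", url)

print(with_boundary is not None, without_boundary is not None,
      unicode_match is not None)
# prints: False True True
```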
This workaround works fine, thanks! Alex
I haven't tried profiling, but tossing a /u on in SpamRegexBatch::buildRegexes() doesn't seem to break anything, at least. It should be double-checked with the full-size blacklists, though. However, this isn't necessarily sufficient for handling IDN domain spam, as it won't match the punycode form of the name if it's linked that way. May require some normalization to really do this right.
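As a sketch of what that normalization could look like, the Python snippet below converts a domain between its Unicode and punycode forms, so that both spellings can be reduced to one form before matching. The helper names to_ascii and to_unicode are hypothetical. Python's built-in "idna" codec implements the older IDNA 2003 rules, so treat this as an illustration only.

```python
def to_ascii(domain: str) -> str:
    """Return the punycode (xn--...) form of a domain; ASCII input passes through."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of a domain given in punycode or plain ASCII."""
    return domain.encode("ascii").decode("idna")


idn = "пример.испытание"
ascii_form = to_ascii(idn)

# Both spellings normalize to the same Unicode string, so a single
# blacklist entry could cover links written either way.
print(ascii_form)
print(to_unicode(ascii_form) == idn)
```

Whether to normalize to Unicode or to punycode is a design choice. Normalizing to punycode would keep the blacklist regexes pure ASCII, which sidesteps the /u question for domain names.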
Created attachment 8465: Suggested patch
Could you verify that the attached patch is where you think the /u should go to fix this?
r87352