Last modified: 2013-01-31 23:01:04 UTC
The AbuseFilter extension documentation at https://www.mediawiki.org/wiki/Extension:AbuseFilter/RulesFormat claims that the rlike/regex keywords use PCRE, but Unicode support is not implemented: PCRE syntax such as the (*UCP) option is not fully supported.
The following example returns true, because accented characters and all other non-7-bit-ASCII characters are treated as word breaks/whitespace and are therefore matched by \b:
"testé" regex "\btest\b"
It should return false instead of true.
Unicode should be supported, since AbuseFilter is not used only on English projects, either by allowing the (*UCP) option or by enabling Unicode support by default.
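The difference between the reported and the expected result can be reproduced outside AbuseFilter. The sketch below is only an analogy in Python, not AbuseFilter's actual PHP/PCRE code: it assumes that the reported behaviour corresponds to matching on raw UTF-8 bytes with ASCII-only word characters, and that the expected behaviour corresponds to Unicode-aware matching. The helper names matches_bytewise and matches_unicode are hypothetical.

```python
import re

PATTERN = r"\btest\b"
SUBJECT = "testé"


def matches_bytewise(pattern: str, subject: str) -> bool:
    """Match on raw UTF-8 bytes (assumed analogue of the reported behaviour).

    With a bytes pattern, Python treats only ASCII letters, digits and "_"
    as word characters, so the two bytes that encode "é" count as non-word
    bytes and \\b matches right after "test".
    """
    return re.search(pattern.encode("utf-8"), subject.encode("utf-8")) is not None


def matches_unicode(pattern: str, subject: str) -> bool:
    """Match on Unicode text (assumed analogue of the expected behaviour).

    With a str pattern, "é" is a word character, so there is no word
    boundary between "t" and "é".
    """
    return re.search(pattern, subject) is not None


print(matches_bytewise(PATTERN, SUBJECT))  # True: the behaviour reported here
print(matches_unicode(PATTERN, SUBJECT))   # False: the behaviour requested here
```

In both modes the plain subject "test" still matches, so only subjects containing non-ASCII word characters are affected.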
Bug 17830 might be relevant for \b usage.
Bug 17830 is about having to double the backslash to get \b working, whereas in this case \b works only for 7-bit ASCII characters (all extended Unicode characters are treated as word breaks).
*** This bug has been marked as a duplicate of bug 22761 ***