Last modified: 2013-01-31 23:01:04 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T38129, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 36129 - AbuseFilter extensions do not support Unicode in regular expressions
Status: RESOLVED DUPLICATE of bug 22761
Product: MediaWiki extensions
Classification: Unclassified
Component: AbuseFilter
Version: unspecified
Hardware: All All
Importance: High normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-04-20 17:33 UTC by DavidL
Modified: 2013-01-31 23:01 UTC
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description DavidL 2012-04-20 17:33:20 UTC
The AbuseFilter extension documentation at https://www.mediawiki.org/wiki/Extension:AbuseFilter/RulesFormat states that the rlike/regex keywords use PCRE, but Unicode support is not implemented and PCRE syntax is not fully supported: for example, the (*UCP) option is not supported.

The following example returns true because accented characters, and all other characters outside 7-bit ASCII, are treated as non-word characters (like whitespace), so \b matches next to them:
  "testé" regex "\btest\b"
It should return false instead of true.
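
The difference can be illustrated outside AbuseFilter. The following is only an analogy, sketched with Python's re module rather than PCRE or AbuseFilter itself: matching on UTF-8 bytes treats every non-ASCII byte as a non-word character, while matching on Unicode text treats "é" as a letter. The helper names byte_mode_match and unicode_mode_match are hypothetical and chosen for this sketch.

```python
import re


def byte_mode_match(pattern: str, text: str) -> bool:
    """Match against UTF-8 bytes.

    For bytes patterns, Python's \\w and \\b only know ASCII letters,
    digits and '_', so every byte of "é" counts as a non-word character.
    """
    return re.search(pattern.encode("ascii"), text.encode("utf-8")) is not None


def unicode_mode_match(pattern: str, text: str) -> bool:
    """Match against Unicode text, where "é" is a word character."""
    return re.search(pattern, text) is not None


pattern = r"\btest\b"

# Byte-oriented matching: \b is found between "t" and the first byte of "é".
print(byte_mode_match(pattern, "testé"))      # True (the behaviour reported here)

# Unicode-aware matching: "testé" is one word, so there is no boundary after "t".
print(unicode_mode_match(pattern, "testé"))   # False (the expected behaviour)

# A real word boundary still matches in Unicode mode.
print(unicode_mode_match(pattern, "test é"))  # True
```

In this sketch the byte-oriented call reproduces the reported result, and the Unicode-aware call gives the result this report asks for.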

Unicode should be supported, since AbuseFilter is not only used on English-language projects, either by allowing the (*UCP) option or by enabling Unicode support by default.
Comment 1 Siddhartha Ghai 2012-07-27 10:48:46 UTC
Bug 17830 might be relevant for \b usage.
Comment 2 DavidL 2012-07-28 09:33:52 UTC
Bug 17830 is about having to double the \ to get \b working, whereas in this case \b works only with 7-bit ASCII characters (all other Unicode characters are treated as word breaks).
Comment 3 Mark Nelson 2013-01-31 23:01:04 UTC

*** This bug has been marked as a duplicate of bug 22761 ***
