Last modified: 2014-07-26 16:45:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T66541, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 64541 - regexes starting with caret (^) not functioning as the instructions say
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Spam Blacklist
Version: unspecified
Hardware: All All
Importance: Normal normal
Assigned To: Nobody
Depends on:
Blocks: SWMT
Reported: 2014-04-28 10:49 UTC by billinghurst
Modified: 2014-07-26 16:45 UTC
CC List: 5 users

Description billinghurst 2014-04-28 10:49:11 UTC
I am not sure whether the instructions are wrong, something is broken, or this simply doesn't work on meta's implementation of a global blacklist.

Instructions at [[mw:Extension:SpamBlacklist#Blacklist syntax]] state that a caret can be used to match the start of a blacklist regex:

<quote>
"The '^' and '$' anchors match the beginning and end of the domain name, not the beginning and end of the URL."
</quote>

Recently this methodology was used at meta[1] in an attempt to lessen the collateral damage of a block on "t.co" URLs (which are routinely used to spam); see [[m:User:COIBot/XWiki/t.co]]. However, it has been found that "^t\.co\b" as the regex has not been effective, and t.co URLs have still been able to be added.

It would be useful if we could work out which of the three scenarios applies so that the proper fix can be requested. Thanks.




[1] https://meta.wikimedia.org/w/index.php?title=Spam_blacklist&diff=7630728&oldid=7593377&diffonly=yes
Comment 1 Liangent 2014-05-28 14:08:30 UTC
I guess it's difficult to fix it now because regexes are now merged (piped) first; people would modify instructions to "fix" it now...

Maybe you can use (?<!-)\bt\.co\b?
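
To illustrate why merging could defeat the caret, here is a minimal Python sketch. It assumes the blacklist lines are joined with "|" and wrapped in a URL-matching prefix; PREFIX, SUFFIX, build_blacklist and is_blocked are hypothetical names for illustration and are not taken from the extension's source, and Python's re module stands in for PCRE.

```python
import re

# ASSUMPTION: a URL prefix placed in front of the merged (piped) fragments.
# This is a guess at the general shape, not the extension's actual regex.
PREFIX = r"https?://+[a-z0-9_\-.]*("
SUFFIX = r")"


def build_blacklist(fragments):
    """Join blacklist fragments with '|' and wrap them in the assumed prefix."""
    return re.compile(PREFIX + "|".join(fragments) + SUFFIX, re.IGNORECASE)


def is_blocked(pattern, url):
    """Return True if the merged pattern matches anywhere in the URL."""
    return pattern.search(url) is not None


# Inside the merged regex, '^' can only match at the very start of the text,
# which is before 'http://', so the anchored fragment can never match.
anchored = build_blacklist(["google", r"^t\.co\b", "baidu"])

# The lookbehind variant has no such anchor; it only rejects a preceding '-'.
lookbehind = build_blacklist(["google", r"(?<!-)\bt\.co\b", "baidu"])

print(is_blocked(anchored, "http://t.co/abc"))        # False
print(is_blocked(lookbehind, "http://t.co/abc"))      # True
print(is_blocked(lookbehind, "http://not-t.co/abc"))  # False
```

Under these assumptions the caret fragment is dead weight in the merged pattern, while the lookbehind fragment still distinguishes t.co from hosts such as not-t.co.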
Comment 2 billinghurst 2014-07-23 13:08:43 UTC
The suggested regex, while valid as a regex, is unsuccessful in preventing addition of the link. [Tested by two people on two different wikis, with the link being added.] So back to square one.

So we are back to the situation where the spam blacklist cannot be used to prevent the addition of just t.co with a regex:

^t\.co\b           FAIL
(?<!-)\bt\.co\b    FAIL
Comment 3 Liangent 2014-07-25 18:07:32 UTC
(In reply to billinghurst from comment #2)

In my test, [[MediaWiki:Spam-blacklist]]:

 #<pre>
google
(?<!-)\bt\.co\b
baidu
 #</pre>

Link:

http://t.co/abc

and it says:

The following text is what triggered our spam filter: t.co

... so it works for me?
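
The test above can be approximated outside the wiki with a short Python sketch. It assumes that everything after "#" on a blacklist line is a comment, that blank lines are skipped, and that the remaining fragments are merged behind a URL prefix; parse_lines, first_trigger and the prefix itself are hypothetical and only mimic the behaviour described in this report.

```python
import re

# The test page from the comment above, as a raw string so that the
# backslashes reach the regex engine unchanged.
PAGE = r"""
 #<pre>
google
(?<!-)\bt\.co\b
baidu
 #</pre>
"""


def parse_lines(page_text):
    """Drop '#' comments and blank lines (assumed blacklist page rules)."""
    fragments = []
    for line in page_text.splitlines():
        fragment = line.split("#", 1)[0].strip()
        if fragment:
            fragments.append(fragment)
    return fragments


def first_trigger(fragments, url):
    """Return the text matched by the merged fragments, or None."""
    # ASSUMPTION: hypothetical URL prefix in front of the piped fragments.
    merged = r"https?://+[a-z0-9_\-.]*(" + "|".join(fragments) + ")"
    match = re.search(merged, url, re.IGNORECASE)
    return match.group(1) if match else None


fragments = parse_lines(PAGE)
print(fragments)
print(first_trigger(fragments, "http://t.co/abc"))  # t.co
```

The captured group plays the role of the "text that triggered our spam filter" in the message quoted above.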
Comment 4 Glaisher 2014-07-26 16:45:11 UTC
Yep, it works. It probably didn't last time because the list had not been updated when it was tested. The documentation at mww is wrong.
