Last modified: 2014-02-12 01:17:05 UTC

Bug 54441 - API for SpamBlacklist
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Spam Blacklist
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Jackmcbarn
Keywords: easy
Depends on:
Blocks: noncoreapi
Reported: 2013-09-21 21:26 UTC by Kunal Mehta (Legoktm)
Modified: 2014-02-12 01:17 UTC

Description Kunal Mehta (Legoktm) 2013-09-21 21:26:50 UTC
Similar to action=titleblacklist (https://en.wikipedia.org/w/api.php?action=titleblacklist&tbtitle=Foo), add a simple API to check whether a provided URL is blacklisted.
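
A minimal sketch of what such a request might look like, for illustration only. The titleblacklist URL is the one quoted above. The `spamblacklist` action name and its `url` parameter are assumptions modeled on it and are not confirmed anywhere in this report. The helper `build_check_url` is hypothetical.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_check_url(value, action="spamblacklist", param="url"):
    """Build an api.php query URL for a blacklist check.

    The defaults ("spamblacklist", "url") are assumed names, chosen by
    analogy with action=titleblacklist and its tbtitle parameter.
    """
    query = urlencode({"action": action, param: value, "format": "json"})
    return f"{API_ENDPOINT}?{query}"


if __name__ == "__main__":
    # The existing title check referenced in the description.
    print(build_check_url("Foo", action="titleblacklist", param="tbtitle"))
    # The requested URL check, using the assumed action and parameter names.
    print(build_check_url("http://example.com/"))
```

Only the query string is built here; no request is sent, so the sketch makes no claim about the response format.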
Comment 1 Gerrit Notification Bot 2013-09-21 23:50:39 UTC
Change 85512 had a related patch set uploaded by Jackmcbarn:
Add an API action to test blacklisted URLs

https://gerrit.wikimedia.org/r/85512
Comment 2 Betacommand 2013-11-18 23:34:33 UTC
Does this cross reference the whitelist?
Comment 3 Kunal Mehta (Legoktm) 2013-11-18 23:37:02 UTC
(In reply to comment #2)
> Does this cross reference the whitelist?

The current patch does.
Comment 4 Tim Starling 2013-12-06 00:43:04 UTC
Why do you want this?
Comment 5 Betacommand 2013-12-06 01:40:26 UTC
Betacommand	TimStarling: what are you referring to in your last comment on 54441
Betacommand	* bug 54441
Elsie	Betacommand: There's no stated use-case.
Betacommand	I can easily see several
Betacommand	reviewing links on a wiki
Betacommand	seeing which are and which are not blacklisted
Betacommand	and, if a domain is blacklisted, whether a particular link is whitelisted
Betacommand	and finding what blacklist rules are hitting a given link (i.e. finding what caused a link to get caught by the blacklist)
Betacommand	I have seen several cases where either an error or minor oversight in a blacklisting caused collateral, and finding the correct rule can be an issue
Comment 6 Tim Starling 2013-12-06 02:30:40 UTC
(In reply to comment #5)
> Betacommand    reviewing links on a wiki
> Betacommand    seeing which are and which are not blacklisted

That application is not efficiently supported by the proposed patch. You need a batch lookup; you don't want to be doing hundreds of API queries on a page with hundreds of links.
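
As a rough illustration of the batch lookup described above, the following sketch groups many links into a few requests instead of one request per link. The `spamblacklist` action and `url` parameter names are assumptions, the batch size of 50 is an assumed per-request limit, and `batch_params` is a hypothetical helper. Joining values with `|` follows the usual MediaWiki API convention for multi-value parameters.

```python
def batch_params(links, batch_size=50):
    """Yield one parameter dict per batch of links.

    Action and parameter names are assumed, not taken from the patch.
    Links that themselves contain "|" would need separate handling.
    """
    for start in range(0, len(links), batch_size):
        chunk = links[start:start + batch_size]
        yield {
            "action": "spamblacklist",  # assumed action name
            "url": "|".join(chunk),     # assumed multi-value parameter
            "format": "json",
        }


if __name__ == "__main__":
    page_links = [f"http://example.com/{i}" for i in range(120)]
    requests_needed = list(batch_params(page_links))
    print(len(requests_needed), "requests instead of", len(page_links))
```

Under these assumptions, a page with hundreds of links needs a handful of queries, not hundreds.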
Comment 7 MZMcBride 2013-12-09 04:55:54 UTC
Could adding a programmatic way to check for blacklisted URLs lead to smarter spam? I think there's a concern that adding this functionality, in batch form or otherwise, would make a wiki more susceptible to abuse. Thoughts?
Comment 8 Kunal Mehta (Legoktm) 2013-12-09 05:32:11 UTC
(In reply to comment #7)
> Could adding a programmatic way to check for blacklisted URLs lead to smarter
> spam? I think there's a concern that adding this functionality, in batch form
> or otherwise, would make a wiki more susceptible to abuse. Thoughts?

Anyone can currently download https://meta.wikimedia.org/wiki/Spam_blacklist and parse it locally, which would be much faster than trying to use an API to check if a link is blacklisted.

The only concern I would have is if a file blacklist is being used, since there's a good chance it isn't public. Might be worth automatically disabling the API module if any file blacklist is being used.
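
The local check mentioned above (download the list and parse it yourself) could be sketched roughly as follows. This assumes the list holds one regex fragment per line, that `#` starts a comment, and that fragments are matched after the scheme and any subdomains; the wrapper pattern is a simplification, not the extension's exact regex. `parse_blacklist` and `matching_rules` are hypothetical names, and fetching the page is left out.

```python
import re


def parse_blacklist(text):
    """Return the regex fragments in a blacklist page, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop "#" comments
        if line:
            rules.append(line)
    return rules


def matching_rules(url, rules):
    """Return every fragment that matches the URL (simplified wrapper)."""
    hits = []
    for rule in rules:
        # Assumed wrapper: scheme, optional subdomains, then the fragment.
        pattern = r"https?://+[a-z0-9_\-.]*(?:" + rule + ")"
        if re.search(pattern, url, re.IGNORECASE):
            hits.append(rule)
    return hits


if __name__ == "__main__":
    sample = r"""# sample list, not real entries
\bexample-spam\.com\b # illustrative rule
tinyurl\.com
"""
    rules = parse_blacklist(sample)
    print(matching_rules("http://www.example-spam.com/page", rules))
```

Returning the matching fragments, not just a yes/no answer, also makes it easier to see which rule caught a link.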
Comment 9 Dirk Beetstra 2013-12-09 06:23:37 UTC
@MZMcBride - Spammers are inherently smart (they make money with it, that is their drive). We've seen many tricks to try and get around the blacklist. That generally has two effects: first, we blacklist the evading stuff without even considering a warning, and indef any accounts involved without discussion; and secondly, delisting of any of the domains will be denied forever. If you really want to continue your abuse to that level and show that much persistence, it is just plain game over. And anyway, this is already possible: as others point out, one can just download the lists manually and do the same check oneself. There are even tricks which one could consider programming into the software (make the software follow links to the endpoint, and if it is a redirect site, like tinyurl.com, pointing to a blacklisted domain, block the edit, etc.).

@Tim - why does the API not do the same as the saving mechanism, which checks against the various blacklists/whitelists? It should have the same speed. Batch lookup would be a good option as well, though (push a whole page through the parser and see what is blacklisted in the XML output).

@Legoktm: (pff .. WP:BEANS). You cannot avoid that; one could use a locally installed version of the software to do the work for you. Given a current recurring case of spam, it may even be that some already do that kind of thing; spammers seem to know how to figure out what is not yet blacklisted.

What I think the API should provide is something like 'if I send this link/text-with-links through the parser and would try to save it on XX.wikipedia.org, what blacklist (global and local) and whitelist (local) rules would be matched on it?'
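
A hedged sketch of the kind of report described in the previous paragraph: for each link found in a piece of text, list the global blacklist, local blacklist, and local whitelist rules that match it. Everything here is illustrative. The link extraction is a crude regex and not the parser, the rule lists are plain regex fragments supplied by the caller, and treating a link as blocked only when it is blacklisted and not whitelisted is an assumption about how the two lists interact. `explain_links` is a hypothetical name.

```python
import re

# Crude link finder for illustration; the real software uses the parser.
URL_RE = re.compile(r"https?://[^\s\]<>\"]+", re.IGNORECASE)


def _hits(url, rules):
    """Return the rules (regex fragments) that match the URL."""
    return [rule for rule in rules if re.search(rule, url, re.IGNORECASE)]


def explain_links(text, global_blacklist, local_blacklist, local_whitelist):
    """Report, per link, which rules of each list match it."""
    report = []
    for url in URL_RE.findall(text):
        global_hits = _hits(url, global_blacklist)
        local_hits = _hits(url, local_blacklist)
        white_hits = _hits(url, local_whitelist)
        report.append({
            "url": url,
            "global_blacklist": global_hits,
            "local_blacklist": local_hits,
            "local_whitelist": white_hits,
            # Assumed semantics: a whitelist match overrides a blacklist match.
            "blocked": bool(global_hits or local_hits) and not white_hits,
        })
    return report


if __name__ == "__main__":
    text = "See http://spam.example/buy and http://spam.example/about"
    for entry in explain_links(text, [r"spam\.example"], [], [r"spam\.example/about"]):
        print(entry)
```

Listing the matching rules per link is what would let someone trace collateral damage back to the specific rule that caused it.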
Comment 10 Gerrit Notification Bot 2014-02-12 00:58:11 UTC
Change 85512 merged by jenkins-bot:
Add an API action to test blacklisted URLs

https://gerrit.wikimedia.org/r/85512
