Last modified: 2012-11-04 16:43:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and beyond displaying bug reports and their history, links may be broken. See T16719, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 14719 - Rename Spam blacklist to "Disallowed websites"
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Spam Blacklist (Other open bugs)
Version: unspecified
Hardware/OS: All / All
Importance: Low normal with 13 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: SWMT
Reported: 2008-07-03 22:47 UTC by Mike.lifeguard
Modified: 2012-11-04 16:43 UTC (History)
CC: 15 users
See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Description Mike.lifeguard 2008-07-03 22:47:50 UTC
Discussion in many places, overwhelmingly in favour. For example, on otrs-en-l starting here: https://lists.wikimedia.org/mailman/private/otrs-en-l/2008-June/004293.html

Or earlier: https://lists.wikimedia.org/mailman/htdig/otrs-en-l/2007-April/000893.html
Comment 1 Herby 2008-07-04 09:35:23 UTC
This really is something that is overdue. It causes considerable annoyance to
those who are listed, which in turn means that the volunteers frequently have
to put up with abuse.

Ultimately these lists are not about "spam". They are about links which the
community deems excessive or unnecessary. This (& the MediaWiki equivalent
pages) should be changed as soon as possible.

Thanks
Comment 2 Dirk Beetstra 2008-07-04 09:59:15 UTC
I would also urge the renaming to take place ASAP.  Thanks.
Comment 3 Mike.lifeguard 2008-07-04 23:29:11 UTC
Yes, this was intended to apply to the local and global blacklists.

With regards to the global blacklist on meta, I think this should be done in a way that wikis using our blacklist may still do so. The configuration 
	$wgSpamBlacklistFiles = array(
		"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1", // Wikimedia's list
	);
should still work even though the real page is located somewhere else.

This may also be done in such a way that multiple blacklists can be collected together such that if bug 14322 is implemented, then wikis using the meta blacklist can still get it with the above configuration.

This would allow backward-compatibility (important primarily for third-party users), even though the list may actually be located in a different location, or several different locations.
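The backward-compatibility idea above might look like this in LocalSettings.php. This is an illustrative sketch only: the first URL is the one quoted in this bug, while the second source is hypothetical, showing how several lists could be collected together (per bug 14322) without breaking third-party wikis that use the old configuration.

```php
<?php
// Sketch of a multi-source configuration for the SpamBlacklist extension.
// Only the first entry is from this bug; the second is a hypothetical
// additional list used to illustrate aggregation.
$wgSpamBlacklistFiles = array(
	// Wikimedia's global list -- kept working at the old title for
	// backward compatibility even if the real page moves elsewhere:
	"http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1",
	// Hypothetical extra source, e.g. a per-project list:
	"http://example.org/w/index.php?title=Local_blacklist&action=raw",
);
```

As long as the old meta URL keeps serving the aggregated list, third-party wikis need no configuration change.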
Comment 4 Herby 2008-07-09 11:54:03 UTC
For a view of the impact this has on those whose sites are listed, and therefore on the volunteers who have to deal with it, please see: http://meta.wikimedia.org/w/index.php?title=Talk%3ASpam_blacklist&diff=1077313&oldid=1076452

It would seem wise to deal with this as soon as possible. Thanks.
Comment 5 Mike.lifeguard 2008-07-10 11:01:27 UTC
"Disallowed websites" was suggested as a new name and seems acceptable to all; however, either "URL exclusion list" or "URL blacklist" would still be acceptable.
Comment 6 Guy Chapman 2008-07-12 12:35:23 UTC
+1 for this.  I am an OTRS and enWiki admin, plus an OTRS volunteer.  There are numerous cases where sites are justly and entirely uncontroversially added to the blacklist to control abuse, but where doing so must necessarily label the site as spam.  Some scenarios:

* A site owner is not responsible for the spamming of his site, but the site has been massively and inappropriately linked.  The site owner asserts that he is not a spammer, but the site meets our local definition of spam.
* URL shorteners, legitimate and useful sites, nonetheless tagged as spam because we cannot restrict their use to circumvent the blacklist without tagging them as spam.
* Sites which systematically violate copyright and are locally or meta blacklisted to control legal liability.  Many different users may be adding these sites, they are not being spammed per the normal external definition but blacklisting is essentially uncontroversial.

The list needs to be renamed. I understand that there is resistance on the basis that this might imply a change in policy, but the reality is that policy and consensus already allow for blacklisting sites which may not meet the usual real-world definition of spam, even if they meet our own internal definitions. Nobody is proposing any change to the criteria, only the removal of a word which carries stigma, in order to head off a steady stream of complaints from site owners.
Comment 7 Cary Bass 2008-07-12 14:38:01 UTC
According to Brion (sitting beside me in Heathrow Airport), renaming it is not the most important change, although it is important. He will be addressing this soon, but probably not until after Wikimania :)
Comment 8 Jimmy Wales 2008-07-12 15:25:44 UTC
+1 This needs to be fixed.  Decisions about what links to include in Wikipedia are rightfully in the hands of the community, and we need to be able to use this list in a way that is consistent with our policies.  Calling something "spam" is not something that many community members are comfortable doing, even when something is spam, because we try not to engage in that kind of personal attack.  Additionally, as outlined by Guy Chapman above, we use this to cover cases that meet our local definition of spam, but which might not be viewed as spam by the website itself.  Insulting them while also taking away their web traffic from Wikipedia is not right.
Comment 9 RoyFocker 2008-07-12 17:47:17 UTC
I think the new name itself will clear up many of the conflicts we have. It also shows the real meaning of this page, so that even sysops can understand much better what can be added and what cannot.
Comment 10 Avi 2008-07-13 03:46:47 UTC
As an enwiki admin and an OTRS volunteer, I also concur with the rename. I am partial to the "Disallowed websites" option, both for irrelevant personal reasons (see: http://meta.wikimedia.org/w/index.php?title=Talk%3ASpam_blacklist&diff=1079547&oldid=1079546 ) and for the more important reasons listed above by Guy and Jimbo.
Comment 11 Cometstyles 2008-07-13 10:03:37 UTC
I hate the name, though, since in the long run it will sound a bit silly. "URL Blacklist" or "Excluded URL list" might be better, though the word "spam" really needs to be removed....
Comment 12 Siebrand Mazeland 2008-08-13 11:57:22 UTC
Per comment 7, assigned to Brion.

As for some other comments: your point is clear. Please use the 'vote' option in Bugzilla where it could replace a "+1".
Comment 13 Mike.lifeguard 2008-09-29 18:07:50 UTC
This would be superseded by bug 4459/bug 13811 - using a special page for the blacklist rather than a wiki page.
Comment 14 Stifle 2010-04-05 13:48:59 UTC
Bump. This issue hasn't gone away in the last 18 months; any chance of progress?
Comment 15 Brion Vibber 2010-04-05 17:03:23 UTC
IMO this request has always been a 'can't see the forest for the trees' thing. The primary purpose of a link blacklist always was, and always will be, to reduce link spam activity by preventing linking to sites known to have been used in link spam.

It probably makes a lot more sense to step back and think about what this thing is for and how it works.

What's actually the problem? I think it's simply poor communication: a certain fraction of sites that are being blacklisted are edge cases where folks are trying to prevent some very particular kind of abuse, but there's no good way to explain to an editor how the blacklist entry got there or whether how they're making use of the link is actually related to that abuse pattern or not.

Changing the name doesn't solve that in any way. It'll be just as frustrating when the link you thought was just fine is on a "disallowed website list" with a poor audit trail that's very hard to get out of as when it was on a "spam blacklist" with a poor audit trail that's very hard to get out of.


I'd recommend ripping out the current "giant list of regexes" and using some actual data structures to record the blacklist entries, as we do for the more heavyweight but flexible AbuseFilter.

This brings several clear benefits:

* Information about the origin and history of each blacklist entry will be available:
 - when was it blocked and by whom? who can I talk to about getting it undone?
 - what was their reasoning? do other people agree with it?
 - does the particular issue that triggered the ban still apply? if we can see what it was, we might be able to find out and get it resolved!

* the ability to treat different cases differently:

 - Legitimate URL redirectors don't need to be disallowed entirely... Redirectors are a common part of today's web ecosystem, and continuing to ban them is just laziness that hurts our users.

The 'engineer's concern' that if we only paid attention to the final redirect target, an evil site could evade a blacklist by changing its redirect targets is tractable by 1) checking both original and target URLs and 2) *marking known good and known abusive redirector sites*. Why should we blacklist every bit.ly or whatever URL when we know they're consistent and a lookup of the redirect won't magically change to a spam/virus link?

 - Sites that are blacklisted for abusive/annoying/legal issues during a particular event or in a particular area can actually be marked with details about the event or area. A short-term issue probably doesn't need a permanent block.

 - A hard block isn't always really needed; marking pages for review when a slightly-sketchy or sometimes-rude-and-attackish link gets added is probably nicer on everyone than just preventing linking and requiring an administrator escalation to resolve a legitimate case.

and of course:

* an actual user interface for creating and testing entries will reduce administrator errors that accidentally blacklist the wrong sites.

Editing a giant page of regexes is just asking for trouble, let's be honest. It's fragile and easy to break -- while we are able to detect that a regex doesn't compile and skip it, a regex that compiles but matches things you didn't think it would can be even more disruptive.
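The structured-entry proposal above can be sketched as follows. This is a minimal illustration in Python (not MediaWiki's PHP), and every field name and value here is an assumption for illustration, not the extension's actual schema: per-entry metadata answers the "who/when/why" questions, an optional expiry handles event-specific blocks, and a "review" action models the soft-block idea instead of a hard block.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
import re

@dataclass
class BlacklistEntry:
    # One structured record per blocked pattern, instead of one line in a
    # giant regex page. All fields are illustrative assumptions.
    pattern: str                    # regex applied to added URLs
    added_by: str                   # who blocked it: "who can I talk to?"
    added_on: date                  # when it was blocked
    reason: str                     # the reasoning behind the ban
    expires: Optional[date] = None  # short-term issues need not be permanent
    action: str = "block"           # "block", or "review" for soft flagging

def check_url(entries, url, today):
    """Return the matching entry, or None if the URL is allowed."""
    for e in entries:
        if e.expires is not None and today > e.expires:
            continue  # entry lapsed: a short-term block stops applying
        if re.search(e.pattern, url):
            return e
    return None

# Hypothetical example entries (all domains are placeholders):
entries = [
    BlacklistEntry(r"evil-pills\.example", "AdminA", date(2010, 1, 2),
                   "mass link spamming across projects"),
    BlacklistEntry(r"event-site\.example", "AdminB", date(2010, 3, 1),
                   "abuse during a specific event", expires=date(2010, 4, 1)),
    BlacklistEntry(r"sometimes-rude\.example", "AdminC", date(2010, 2, 5),
                   "borderline source", action="review"),
]

hit = check_url(entries, "http://evil-pills.example/buy", date(2010, 4, 5))
print(hit.reason)    # mass link spamming across projects
lapsed = check_url(entries, "http://event-site.example/", date(2010, 4, 5))
print(lapsed)        # None: the short-term block has expired
soft = check_url(entries, "http://sometimes-rude.example/x", date(2010, 4, 5))
print(soft.action)   # review: flag the edit rather than reject it
```

Because each entry is a record rather than a bare regex, the audit trail (added_by, added_on, reason) travels with the pattern, and a UI could validate each regex and preview its matches before saving, addressing the "compiles but matches too much" failure mode.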
Comment 16 Mike.lifeguard 2010-04-05 18:34:51 UTC
(In reply to comment #15)
> I'd recommend ripping out the current "giant list of regexes" and use some
> actual data structures to record the blacklist entries, as we do for the more
> heavyweight but flexible AbuseFilter.
> ...
> Editing a giant page of regexes is just asking for trouble, let's be honest.
> It's fragile and easy to break -- while we are able to detect that a regex
> doesn't compile and skip it, a regex that compiles but matches things you
> didn't think it would can be even more disruptive.

Some thoughts about requirements are available at http://www.mediawiki.org/wiki/Regex-based_blacklist as well. Thank god someone is taking this seriously.

Should this be closed as a WONTFIX and point to bug 16717 & bug 4459 for resolving the larger problems here?
Comment 17 Guy Chapman 2010-04-05 19:57:49 UTC
(In reply to comment #15)
> IMO this request has always been a 'can't see the forest for the trees' thing.
> The primary purpose of a link blacklist always was, and always will be, to
> reduce link spam activity by preventing linking to sites known to have been
> used in link spam.

This is true up to a point; however:

> Changing the name doesn't solve that in any way. It'll be just as frustrating
> when the link you thought was just fine is on a "disallowed website list" with
> a poor audit trail that's very hard to get out of as when it was on a "spam
> blacklist" with a poor audit trail that's very hard to get out of.

It won't change the behaviour but it will remove one source of complaints. Spamming has a particular and unwholesome meaning. What we call link spamming, which is unambiguously abusive, is not the same as spamming (sending unsolicited email) and may in fact be the result of actions by someone other than the owner of a given domain. So the *name* of the blacklist is inherently an issue.

Some of the complaints are of course vexatious, but not all. And yes, not seeing the list in clear text would be a partial fix but the discussions of the issue will still be under "spam-foo" (which, incidentally, we really ought to fix since it requires no technical change).
 
On the subject of redirectors, some sites allow the redirection to be changed. There's every reason not to use redirectors within our projects, not least because when someone hovers over a link they should see the domain they are going to.

I completely agree about the technical issues of the blacklist interface, though.
