Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and links may be broken except for those displaying bug reports and their history. See T16522, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 14522 - Antispam filter doesn't filter plaintext rendered URLs
Status: RESOLVED WONTFIX
Product: MediaWiki extensions
Classification: Unclassified
Component: Spam Blacklist (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Importance: Lowest normal (vote)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://test.wikipedia.org/w/index.php...
Duplicates: 20501 (view as bug list)
Depends on:
Blocks:
Reported: 2008-06-13 03:50 UTC by Danny B.
Modified: 2011-03-13 18:06 UTC
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Danny B. 2008-06-13 03:50:03 UTC
If http://some.spam.tld is on the spam blacklist, the filter still lets through the following constructions (a short sketch after the list illustrates why):

* <nowiki>http://some.spam.tld</nowiki>
* http&#x3a;//some.spam.tld (etc.)
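
A minimal Python sketch of the behaviour, assuming a hypothetical rendered_links() stand-in for the parser (the real extension is written in PHP and builds its regex from the on-wiki blacklist page; this only models the effect):

import re

# Hypothetical stand-in for one blacklist entry; the real SpamBlacklist
# extension compiles the on-wiki blacklist page into one large regex.
BLACKLIST = re.compile(r"https?://some\.spam\.tld", re.IGNORECASE)

# Crude model of the parser: only a bare, well-formed URL is turned into
# an external link in the rendered page.
LINK = re.compile(r"https?://[^\s<>\"]+")

def rendered_links(wikitext):
    if "<nowiki>" in wikitext:
        return []                   # the URL is rendered as plain text
    return LINK.findall(wikitext)   # "http&#x3a;//" never matches "http://"

for wikitext in [
    "http://some.spam.tld",                    # becomes a link -> blocked
    "<nowiki>http://some.spam.tld</nowiki>",   # plain text -> slips through
    "http&#x3a;//some.spam.tld",               # plain text -> slips through
]:
    blocked = any(BLACKLIST.search(url) for url in rendered_links(wikitext))
    print(f"{wikitext!r:<45} blocked={blocked}")
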
Comment 1 Mormegil 2008-06-19 08:03:33 UTC
Isn’t that a feature? Those are not links, therefore they are not blocked. (And why should they be?)
Comment 2 Daniel Friesen 2008-06-19 09:30:37 UTC
Using the spam blacklist to block plaintext is not a good idea. Many entries added to the spam blacklist use generic patterns so that a single entry stops many of the incoming URLs. Those patterns don't match anything that legitimately appears in a URL, but they can match perfectly valid words in plaintext. If the spam blacklist were also applied to plaintext, the spam filter would start acting up everywhere, blocking pages which really don't have spam on them (a sketch of the problem follows).
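
A minimal Python sketch of the false-positive risk, using a hypothetical generic entry ("casino") rather than a real blacklist pattern:

import re

# Hypothetical generic entry of the kind described above: broad enough to
# catch a whole family of spam domains when matched against URLs only.
GENERIC = re.compile(r"casino", re.IGNORECASE)

urls_in_edit = ["https://en.wikipedia.org/wiki/Monte_Carlo"]
plaintext = "The Monte Carlo Casino opened in 1863."

# URL-only matching (current behaviour): no hit, the edit saves fine.
print(any(GENERIC.search(u) for u in urls_in_edit))  # False

# Plaintext matching (what this bug asks for): the same entry now blocks
# a perfectly legitimate sentence about a real casino.
print(GENERIC.search(plaintext) is not None)         # True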

And quite simply... we already have an extension for blocking plaintext: SpamRegex. The SpamBlacklist is for blocking URLs only and is widely editable. SpamRegex is meant for blocking anything, and is more restricted, because you can really screw things up by getting even the slightest thing wrong.

There is no way to block plaintext in the way you want:
1) The SpamBlacklist extension only looks at parser output, not the wikitext source. Because of that, if something has not been converted into a link, the extension does not know about it, so plaintext cannot be blacklisted (see the sketch after this list).
2) For things like www.foo.com, while you may recognize them as URLs, there is no feasible way to make the computer understand that; at least, not without an unacceptable number of false positives that would make many valid edits trigger the spam filter. Not to mention that wikis normally use the SpamBlacklist's talk page to post spam URLs to be blocked, and they post them in plaintext. If plaintext were blacklisted, then every time someone blacklisted a URL, the talk page for requesting blacklist additions would become uneditable because of the new entry.
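
A minimal Python sketch of both points, with hypothetical names (extract_external_links is a stand-in for the parser output the real PHP hook receives, not an actual API):

import re

BLACKLIST = re.compile(r"some\.spam\.tld", re.IGNORECASE)
LINK = re.compile(r"https?://[^\s<>\"]+")

def extract_external_links(wikitext):
    # Stand-in for parser output: only strings the parser has already
    # recognised as external links ever reach the blacklist check.
    return LINK.findall(wikitext)

def edit_is_blocked(wikitext):
    return any(BLACKLIST.search(url) for url in extract_external_links(wikitext))

# The talk-page problem from point 2: a plaintext request contains the
# blacklisted string but no link, so today it saves fine; a plaintext
# filter would make the request page itself uneditable.
request = "Please blacklist some.spam.tld, it keeps getting spammed."
print(edit_is_blocked(request))  # False under URL-only matching
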
Comment 3 Splarka 2009-09-04 23:00:05 UTC
*** Bug 20501 has been marked as a duplicate of this bug. ***
