
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T17582, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15582 - URLs should trigger the blacklist regardless of parser functions
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Spam Blacklist (Other open bugs)
Version: unspecified
Hardware/OS: All / All
Importance: Low normal, with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://nl.wikipedia.org/w/index.php?title=Gebruiker:Emil76&diff=prev&oldid=13856552
Duplicates: 16354, 16610
Depends on:
Blocks:
Reported: 2008-09-12 20:25 UTC by Mike.lifeguard
Modified: 2014-03-03 20:47 UTC
CC: 6 users

See Also:
Web browser: ---
Mobile Platform: ---


Attachments

Description Mike.lifeguard 2008-09-12 20:25:37 UTC
By using {{#time:}} (as in this example), or possibly other parser functions, one may circumvent the blacklist. I'm not sure how to get this sort of thing blocked, but it should be worked out.

Also in the URL field: http://nl.wikipedia.org/w/index.php?title=Gebruiker:Emil76&diff=prev&oldid=13856552
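
[Editor's note: the mechanism, roughly, is that the blacklist regex is applied to the wikitext as submitted, while a parser function can emit the characters that complete a blacklisted URL only after expansion. Below is a minimal PHP sketch of that gap; the domain, the regex, and the #time trick are made-up placeholders standing in for the original example, which is not preserved here, and this is not SpamBlacklist's actual code.]

<?php
// Illustrative only: the edit-time check sees the text as saved, not the expanded output.
$blacklistRegex = '!https?://[a-z0-9.-]*spam-example\.org!i';

// Wikitext as saved: the blacklisted domain never appears literally, because a
// (hypothetical) parser function supplies the missing characters on expansion.
$savedWikitext = 'http://spam-example.o{{#time:|}}rg';

// What the parser might produce after expanding the parser function.
$expandedText = 'http://spam-example.org';

var_dump( (bool)preg_match( $blacklistRegex, $savedWikitext ) );  // false - edit-time check bypassed
var_dump( (bool)preg_match( $blacklistRegex, $expandedText ) );   // true  - the URL really is on the page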
Comment 1 Mike.lifeguard 2008-09-12 20:26:23 UTC
I guess we're not using tracking bugs for this any longer.
Comment 2 Brion Vibber 2008-09-12 20:32:47 UTC
The basic problem here is that we have to determine not whether a URL exists in the page now, but whether under any circumstances of possible input data it _could_. That quickly becomes difficult or impossible if you move from this case to slightly less trivial ones, including already-known-possible cases involving transclusion of multiple pieces edited at different times.

The only real solution to that I can think of is to apply the blacklist at rendering time for *views* as well as at *edit* -- matching links could be de-linked or removed and the page marked on a queue for review.

This probably wouldn't perform terribly well, but could perhaps be optimized.

Don't know if it's worth the effort.
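
[Editor's note: a minimal sketch of the view-time filtering described above, assuming the rendered HTML and the compiled blacklist regexes are available when the page is output. This is not the extension's actual hook code; the function name and domain are made up. Matching anchors are de-linked (the link text is kept) and a flag reports whether anything matched, so the page could be put on a review queue.]

<?php
// Illustrative only: de-link blacklisted external links in rendered HTML.
function filterBlacklistedLinks( string $html, array $blacklistRegexes, &$matched ): string {
    $matched = false;
    return preg_replace_callback(
        '!<a\b[^>]*\bhref="([^"]+)"[^>]*>(.*?)</a>!is',
        function ( array $m ) use ( $blacklistRegexes, &$matched ) {
            foreach ( $blacklistRegexes as $regex ) {
                if ( preg_match( $regex, $m[1] ) ) {
                    $matched = true;
                    return $m[2];  // keep the link text, drop the hyperlink
                }
            }
            return $m[0];  // leave non-matching links untouched
        },
        $html
    );
}

// Example usage with a made-up domain:
$wasSpam = false;
$out = filterBlacklistedLinks(
    '<p><a href="http://spam-example.org/x">click</a></p>',
    [ '!https?://[a-z0-9.-]*spam-example\.org!i' ],
    $wasSpam
);
// $out is '<p>click</p>' and $wasSpam is true, so the page could be flagged for review.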
Comment 3 Mike.lifeguard 2008-09-12 20:54:04 UTC
(In reply to comment #2)
> The basic problem here is that we have to determine not whether a URL exists in
> the page now, but whether under any circumstances of possible input data it
> _could_. That quickly becomes difficult or impossible if you move from this
> case to slightly less trivial ones, including already-known-possible cases
> involving transclusion of multiple pieces edited at different times.

Sure, but can't this slightly-less-exotic case be covered will less trouble?


> The only real solution to that I can think of is to apply the blacklist at
> rendering time for *views* as well as at *edit* -- matching links could be
> de-linked or removed and the page marked on a queue for review.
> 
I think filtering on view might be worth doing, perhaps with a notice: "This page has spam that we've automatically hidden from your sensitive eyes, please help clean it up. You're looking for the domain spam.org -> [edit]". That would be especially useful now that saving isn't blocked when the domain already existed in the page (bug 1505). (However, a queue seems like overkill.)
Comment 4 Mike.lifeguard 2008-09-12 20:54:54 UTC
(In reply to comment #3)

> Sure, but can't this slightly-less-exotic case be covered will less trouble?

 *WITH less trouble
Comment 5 Mike.lifeguard 2008-11-15 23:06:41 UTC
*** Bug 16354 has been marked as a duplicate of this bug. ***
Comment 6 FireJackey 2008-11-16 03:25:00 UTC
(In reply to comment #5)
> *** Bug 16354 has been marked as a duplicate of this bug. ***
> 

thanks
Comment 7 Ilmari Karonen 2008-12-10 16:17:26 UTC
A minimal fix, to stop this from being attractive to vandals, would be to simply silently ignore any blacklisted URLs unexpectedly encountered during a parse.  I wouldn't (naively, perhaps) expect this to cause too much load; surely our page caching should be good enough that pages don't get needlessly reparsed very often?
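
[Editor's note: a minimal sketch of this "silently ignore" idea, assuming the parser's collected external links can be filtered before they are rendered as clickable links. The function name, URLs, and regex are made up, and this is not actual MediaWiki or SpamBlacklist code.]

<?php
// Illustrative only: quietly drop blacklisted URLs from a list of external links.
function keepAllowedLinks( array $externalLinks, array $blacklistRegexes ): array {
    return array_values( array_filter(
        $externalLinks,
        function ( string $url ) use ( $blacklistRegexes ) {
            foreach ( $blacklistRegexes as $regex ) {
                if ( preg_match( $regex, $url ) ) {
                    return false;  // blacklisted: silently ignore it
                }
            }
            return true;
        }
    ) );
}

// Example with made-up URLs:
print_r( keepAllowedLinks(
    [ 'http://example.com/ok', 'http://spam-example.org/x' ],
    [ '!https?://[a-z0-9.-]*spam-example\.org!i' ]
) );  // only http://example.com/ok remains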
Comment 8 John Mark Vandenberg 2011-04-28 14:59:13 UTC
*** Bug 16610 has been marked as a duplicate of this bug. ***
