Last modified: 2010-05-31 19:13:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T17063, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15063 - wpSpamRegex entry for large image tables
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Rob Halsell
Keywords: shell
Duplicates: 14811
Depends on:
Blocks:
Reported: 2008-08-07 00:37 UTC by Luna Santin
Modified: 2010-05-31 19:13 UTC (History)

Description Luna Santin 2008-08-07 00:37:17 UTC
Vandals have recently been using software that produces a "bitmap" out of very large tables with colored cell backgrounds. When asked about a wgSpamRegex entry to stop these, Platonides and I each proposed a regex:

/<TR>(<TD BGCOLOR=["']?#......["']?>\.+<\/TD>){20,}<\/TR>/i
/<table>.+?(<td bgcolor.+?){400,}.+?<\/table>/i

Discussion seemed to favor the top one, but I'm listing both here for completeness.

The first option blocks any image table with 20 or more cells in one row (effectively limiting the horizontal resolution), and will match only tables whose cells are filled with periods ("..." etc). The second option blocks any such table with 400 or more cells in total (20x20 resolution), without paying attention to text. The image tables we're currently trying to prevent are typically 85x9 or larger, but obviously any pattern that is too narrow is easily evaded.

Brion suggested we should file a bug, so here we are.
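
The thresholds described above can be checked with a small sketch. It is written in Python rather than PHP, so the patterns are translations of the two PCRE entries (delimiters dropped, `/i` mapped to `re.IGNORECASE`), and `make_row` and `make_table` are hypothetical helpers that only build test input. Near-miss input for the second pattern is left out on purpose, because a failing match there can backtrack very heavily.

```python
import re

# Python translations of the two proposed patterns (assumption: Python's `re`
# behaves like PCRE for these particular constructs).
ROW_PATTERN = re.compile(
    r"""<TR>(<TD BGCOLOR=["']?#......["']?>\.+</TD>){20,}</TR>""",
    re.IGNORECASE,
)
TABLE_PATTERN = re.compile(
    r"<table>.+?(<td bgcolor.+?){400,}.+?</table>",
    re.IGNORECASE,
)


def make_row(cells: int, text: str = ".") -> str:
    """Hypothetical helper: one table row with `cells` colored cells."""
    cell = f'<td bgcolor="#ff0000">{text}</td>'
    return "<tr>" + cell * cells + "</tr>"


def make_table(rows: int, cells_per_row: int, text: str = ".") -> str:
    """Hypothetical helper: a single-line table without attributes."""
    return "<table>" + make_row(cells_per_row, text) * rows + "</table>"


# First pattern: needs 20 or more period-filled cells in one row.
row_hits_19 = ROW_PATTERN.search(make_row(19)) is not None
row_hits_20 = ROW_PATTERN.search(make_row(20)) is not None
row_hits_text = ROW_PATTERN.search(make_row(20, "x")) is not None

# Second pattern: needs 400 or more cells in total, whatever the cell text.
table_hits_20x20 = TABLE_PATTERN.search(make_table(20, 20, "x")) is not None
table_hits_85x9 = TABLE_PATTERN.search(make_table(9, 85)) is not None

print(row_hits_19, row_hits_20, row_hits_text)
print(table_hits_20x20, table_hits_85x9)
```

In this sketch a 19-cell row and a 20-cell row containing other text both get past the first pattern, while the 20x20 and 85x9 tables are caught by the second.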
Comment 1 Brion Vibber 2008-08-07 18:05:42 UTC
*** Bug 14811 has been marked as a duplicate of this bug. ***
Comment 2 Siebrand Mazeland 2008-08-10 23:24:44 UTC
Assigned to brion.
Comment 3 Brion Vibber 2008-08-25 18:31:50 UTC
Applied.
Comment 5 Mike.lifeguard 2009-02-15 17:51:02 UTC
(In reply to comment #4)
> Please have a look at
> http://en.wikipedia.org/w/index.php?title=Wikipedia:Sockpuppet_investigations&curid=18053857&diff=270803120&oldid=270803099
> 

Re-opening the bug so someone can tweak it if possible.
Comment 6 Platonides 2009-02-16 10:26:46 UTC
Three changes let it evade the second regex:
- The table tag has parameters
- It uses newlines
- The table is never closed

This new regex can replace the second one and also covers those cases:
/<table[^>]*>.+?(<td bgcolor.+?){400,}/is
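
A minimal sketch of why the replacement helps, again using Python's `re` as a stand-in for PCRE (`/is` becomes `re.IGNORECASE | re.DOTALL`). The sample markup is made up for illustration and is not taken from the linked diff. The unclosed-only variant is not run against the old pattern, since a failed match on that input can backtrack very heavily.

```python
import re

# Old and new table patterns, translated from the PCRE literals in this bug.
OLD_TABLE = re.compile(
    r"<table>.+?(<td bgcolor.+?){400,}.+?</table>",
    re.IGNORECASE,
)
NEW_TABLE = re.compile(
    r"<table[^>]*>.+?(<td bgcolor.+?){400,}",
    re.IGNORECASE | re.DOTALL,  # DOTALL lets "." cross newlines
)

# Illustrative input: 9 rows of 85 colored cells (765 cells in total).
cell = '<td bgcolor="#00ff00">x</td>'
row = "<tr>" + cell * 85 + "</tr>"

# Evasion 1: parameters on the opening tag.
with_attrs = '<table border="0">' + row * 9 + "</table>"
# Evasion 2: newlines between the rows.
with_newlines = "<table>\n" + (row + "\n") * 9 + "</table>"
# All three evasions at once: parameters, newlines, no closing tag.
combined = '<table border="0" cellspacing="0">\n' + (row + "\n") * 9

samples = {
    "with_attrs": with_attrs,
    "with_newlines": with_newlines,
    "combined": combined,
}
old_hits = {name: OLD_TABLE.search(text) is not None for name, text in samples.items()}
new_hits = {name: NEW_TABLE.search(text) is not None for name, text in samples.items()}

print(old_hits)
print(new_hits)
```

The old pattern misses all three samples, while the new one matches each of them because it tolerates attributes, lets `.` span lines, and no longer requires `</table>`.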
Comment 7 Siebrand Mazeland 2009-02-16 10:35:22 UTC
Please provide a proper patch. This makes committing this stuff a lot easier.
Comment 8 Siebrand Mazeland 2009-02-16 10:38:38 UTC
(In reply to comment #7)
> Please provide a proper patch. This makes committing this stuff a lot easier.
Ow, not 'commit' in this context, but 'applying to whatever file from http://noc.wikimedia.org/conf/ needs patching'.
Comment 9 Platonides 2009-02-16 14:48:39 UTC
I don't really think it's needed, but here's what you need to do:
       // bug 15063, these won't last:
       '/<TR>(<TD BGCOLOR=["\']?#......["\']?>\.+<\/TD>){20,}<\/TR>/i',
-       '/<table>.+?(<td bgcolor.+?){400,}.+?<\/table>/i',
+       '/<table[^>]*>.+?(<td bgcolor.+?){400,}/is',
       // Weird thingy ....

It's line 5425 of InitialiseSettings.php
Comment 10 Andrew Garrett 2009-03-06 01:41:30 UTC
Will be doable with the Abuse Filter when it's live on the appropriate site(s).
Comment 11 Mike.lifeguard 2009-03-06 02:32:44 UTC
(In reply to comment #10)
> Will be doable with the Abuse Filter when it's live on the appropriate site(s).
> 

That will really require global abuse filter(s).
Comment 12 Chad H. 2009-03-17 03:48:57 UTC
Removing dependency. This is a configuration change request. The other is an enhancement to AbuseFilter.
Comment 13 Platonides 2010-02-06 22:58:02 UTC
Please change the wgSpamRegex line
       '/<TR>(<TD BGCOLOR=["\']?#......["\']?>\.+<\/TD>){20,}<\/TR>/i',
to
       '/<TR>(<TD BGCOLOR=["\']?#......["\']?>(\.+|We|are|Anonymous)<\/TD>){20,}<\/TR>/i',
to also block newer vandalism such as http://en.wikisource.org/w/index.php?title=Template%3ATl&action=historysubmit&diff=1771366&oldid=1771352
Comment 14 Platonides 2010-05-29 17:34:04 UTC
I was coming to request a change to 
'/<TR>(<TD BGCOLOR=["\']?#......["\']?>(\.|We|are|Anonymous| )+<\/TD>){20,}<\/TR>/i'

just to find out that I had requested the same three months ago, which would have prevented http://es.wikipedia.org/w/index.php?title=Plantilla:Portada_Bueno/970&diff=37518654&oldid=37518360
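
The effect of the widened cell alternation requested above can be sketched as follows. This is a Python translation of the row pattern before and after the change, using the variant from this comment (with the trailing space alternative). The `make_row` helper and the sample cell texts are hypothetical.

```python
import re

# Row pattern before and after the requested change (Python translation).
OLD_ROW = re.compile(
    r"""<TR>(<TD BGCOLOR=["']?#......["']?>\.+</TD>){20,}</TR>""",
    re.IGNORECASE,
)
NEW_ROW = re.compile(
    r"""<TR>(<TD BGCOLOR=["']?#......["']?>(\.|We|are|Anonymous| )+</TD>){20,}</TR>""",
    re.IGNORECASE,
)


def make_row(texts):
    """Hypothetical helper: one row with a colored cell per entry in `texts`."""
    cells = "".join(f'<td bgcolor="#000000">{t}</td>' for t in texts)
    return "<tr>" + cells + "</tr>"


dots = make_row(["."] * 20)                             # original vandalism style
slogan = make_row(["We", "are", "Anonymous", " "] * 5)  # newer style, 20 cells
other = make_row(["hello"] * 20)                        # text neither pattern lists

results = {
    name: (OLD_ROW.search(text) is not None, NEW_ROW.search(text) is not None)
    for name, text in {"dots": dots, "slogan": slogan, "other": other}.items()
}
print(results)
```

Each tuple is (old pattern matched, new pattern matched). The slogan row is only caught by the new pattern, and cells with any other text still pass both, which is in line with the "these won't last" note in the config.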
Comment 15 JeLuF 2010-05-31 19:13:13 UTC
Done.
===================================================================
Index: InitialiseSettings.php
===================================================================
--- InitialiseSettings.php	(revision 808)
+++ InitialiseSettings.php	(working copy)
@@ -6596,7 +6596,7 @@
        '/avril\.on\.nimp\.org/i', // http://en.wikipedia.org/wiki/Special:Contributions/Hochitup
        '/\.on\.nimp\.org/i', // per MrZ-man 2008-11-02 -- brion
        // bug 15063, these won't last:
-       '/<TR>(<TD BGCOLOR=["\']?#......["\']?>\.+<\/TD>){20,}<\/TR>/i',
+       '/<TR>(<TD BGCOLOR=["\']?#......["\']?>(\.|We|are|Anonymous|)+<\/TD>){20,}<\/TR>/i',
        '/<table>.+?(<td bgcolor.+?){400,}.+?<\/table>/i',
        // Weird thingy http://en.wikipedia.org/w/index.php?title=Hellboy:_Sword_of_Storms&oldid=245477898&diff=prev
        '/<span onmouseover="_tipon/',
