Last modified: 2009-02-27 20:51:15 UTC
In the 1.14.0 RELEASE-NOTES we see:

  * $wgSpamRegex now matches the edit summary and page move descriptions in addition to body text.

I'm sorry, but that's absolutely crazy, reckless, irresponsible. I'm commenting it out in EditPage.php:

  # Check for spam
  $match = false; #JIDANNI turning OFF!!: $match = self::matchSpamRegex( $this->summary );

Please consider, e.g.:

  $wgSpamRegex=array('/^\B$/',

This regular expression is what our wiki uses to prevent vicious page blanking. (By the way, if one triggers it, oddly the function that usually shows the user what the problem was doesn't say anything.) Anyway, a blanked page is bad, but a blank comment is fine!

Now let's look at another regexp we use on our sites:

  '/^[^{][[:ascii:]]*$/');

This regular expression means the user's edit must have at least one Chinese character in it, because our wikis are all zh-tw language wikis, and a pure ASCII post is surely spam. However, a quick English or NULL _summary_ is very common and accepted on our wikis.

Anyway, the rash decision to glue 'edit summary', 'page move descriptions' and 'body text' together will have users banging down my door asking why their postings are getting rejected now!

*** Please let the administrator glue them together if he wishes:

  ($wgSpamRegex['edit summary']= $wgSpamRegex['page move descriptions']= $wgSpamRegex['body text'];)

Don't arbitrarily glue them all together for us! ***

Please instead run each one as a separate test. You (MediaWiki team) can have an array of arrays, and just do something like the PHP version of

  foreach('edit summary', 'page move description', 'body text' as $bla){ run the matcher of $wgSpamRegex[$bla] on $get->$bla}

or however you write it in PHP, which I am poor at. And of course you need three different MediaWiki:Spamprotectiontext now too. And please allow us to set them in LocalSettings.php:

  $wgSpamProtectionText['body text']=

and the other two too. Setting them in MediaWiki:Spamprotectiontext is a big pain when you are making a Wiki Family.

By the way, we also have a rule

  /{{[Cc]\|\d\d\d\.\d{0,3}}}/

that I mention in Spamprotectiontext: Radio frequencies must have at least four digits after the decimal place. What would be neat is if each regexp could have its own optional text that gets printed out.

Ah, you might say I should stop complaining and use this, mentioned in DefaultSettings.php:

  * For a complete example, have a look at the SpamBlacklist extension. */
  $wgFilterCallback = false;

Well, I'll have you know that I did look at it, and it is all 100 times overkill and un-understandable gobbledygook, so sorry. It didn't help me one bit. Anyway, I was doing fine until you glued all the tests together. Next time I'll test while your release candidate is fresh. Sorry I only discovered this (glue mess) now.

By the way, I also use /<[Aa]/, which stops attempted spam links. This regexp I wish to use in all three places: summary, body text, etc. I.e., I cannot live for long with no summary filtering (caused by my above commenting out), as I know it is only a matter of time before they attack. Therefore I hope you will separate the three tests (and not just toss in some var $ignoreEditSummary) by version 1.14.1. Thank you.
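For illustration only, the per-field idea requested above could be sketched in PHP roughly as follows. This is not MediaWiki code: the variable $spamRegexByField, its keys, and the functions findSpamMatch() and checkEdit() are hypothetical names made up for this sketch. The patterns are the ones quoted in the report (with the ASCII pattern in the simpler form given in the P.S.).

```php
<?php
// Hypothetical per-field lists; these keys are NOT real MediaWiki settings.
$spamRegexByField = [
    'body text'             => [ '/^\B$/', '/^[[:ascii:]]*$/', '/<[Aa]/' ],
    'edit summary'          => [ '/<[Aa]/' ],
    'page move description' => [ '/<[Aa]/' ],
];

// Return the first regex in $regexes that matches $text, or false if none does.
function findSpamMatch( array $regexes, string $text ) {
    foreach ( $regexes as $regex ) {
        if ( preg_match( $regex, $text ) === 1 ) {
            return $regex;
        }
    }
    return false;
}

// Test each submitted field only against its own list.
// Returns [ field name, regex ] for the first hit, or false if the edit is clean.
function checkEdit( array $regexByField, array $fields ) {
    foreach ( $fields as $name => $text ) {
        $regexes = isset( $regexByField[$name] ) ? $regexByField[$name] : [];
        $hit = findSpamMatch( $regexes, $text );
        if ( $hit !== false ) {
            return [ $name, $hit ];
        }
    }
    return false;
}

// A Chinese body with a blank summary is accepted ...
$clean = checkEdit( $spamRegexByField, [
    'body text'    => '公車站時刻表',
    'edit summary' => '',
] );

// ... while a blanked body is still rejected by the body-only rule.
$blanked = checkEdit( $spamRegexByField, [
    'body text'    => '',
    'edit summary' => 'cleanup',
] );
```

With separate lists, an administrator who wants the old glued behaviour can simply assign the same list to all three keys, which is the "let the administrator glue them" option from the report.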
P.S. The above example should be '/^[[:ascii:]]*$/'. There is no need to show the "[^{]", which is our local jazz (see http://taizhongbus.jidanni.org/index.php?title=Template:B and http://radioscanningtw.jidanni.org/index.php?title=Template:C ), meaning it is OK to not even have one Chinese character if one is entering a bus stop or police frequency via these templates.
Or maybe an even fancier array is needed:

  [/REGEXP1/,0,1,0,"No xyz allowed"]
  [/REGEXP2/,1,1,1,null]
  ...

The 0,1,0 stuff are the three tests, followed by an optional message which, if null, just prints the /REGEXP/ that triggered. I.e., instead of three arrays, which will probably have a lot of duplication, use one array... OK, anything is OK, except the current gluing with no way to unglue short of hacking the source.
Hmmm, [/REGEXP2/,1,1,1,null] doesn't look too expandable for the future with more tests added. Sorry. Maybe [/REGEXP2/,[1,1,1],null] would be better, so if a fourth test was added, older LocalSettings would still work. (By older, I mean older than 1.14.2, but younger than 1.14.0 :-) OK, bye.)
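A rough PHP sketch of that single-array format, [ regex, [ flags ], message ], might look like the following. Everything here is an assumption for illustration: the function matchRule(), the variable $spamRules, and the flag order (edit summary, page move description, body text) are not defined anywhere in MediaWiki or in the comments above. The braces in the radio-frequency pattern are escaped in this sketch; it is otherwise the pattern from the original report.

```php
<?php
// Hypothetical rule list: [ regex, [ summary, move, body ], optional message ].
$spamRules = [
    [ '/^\B$/',                          [ 0, 0, 1 ], null ],
    [ '/^[^{][[:ascii:]]*$/',            [ 0, 0, 1 ], null ],
    [ '/\{\{[Cc]\|\d\d\d\.\d{0,3}\}\}/', [ 0, 0, 1 ],
      'Radio frequencies must have at least four digits after the decimal place.' ],
    [ '/<[Aa]/',                         [ 1, 1, 1 ], null ],
];

// Return the message (or the regex itself when the message is null) of the
// first rule that applies to $field and matches $text; null if nothing matches.
function matchRule( array $rules, string $field, string $text ) {
    // Assumed flag order; a future fourth test would get index 3.
    $fieldIndex = [
        'edit summary'          => 0,
        'page move description' => 1,
        'body text'             => 2,
    ];
    $index = $fieldIndex[$field];
    foreach ( $rules as $rule ) {
        list( $regex, $flags, $message ) = $rule;
        // A flag missing from an older, shorter list counts as "do not apply".
        $applies = !empty( $flags[$index] );
        if ( $applies && preg_match( $regex, $text ) === 1 ) {
            return $message !== null ? $message : $regex;
        }
    }
    return null;
}

$blankSummary = matchRule( $spamRules, 'edit summary', '' );         // no rule applies
$shortFreq    = matchRule( $spamRules, 'body text', '{{C|145.50}}' ); // custom message
$linkSummary  = matchRule( $spamRules, 'edit summary', '<a href=x>' ); // regex echoed back
```

Keeping the flags in their own nested list is what lets an older configuration with three flags keep working if a fourth test is added later, since the missing flag simply reads as "off".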
You make some fairly good points. This change ignores some fairly reasonable use-cases for the spam regex.
(In reply to comment #4)
> You make some fairly good points. This change ignores some fairly reasonable
> use-cases for the spam regex.

Thanks. By the way, the patterns I mentioned, and no more, have kept us 100% spam free for years!
Also consider that some regexes are not applicable to the summary, so running them against it is a wasted check. IMHO a different regex for the summary is the way to go.

  $wgSummarySpamRegex = $wgSpamRegex;

is easy enough for people who like using the same one for both.
Done in r47876