Last modified: 2009-03-26 04:48:54 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T20165, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 18165 - Inefficiency of creating multiple parsers for a single rule set
Inefficiency of creating multiple parsers for a single rule set
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
AbuseFilter (Other open bugs)
unspecified
All All
: Normal enhancement (vote)
: ---
Assigned To: Andrew Garrett
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-03-26 01:15 UTC by Robert Rohde
Modified: 2009-03-26 04:48 UTC (History)
1 user (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Robert Rohde 2009-03-26 01:15:50 UTC
The way the code currently works its way through the rule set has an unnecessary major inefficiency.

For each rule, AbuseFilter::checkAllFilters calls AbuseFilter::checkConditions with the $conds and $vars.

$vars is of course the spiffy variable class preloaded with a combination of already fetched and buildable data.

checkConditions then creates a new AbuseFilterParserClass and loads it with $vars.

The problem here is that a new parser is being created for every rule evaluation.  As currently implemented that means the subsequent rules aren't getting the benefit of the work done by prior rules.  In particular, the funcCache is being cleared and the variable holder reset to its initial state. 

What you want is for a single parser instance to be used for the processing of the entire rule set, that way the evaluation of functions and loading of variables is cached the whole way through, and not just within a single rule.
Comment 1 Andrew Garrett 2009-03-26 01:18:59 UTC
(In reply to comment #0)
> The problem here is that a new parser is being created for every rule
> evaluation.  As currently implemented that means the subsequent rules aren't
> getting the benefit of the work done by prior rules.  In particular, the
> funcCache is being cleared and the variable holder reset to its initial state. 

The function cache is a static variable, and therefore not cleared.

The variable holder is not being reset to its original state. It's an object, so it's passed by reference, so you'd need specific code to reset it to its original state.
Comment 2 Robert Rohde 2009-03-26 03:13:24 UTC
(In reply to comment #1)
> The function cache is a static variable, and therefore not cleared.

By declaring it static you create a globally accessible variable AbuseFilterParser::$funcCache, that has one value, but every instance of the class has it's own instantiation of $funcCache that is separate from the global variable and separate from each other.

So strictly speaking you didn't clear it, rather by calling new you created a new one that is blank.

Test this yourself.  $funcCache is not retained across rules.
 

> The variable holder is not being reset to its original state. It's an object,
> so it's passed by reference, so you'd need specific code to reset it to its
> original state.

You have such code.  More precisely, each time AbuseFilterParser is created it in turn creates a new AbuseFilterVariableHolder and then calls addHolder to copy the parameters passed to it in setVars.  Those parameters consist of an array of AFPData and AFPComputedVariable objects.  During the course of operation you replace the AFPComputedVariable objects which AFPData objects as you fetch the values.

The problem here is that because you copied the array into a local holder, and didn't simply attach the original VariableHolder by reference, that means that the data array in the original VariableHolder is unaware of the actions occuring in the local copy.  Specifically, it doesn't know that you replaced ComputedVariables with Data.

So each time you create a new parser you end up passing it the original VariableHolder that still contains the unprocessed ComputedVariables.


The end result is that both the $funcCache and the ComputedVariables are being lost upon the creation of a new AbuseFilterParser.
Comment 3 Andrew Garrett 2009-03-26 04:09:28 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > The function cache is a static variable, and therefore not cleared.
> 
> By declaring it static you create a globally accessible variable
> AbuseFilterParser::$funcCache, that has one value, but every instance of the
> class has it's own instantiation of $funcCache that is separate from the global
> variable and separate from each other.
> 
> So strictly speaking you didn't clear it, rather by calling new you created a
> new one that is blank.
> 
> Test this yourself.  $funcCache is not retained across rules.

I did:

> $afp = new AbuseFilterParser();

> $afp->evaluateExpression( 'ccnorm("FooBarBaz")' )

> print_r( AbuseFilterParser::$funcCache );
Array
(
    [8437e92172f797748a5f891b6952c110] => AFPData Object
        (
            [type] => string
            [data] => F00BARBAZ
        )

)

> $afp = new AbuseFilterParser();

Seems to work for me.

> > The variable holder is not being reset to its original state. It's an object,
> > so it's passed by reference, so you'd need specific code to reset it to its
> > original state.
> 
> You have such code.  More precisely, each time AbuseFilterParser is created it
> in turn creates a new AbuseFilterVariableHolder and then calls addHolder to
> copy the parameters passed to it in setVars.  Those parameters consist of an
> array of AFPData and AFPComputedVariable objects.  During the course of
> operation you replace the AFPComputedVariable objects which AFPData objects as
> you fetch the values.
> 
> The problem here is that because you copied the array into a local holder, and
> didn't simply attach the original VariableHolder by reference, that means that
> the data array in the original VariableHolder is unaware of the actions
> occuring in the local copy.  Specifically, it doesn't know that you replaced
> ComputedVariables with Data.
> 
> So each time you create a new parser you end up passing it the original
> VariableHolder that still contains the unprocessed ComputedVariables.

You're right here, and I've made the Parser a lazy-initialised static variable in r48855.
Comment 4 Robert Rohde 2009-03-26 04:36:21 UTC
(In reply to comment #3)
 
> You're right here, and I've made the Parser a lazy-initialised static variable
> in r48855.
> 

Doesn't that break batch testing?  You want the variables to carry over when working with a single edit, but you need them to be different when each check corresponds to a different edit.
Comment 5 Andrew Garrett 2009-03-26 04:45:57 UTC
(In reply to comment #4)
> (In reply to comment #3)
> 
> > You're right here, and I've made the Parser a lazy-initialised static variable
> > in r48855.
> > 
> 
> Doesn't that break batch testing?  You want the variables to carry over when
> working with a single edit, but you need them to be different when each check
> corresponds to a different edit.
> 

Should be sorted out in r48856. Note that that commit included a bunch of unrelated global filters work, which I reverted in r48857.
Comment 6 Robert Rohde 2009-03-26 04:48:54 UTC
(In reply to comment #3)
> I did:
> Seems to work for me.

Okay, works for me too.

$afp->funcCache and self::$funcCache are distinct variables, and somewhere in performance testing I got them crossed.  new does clear $afp->funcCache but not the global self::$funcCache which is what you are correctly using.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links