Last modified: 2009-03-25 11:37:33 UTC

This static website is a read-only archive of Wikimedia Bugzilla, kept for historical purposes; some links might be broken. See T20147, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 18147 - Faster logic for Abuse Filter parser
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: AbuseFilter
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Andrew Garrett
Reported: 2009-03-25 07:29 UTC by Robert Rohde
Modified: 2009-03-25 11:37 UTC

Attachments
Faster logic for parser (4.34 KB, patch)
2009-03-25 07:30 UTC, Robert Rohde

Description Robert Rohde 2009-03-25 07:29:14 UTC
The attached patch improves the execution speed of AbuseFilterParser::nextToken through a series of small changes.

The most significant impact comes from modifying the application of regex to focus on the immediate offset and not look downstream unnecessarily.  It also stops radixRegex from giving empty string matches.

This patch preserves all current behavior and is transparent to the user.

In benchmarks run with function evaluation and variable lookup disabled, I saw a ~20% improvement in rule parsing speed after applying this patch.
Comment 1 Robert Rohde 2009-03-25 07:30:08 UTC
Created attachment 5958
Faster logic for parser

Retrying to upload patch...
Comment 2 Andrew Garrett 2009-03-25 07:33:00 UTC
The use of substr( $code, $offset ) is slow. Instead, we should be using the /A modifier (which I thought I was; maybe I didn't commit properly).
Comment 3 Robert Rohde 2009-03-25 07:52:33 UTC
When preg_match is called with an offset, the beginning-of-string anchor (^) only matches if the offset is actually 0.  This is annoying behavior, but I think the only way to force a beginning-of-string match from preg_match is to send it a truncated string.
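
The anchor behavior described here is not specific to PHP. As a rough illustration only, the following Python sketch reproduces the same effect with the standard re module: with a nonzero start position, "^" still refers to the real start of the string, and slicing the input is the straightforward workaround. The sample string and variable names are made up for the example; this is an analogue, not the AbuseFilter code.

```python
import re

code = "ab 12"
offset = 3  # index of the "1" in the sample string

start_anchored = re.compile(r"^\d+")

# With a nonzero start position, "^" still means the real beginning of the
# string, so the pattern cannot match at index 3.
no_match = start_anchored.search(code, offset)
assert no_match is None

# Workaround: hand the regex a truncated string. This works, but each call
# copies the remainder of the input.
m = start_anchored.search(code[offset:])
assert m is not None and m.group(0) == "12"
```

The slicing workaround is correct but allocates a new string on every token, which is the cost the next comment objects to.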
Comment 4 Andrew Garrett 2009-03-25 07:54:09 UTC
(In reply to comment #3)
> When preg_match is called with an offset, the beginning-of-string anchor (^)
> only matches if the offset is actually 0.  This is annoying behavior, but I
> think the only way to force a beginning-of-string match from preg_match is
> to send it a truncated string.

I know that, but as I said in my previous comment, you can use the /A modifier to do what you want.

http://au2.php.net/manual/en/reference.pcre.pattern.modifiers.php
Comment 5 Robert Rohde 2009-03-25 08:01:00 UTC
Oh, neat.  I learned regex in Python, and I'm pretty confident Python doesn't have that flag.  By all means, that looks even better.
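
For comparison, Python's standard re module has no /A-style flag, but a compiled pattern's match() method accepts a start position and only succeeds if the match begins exactly there, which gives a comparable effect. The sketch below shows a minimal tokenizer loop built on that idea. The token patterns and the function names next_token and tokenize are hypothetical and chosen for illustration; they are not AbuseFilter's grammar or the patch committed in r48806.

```python
import re

# Hypothetical token patterns for illustration; not AbuseFilter's real grammar.
TOKEN_PATTERNS = [
    ("number", re.compile(r"\d+")),
    ("identifier", re.compile(r"[A-Za-z_]\w*")),
    ("operator", re.compile(r"[+\-*/=<>!&|]+")),
]
WHITESPACE = re.compile(r"\s*")


def next_token(code, offset):
    """Return (kind, text, new_offset), or None at end of input."""
    # Pattern.match(string, pos) only succeeds if the match starts exactly
    # at pos, so it neither scans downstream nor copies the string.
    offset = WHITESPACE.match(code, offset).end()
    if offset >= len(code):
        return None
    for kind, pattern in TOKEN_PATTERNS:
        m = pattern.match(code, offset)
        if m:
            return kind, m.group(0), m.end()
    raise ValueError(f"unexpected character at offset {offset}")


def tokenize(code):
    """Collect (kind, text) pairs by calling next_token repeatedly."""
    tokens, offset = [], 0
    while True:
        tok = next_token(code, offset)
        if tok is None:
            return tokens
        kind, text, offset = tok
        tokens.append((kind, text))
```

For example, tokenize("foo >= 12") walks the input once, advancing the offset after each token instead of slicing the string.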
Comment 6 Andrew Garrett 2009-03-25 11:37:33 UTC
Done with a similar, but independent patch in r48806.
