Last modified: 2009-03-25 11:37:33 UTC
The attached patch speeds up AbuseFilterParser::nextToken through a series of small changes. The most significant gain comes from applying each regex only at the current offset, so it no longer searches downstream unnecessarily. It also stops radixRegex from producing empty-string matches. The patch preserves all current behavior and is transparent to the user. In benchmarks with function evaluation and variable lookup stubbed out, rule parsing was ~20% faster after applying this patch.
Created attachment 5958: Faster logic for parser
Retrying to upload patch...
The use of substr( $code, $offset ) is slow. Instead, we should be using the /A modifier (which I thought I was, maybe I didn't commit properly).
When preg_match is called with an offset, the beginning-of-string anchor (^) only matches if the offset is actually 0. This is annoying behavior, but I think the only way to force a beginning-of-string match from preg_match is to send it a truncated string.
(In reply to comment #3)
> Calling preg_match with an offset only matches the beginning of the string flag
> if the offset is actually set to 0. This is annoying behavior, but I think the
> only way to force a beginning of string match from preg_match is actually to
> send it a truncated string.

I know that, but as I said in my previous comment, you can use the /A modifier to do what you want. http://au2.php.net/manual/en/reference.pcre.pattern.modifiers.php
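For anyone following along, here is a minimal sketch of the difference being discussed. It is not the code from the patch or from AbuseFilterParser; the helper name matchAtOffset, the sample string, and the digit pattern are made up for illustration. It only relies on the documented behavior of preg_match's offset argument and the A (PCRE_ANCHORED) modifier.

```php
<?php
// Hypothetical helper (not part of AbuseFilter): try to match $pattern
// exactly at $offset in $code, without copying the rest of the string.
function matchAtOffset( $pattern, $code, $offset ) {
	// Appending the A modifier anchors the match at the starting offset,
	// so the regex engine does not scan further downstream.
	if ( preg_match( $pattern . 'A', $code, $matches, 0, $offset ) ) {
		return $matches[0];
	}
	return null;
}

$code = 'foo 123 bar';

// ^ still means "start of the whole subject", so it fails at offset 4.
$caret = preg_match( '/^[0-9]+/', $code, $m1, 0, 4 );          // 0

// The substr() workaround matches, but copies the tail of the string
// on every call.
$truncated = preg_match( '/^[0-9]+/', substr( $code, 4 ), $m2 ); // 1, $m2[0] is '123'

// The A modifier matches at offset 4 and refuses to look ahead from offset 0.
$anchored = matchAtOffset( '/[0-9]+/', $code, 4 );              // '123'
$noMatch  = matchAtOffset( '/[0-9]+/', $code, 0 );              // null
```

Without the A modifier, the last call would find '123' further along the string, which is the "looking downstream" cost described in the original report.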
Oh, neat. I learned regex in Python, and I'm pretty confident Python doesn't have that flag. By all means, that looks even better.
Done with a similar, but independent patch in r48806.