Last modified: 2009-03-25 11:37:33 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T20147, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 18147 - Faster logic for Abuse Filter parser


Summary:	Faster logic for Abuse Filter parser

Status:	RESOLVED FIXED

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	AbuseFilter
Version:	unspecified
Hardware:	All All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Andrew Garrett

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2009-03-25 07:29 UTC by Robert Rohde
Modified:	2009-03-25 11:37 UTC
CC List:	1 user


Attachments
Faster logic for parser (4.34 KB, patch) 2009-03-25 07:30 UTC, Robert Rohde

Description Robert Rohde 2009-03-25 07:29:14 UTC

The attached patch improves the execution speed of AbuseFilterParser::nextToken through a series of small changes.

The most significant impact comes from modifying the application of regex to focus on the immediate offset and not look downstream unnecessarily.  It also stops radixRegex from giving empty string matches.

This patch preserves all current behavior and is transparent to the user.

In benchmarks run with function evaluation and variable lookup disabled, I saw a ~20% improvement in the parsing speed for rules after applying this patch.

Comment 1 Robert Rohde 2009-03-25 07:30:08 UTC

Created attachment 5958
Faster logic for parser

Retrying to upload patch...

Comment 2 Andrew Garrett 2009-03-25 07:33:00 UTC

The use of substr( $code, $offset ) is slow. Instead, we should be using the /A modifier (which I thought I was, maybe I didn't commit properly).

Comment 3 Robert Rohde 2009-03-25 07:52:33 UTC

When preg_match is called with an offset, the beginning-of-string anchor (^) only matches if the offset is actually 0.  This is annoying behavior, but I think the only way to force a beginning-of-string match from preg_match is to send it a truncated string.

Comment 4 Andrew Garrett 2009-03-25 07:54:09 UTC

(In reply to comment #3)
> When preg_match is called with an offset, the beginning-of-string anchor (^)
> only matches if the offset is actually 0.  This is annoying behavior, but I
> think the only way to force a beginning-of-string match from preg_match is to
> send it a truncated string.

I know that, but as I said in my previous comment, you can use the /A modifier to do what you want.

http://au2.php.net/manual/en/reference.pcre.pattern.modifiers.php

Comment 5 Robert Rohde 2009-03-25 08:01:00 UTC

Oh, neat.  I learned regex in Python, and I'm pretty confident Python doesn't have that flag.  By all means, that looks even better.
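
For comparison, the behaviors discussed in comments 2 to 5 have rough analogues in Python's re module. Python has no equivalent flag, but a compiled pattern's match(string, pos) method only succeeds if the pattern matches exactly at pos, which is similar in effect to the A modifier. The sketch below is not the AbuseFilter code. The token pattern, the input string and the helper names are hypothetical, chosen only to contrast the three approaches.

```python
import re

# Hypothetical token pattern and input, for illustration only.
NUMBER = re.compile(r"\d+")
NUMBER_CARET = re.compile(r"^\d+")
CODE = "ab 123"


def scan(pattern, text, offset):
    """Unanchored search starting at offset; may skip ahead to a later match."""
    m = pattern.search(text, offset)
    return m.group(0) if m else None


def anchored(pattern, text, offset):
    """Succeed only if the pattern matches exactly at offset, without copying text."""
    m = pattern.match(text, offset)
    return m.group(0) if m else None


def truncated(pattern, text, offset):
    """Force a start-of-string match by slicing, which copies the remaining text."""
    m = pattern.match(text[offset:])
    return m.group(0) if m else None


if __name__ == "__main__":
    print(scan(NUMBER, CODE, 0))            # looks downstream past "ab "
    print(scan(NUMBER_CARET, CODE, 3))      # ^ does not match at a non-zero offset
    print(anchored(NUMBER, CODE, 3))        # anchored at the offset
    print(anchored(NUMBER, CODE, 2))        # no digit at offset 2, so no match
    print(truncated(NUMBER_CARET, CODE, 3)) # works, at the cost of a copy
```

The anchored form avoids both the downstream scan and the string copy. Those are the two costs raised in the description and in comment 2.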

Comment 6 Andrew Garrett 2009-03-25 11:37:33 UTC

Done with a similar, but independent patch in r48806.
