Last modified: 2010-11-21 21:07:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T27920, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 25920 - preg_match cannot compile pattern in includes/IP.php
preg_match cannot compile pattern in includes/IP.php
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
General/Unknown (Other open bugs)
1.17.x
All All
: Normal normal (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-11-14 14:57 UTC by Paul Oranje
Modified: 2010-11-21 21:07 UTC (History)
1 user (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Paul Oranje 2010-11-14 14:57:52 UTC
After svn update to r76651 the following warning is shown:

  Warning:  preg_match() [function.preg-match]: Compilation failed: assertion expected after (?( at offset 157 in [my server]/mediawiki-trunk/includes/IP.php on line 85

The actual regex="/^((::|:(:([0-9A-Fa-f]{1,4})){1,7})|([0-9A-Fa-f]{1,4})(:([0-9A-Fa-f]{1,4})){0,6}::|([0-9A-Fa-f]{1,4})(:([0-9A-Fa-f]{1,4})){7}|([0-9A-Fa-f]{1,4})(:(?P<abbr>(?(abbr)|:))?([0-9A-Fa-f]{1,4})){1,6}(?(abbr)|^))(\/(12[0-8]|1[01][0-9]|[1-9]?\d)|)$/"

Used software versions:
mediawiki: 1.17alpha (r76658)
PHP: 5.2.8 (but somehow does not accept the syntax "(?<name>)" which should work as of version 5.2.2, see http://php.net/manual/en/function.preg-match.php)
Comment 1 Sam Reed (reedy) 2010-11-15 14:42:35 UTC
CC'ing aaron. I think he might be the one to have broken this (he's been working on it recently)
Comment 2 Aaron Schulz 2010-11-15 14:44:39 UTC
(In reply to comment #0)
> After svn update to r76651 the following warning is shown:
> 
>   Warning:  preg_match() [function.preg-match]: Compilation failed: assertion
> expected after (?( at offset 157 in [my server]/mediawiki-trunk/includes/IP.php
> on line 85
> 
> The actual
> regex="/^((::|:(:([0-9A-Fa-f]{1,4})){1,7})|([0-9A-Fa-f]{1,4})(:([0-9A-Fa-f]{1,4})){0,6}::|([0-9A-Fa-f]{1,4})(:([0-9A-Fa-f]{1,4})){7}|([0-9A-Fa-f]{1,4})(:(?P<abbr>(?(abbr)|:))?([0-9A-Fa-f]{1,4})){1,6}(?(abbr)|^))(\/(12[0-8]|1[01][0-9]|[1-9]?\d)|)$/"
> 
> Used software versions:
> mediawiki: 1.17alpha (r76658)
> PHP: 5.2.8 (but somehow does not accept the syntax "(?<name>)" which should
> work as of version 5.2.2, see http://php.net/manual/en/function.preg-match.php)

Are you saying there might be a PHP bug?
Comment 3 Aaron Schulz 2010-11-15 14:48:04 UTC
From: http://php.net/manual/en/function.preg-match.php

5.2.2  	 Named subpatterns now accept the syntax (?<name>)  and (?'name') as well as (?P<name>). Previous versions accepted only (?P<name>).

I choose ?P since it was the oldest. It *should* be fine...
Comment 4 Aaron Schulz 2010-11-15 14:53:54 UTC
OK, I think the problem is that the backreference (not just the name) also needs the "P". *sigh of relief*.
Comment 5 Aaron Schulz 2010-11-15 15:21:07 UTC
(In reply to comment #4)
> OK, I think the problem is that the backreference (not just the name) also
> needs the "P". *sigh of relief*.

I do think it's something about the conditional and not the named ref. that 5.2.8 dislikes. Worst case, I can use numbered references as a workaround, but it makes the expression less versatile for outside use (like combining it with other regexes).

The following is also valid syntax (perl style):
define( 'RE_IPV6_ADD',
	'(' . // starts with "::" (includes the address "::")
		'(::|:(:' . RE_IPV6_WORD . '){1,7})' .
	'|' . // ends with "::" (not including the address "::")
		RE_IPV6_WORD . '(:' . RE_IPV6_WORD . '){0,6}::' .
	'|' . // has no "::"
		RE_IPV6_WORD . '(:' . RE_IPV6_WORD . '){7}' .
	'|' . // contains one "::" in the middle ("^" check always fails if no "::" found)
		RE_IPV6_WORD . '(:(?P<abbr>(?(<abbr>)|:))?' . RE_IPV6_WORD . '){1,6}(?(<abbr>)|^)' .
	')'
);

Does this work on your setup?
Comment 6 Paul Oranje 2010-11-15 15:42:52 UTC
(In reply to comment #2)
> (In reply to comment #0)
>>[cut]
> > PHP: 5.2.8 (but somehow does not accept the syntax "(?<name>)" which should
> > work as of version 5.2.2, see http://php.net/manual/en/function.preg-match.php)
> 
> Are you saying there might be a PHP bug?
No, but the new named pattern syntax does not work with hosting provider,
though the reported version of PHP > 5.2.2. So I thought the same could be true
for the backreferences.


(In reply to comment #5)
A test with as part of the pattern set set on line 45 like proposed by you
  RE_IPV6_WORD . '(:(?P<abbr>(?(<abbr>)|:))?' . RE_IPV6_WORD . '){1,6}(?(<abbr>)|^)' .

produces the error message:
  Compilation failed: assertion expected after (?( at offset 157 in [my
server]/mediawiki-trunk/includes/IP.php on line 85


An other test with as part of the pattern set set on line 45 like 
  RE_IPV6_WORD . '(:(?P<abbr>(?P(abbr)|:))?' . RE_IPV6_WORD .
'){1,6}(?P(abbr)|^)'

produces the error message:
  Compilation failed: unrecognized character after (?P at offset 157 in [my
server]/mediawiki-trunk/includes/IP.php on line 85


My knowledge of PHP is superficial, so likely the other pattern I tested isn't right.
What would be the correct pattern (for the back reference "?(abbr)")?
It seems to me a good choice that the oldest still supported syntax be used.
Comment 7 Aaron Schulz 2010-11-15 16:15:35 UTC
Before PHP 5.3, PCRE was an extension. The host might be using an old version.

From: http://www.pcre.org/changelog.txt
...Version 7.0 19-Dec-06...
    (d) A conditional reference to a named group can now use the syntax
        (?(<name>) or (?('name') as well as (?(name).
Comment 8 Aaron Schulz 2010-11-15 16:29:02 UTC
It looks like (?(name)x|y) was actually the oldest named group assertion syntax, from "Version 4.0 17-Feb-03".
Comment 9 Aaron Schulz 2010-11-15 17:06:06 UTC
Do you know what version of PCRE the host uses as the PHP extension?
Comment 10 Paul Oranje 2010-11-16 09:24:31 UTC
Version of PCRE according to phpinfo() is "6.6 06-Feb-2006".
Comment 11 Aaron Schulz 2010-11-16 09:34:30 UTC
(In reply to comment #8)
> It looks like (?(name)x|y) was actually the oldest named group assertion
> syntax, from "Version 4.0 17-Feb-03".

I found another log entry in the changelog. It shows that named subpatterns in conditionals were added in 6.7:

...Version 6.7 04-Jul-06...
13. Added the ability to use a named substring as a condition, using the
    Python syntax: (?(name)yes|no). This overloads (?(R)... and names that
    are numbers (not recommended). Forward references are permitted.

They must have been left out in 4.0 then. This now makes sense.
Comment 12 Aaron Schulz 2010-11-17 09:13:34 UTC
OK, I changed the regex in r76876. It now avoids conditionals on whether a named group matched.

This should be fine on 4.0+, does this give you any trouble?
Comment 13 Paul Oranje 2010-11-17 10:57:51 UTC
Sorry, but it doesn't work yet.

The error msg:
  Compilation failed: reference to non-existent subpattern at offset 163 in [myserver]/mediawiki-trunk/includes/IP.php on line 92.
Comment 14 Aaron Schulz 2010-11-17 17:28:08 UTC
(In reply to comment #13)
> Sorry, but it doesn't work yet.
> 
> The error msg:
>   Compilation failed: reference to non-existent subpattern at offset 163 in
> [myserver]/mediawiki-trunk/includes/IP.php on line 92.

"6.7...14. Added forward references in named backreferences (if you see what I mean)."

[Sigh]...I guess named refs were really half-assed before 6.7. Maybe there is some weird workaround.
Comment 15 Aaron Schulz 2010-11-17 21:11:50 UTC
(In reply to comment #13)
> Sorry, but it doesn't work yet.
> 
> The error msg:
>   Compilation failed: reference to non-existent subpattern at offset 163 in
> [myserver]/mediawiki-trunk/includes/IP.php on line 92.

Here is another variant. The forward reference is now a nested reference (nested refs supported for non-named groups since 2.0). If named groups support it too, then this should work:

define( 'RE_IPV6_ADD',
	'(?:' . // starts with "::" (includes the address "::")
		'::|:(?::' . RE_IPV6_WORD . '){1,7}' .
	'|' . // ends with "::" (not including the address "::")
		RE_IPV6_WORD . '(?::' . RE_IPV6_WORD . '){0,6}::' .
	'|' . // has no "::"
		RE_IPV6_WORD . '(?::' . RE_IPV6_WORD . '){7}' .
	'|' . // contains one "::" in the middle (awkward regex for PCRE 4.0+ compatibility)
		RE_IPV6_WORD . '(?::(?P<abn>(?!(?P=abn)):(?P<iabn>))?' . RE_IPV6_WORD . '){1,6}(?P=iabn)' .
		// NOTE: (?!(?P=abn)) fails iff "::" used twice; (?P=iabn) passes iff a "::" was found.
		
		// Better regexp (PCRE 7.2+ only), allows intuitive regex concatenation
		#RE_IPV6_WORD . '(?::((?(-1)|:))?' . RE_IPV6_WORD . '){1,6}(?(-2)|^)' .
	')'
);
Comment 16 Paul Oranje 2010-11-17 22:01:04 UTC
Your last variant does not produce errors.

Whether the regex actually works cannot be tested by me - my hosting provider does not have IPv6 connectivity.

p.s.
This has been an interesting bug sofar in grasping the development of PCRE, tou qualify as historian. Good luck.
Comment 17 Aaron Schulz 2010-11-17 22:10:20 UTC
(In reply to comment #16)
> Your last variant does not produce errors.
> 
> Whether the regex actually works cannot be tested by me - my hosting provider
> does not have IPv6 connectivity.
> 
> p.s.
> This has been an interesting bug sofar in grasping the development of PCRE, tou
> qualify as historian. Good luck.

You can run tests as follows:
WINDOWS: execute maintenance/tests/phpunit/run-tests.bat
UNIX: run 'make test' from a shell in the maintenance/tests/phpunit directory

All the stuff I was proposing above passed the tests for me.
Comment 18 Aaron Schulz 2010-11-17 22:10:49 UTC
Of course, phpunit has to be installed first :)
Comment 19 Paul Oranje 2010-11-17 22:26:46 UTC
Besides the above requirement, the makefile contains a severe warning that the tests should not be run on a production system. For that reason I shall not run the test. Sorry.

When you can instruct me on a non-destructive and sufficiently restrictive test suitable for this bug, I'll be happy to consider running such a test.
Comment 20 Aaron Schulz 2010-11-17 23:04:04 UTC
(In reply to comment #19)
> Besides the above requirement, the makefile contains a severe warning that the
> tests should not be run on a production system. For that reason I shall not run
> the test. Sorry.
> 
> When you can instruct me on a non-destructive and sufficiently restrictive test
> suitable for this bug, I'll be happy to consider running such a test.

Are you running /trunk on a production system? I just figured it was a test wiki.

I don't know of a clean way to run IPTests.php by itself. I suppose one could edit the file to have require("MW DIR/includes/IP.php") at the top and then run "phpunit IPTests" from shell.

In either case, you've been MORE than helpful enough here :)... I'll commit the last regexp.
Comment 21 Aaron Schulz 2010-11-18 00:27:31 UTC
Committed in r76928.
Comment 22 Aaron Schulz 2010-11-18 02:53:07 UTC
(In reply to comment #21)
> Committed in r76928.

Reverted by Sam. Apparently there is a PCRE 8.02 segfault bug (I'm running 7.9). Yay PCRE!
Comment 23 Aaron Schulz 2010-11-21 10:50:34 UTC
(In reply to comment #22)
> (In reply to comment #21)
> > Committed in r76928.
> 
> Reverted by Sam. Apparently there is a PCRE 8.02 segfault bug (I'm running
> 7.9). Yay PCRE!

PCRE bug filed upstream. Workaround done in r77067.
Comment 24 Paul Oranje 2010-11-21 21:07:43 UTC
The workaround works also with my old PCRE.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links