Last modified: 2011-03-13 18:06:25 UTC

Bug 5496 - unicode blacklist check in User.php fails in php 4.3.2
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: User login and signup
Version: 1.6.x
Hardware: PC Linux
Importance: Lowest normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2006-04-07 18:45 UTC by Don Seiler
Modified: 2011-03-13 18:06 UTC (History)

Description Don Seiler 2006-04-07 18:45:44 UTC
Installed mediawiki 1.6.1 on server with PHP 4.3.2.  When trying to log in, I
get this PHP warning:

Warning: Compilation failed: characters with values > 255 are not yet supported
in classes at offset 33 in /usr/local/mediawiki-1.6.1/includes/User.php on line 224

If I comment out the check against $unicodeBlacklist, the error goes away.
Earlier today I was told that this only works in PHP >= 4.4, but now I am
told it *should* work in 4.3 as well.

I was going to write a patch to check the PHP version and conditionally do the
unicode check if PHP >= 4.4, but I'd like to know for sure that it won't work in
4.3 first.

I'm rizzo on freenode.
Comment 1 Don Seiler 2006-04-07 18:51:43 UTC
Googling around, I found mention that the version of PCRE might have something to
do with it as well.  The post I read is
http://drupal.org/node/12857#comment-35282; it turns out we have the same 4.3.2
with the same PCRE version as the poster.

phpinfo() says:  PCRE Library Version 	3.9 02-Jan-2002
Comment 2 Brion Vibber 2006-04-07 19:08:09 UTC
It *should* work fine on 4.3.2...

http://us3.php.net/manual/en/reference.pcre.pattern.modifiers.php

"u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings 
are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on 
win32. UTF-8 validity of the pattern is checked since PHP 4.3.5."
Comment 3 Don Seiler 2006-04-07 19:58:55 UTC
Looks like it works after updating our pcre and recompiling the same version of PHP.

Perhaps mediawiki should make notice of a required minimum version of pcre?
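
Rather than documenting one specific PCRE version, it might also be possible to probe at runtime whether the installed PCRE can compile a character class containing code points above 255. The following is only a minimal sketch, not code from MediaWiki: the function name `pcreSupportsWideClasses` is hypothetical, and it relies solely on `preg_match()` returning `false` when a pattern fails to compile.

```php
<?php
// Hypothetical helper (not part of MediaWiki): reports whether the PCRE
// library linked into PHP can compile a UTF-8 character class that
// contains code points above 255.
function pcreSupportsWideClasses() {
	// An old PCRE emits a "Compilation failed" warning and preg_match()
	// returns false; the @ operator keeps that warning out of the output.
	$result = @preg_match( '/[\x{2000}-\x{200f}]/u', '' );
	return $result !== false;
}

if ( !pcreSupportsWideClasses() ) {
	// Assumption: the caller decides whether to skip the check or refuse to run.
	echo "PCRE is too old for Unicode character classes; please upgrade it.\n";
}
```

A probe like this tests the capability itself, so it does not depend on knowing which PHP or PCRE release introduced the feature.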
Comment 4 Rob Church 2006-04-16 19:40:41 UTC
(In reply to comment #3)
> Looks like it works after updating our pcre and recompiling the same version
> of PHP.
> 
> Perhaps mediawiki should make notice of a required minimum version of pcre?

What *was* the apparent minimum version of PCRE required?
Comment 5 Daniel Kinzler 2006-04-17 15:59:13 UTC
I can confirm the problem, running PHP 4.3.8 with pcre-3.9-10. Avar told me he
does NOT have the problem running 4.3.2 with pcre-3.9-10.2.

So the .2 seems to make all the difference, with pcre-3.9-10.2 being the required
minimum version (since that's a patch release, the required versions may have to
be determined separately for 3.9-11, etc.).

Please put something about this into the release notes!
Comment 6 Don Seiler 2006-04-17 16:11:51 UTC
(In reply to comment #4)
> What *was* the apparent minimum version of PCRE required?

pcre-3.9-10.2
Comment 7 Daniel Kinzler 2006-04-17 16:22:38 UTC
The blacklist can be rewritten like this:

		$unicodeBlacklist = '/' .
			'(\x00[\x80-\x9f])|' . # iso-8859-1 control chars
			'(\x00\xa0)|' .        # non-breaking space
			'(\x20[\x00-\x0f])|' . # various whitespace
			'(\x20[\x28-\x2f])|' . # breaks and control chars
			'(\x30\x00)|' .        # ideographic space
			'([\xe0-\xf8].)' .     # private use
			'/u';

That's not as nice, and I did not test it much, but it should work with earlier
versions of PCRE too. Please consider using this version, since the current one
is bound to cause problems for people running on boxes with an old version
of PCRE. Getting your hosting service to upgrade a library is not an easy task
in my experience. Speed and elegance are not critical for this expression.
Comment 8 Brion Vibber 2006-04-17 21:19:31 UTC
That would fail as it only concerns itself with ASCII characters 
and a few Latin Extended characters.

Daniel: where does this updated version of PCRE come from?
Did you just run your distro's standard updater or did you
get it from somewhere else?
Comment 9 Daniel Kinzler 2006-04-18 14:23:33 UTC
Actually, I did not try the new pcre version; I just relied on what Avar said.
After reading your comment, I updated pcre and got version 4.4-1, which does not
have the problem. I'm using apt-for-rpm on an old mostly-fedora-but-really-redhat
box, so I'm not really representative. I expect recent versions of pcre do not
have the problem, but people who have webspace on a "debian stable only" box or
something may be stuck with something old. I just think it's a bit pointless to
require a new version of pcre just for this simple check.

You said that my rewritten expression "only concerns itself with ASCII
characters and a few Latin Extended characters" - well, as far as I can see, it
does *exactly* the same as the expression currently in SVN:

		$unicodeBlacklist = '/[' .
			'\x{0080}-\x{009f}' . # iso-8859-1 control chars
			'\x{00a0}' .          # non-breaking space
			'\x{2000}-\x{200f}' . # various whitespace
			'\x{2028}-\x{202f}' . # breaks and control chars
			'\x{3000}' .          # ideographic space
			'\x{e000}-\x{f8ff}' . # private use
			']/u';

Am I missing something?
Comment 10 Brion Vibber 2006-04-18 21:01:33 UTC
Yes, you're missing two hexadecimal digits. [\xe0-\xf8] refers to
characters in the range U+00E0 (à) through U+00F8 (ø).

Have you tested with Debian stable (that's 3.1, not old 3.0 which
has too old a PHP to run 1.6 anyway)?
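
To illustrate the point about the missing digits: a pattern that avoids `\x{...}` classes would have to spell out the UTF-8 byte sequences of each range, not single bytes. The following is only a sketch under stated assumptions, not code from MediaWiki. The function names `byteBlacklistPattern` and `hasBlacklistedChar` are hypothetical, the `/u` modifier is deliberately omitted so that PCRE sees raw bytes, and the input is assumed to be valid UTF-8 already. For example, U+E000 encodes as the bytes EE 80 80 and U+F8FF as EF A3 BF, so the private use range needs three alternatives.

```php
<?php
// Hypothetical byte-level equivalent of the blacklist in SVN.
// No /u modifier: each \xNN below is one raw byte of a UTF-8 sequence.
function byteBlacklistPattern() {
	return '/' .
		'\xC2[\x80-\x9F]|' .             # U+0080-U+009F iso-8859-1 control chars
		'\xC2\xA0|' .                    # U+00A0 non-breaking space
		'\xE2\x80[\x80-\x8F]|' .         # U+2000-U+200F various whitespace
		'\xE2\x80[\xA8-\xAF]|' .         # U+2028-U+202F breaks and control chars
		'\xE3\x80\x80|' .                # U+3000 ideographic space
		'\xEE[\x80-\xBF][\x80-\xBF]|' .  # U+E000-U+EFFF private use
		'\xEF[\x80-\xA2][\x80-\xBF]|' .  # U+F000-U+F8BF private use
		'\xEF\xA3[\x80-\xBF]' .          # U+F8C0-U+F8FF private use
		'/';
}

// Returns true if $name (assumed to be valid UTF-8) contains a blacklisted character.
function hasBlacklistedChar( $name ) {
	return preg_match( byteBlacklistPattern(), $name ) === 1;
}
```

With this sketch, `hasBlacklistedChar( "a\xC2\xA0b" )` (a non-breaking space) is rejected, while `hasBlacklistedChar( "\xC3\xA0" )` (U+00E0, à) is accepted, which is the distinction the shorter `[\xe0-\xf8]` form loses.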
Comment 11 Daniel Kinzler 2006-04-18 21:10:03 UTC
I don't quite understand what you are saying, but looking at this again, I
notice I once more managed to ignore the difference between Unicode and UTF-8.
Oops...

Anyway: no, I have not tested with Debian stable; I pulled that out of thin air.
Basically, since I still have that old version on my (ill-maintained) box, others
probably will too. That's all.  It's not a real issue for me.
Comment 12 Antoine "hashar" Musso (WMF) 2006-04-30 10:48:02 UTC
PHP 4.3.2 support was dropped in REL1_6:

------------------------------------------------------------------------
r13843 | tstarling | 2006-04-24 17:30:28 +0200 (lun, 24 avr 2006) | 1 line

We no longer support PHP 4.3.2, thanks to the unicode character classes in
User::isValidUserName(). Support for this was added in 4.3.3.
------------------------------------------------------------------------

I believe Tim added that note following your bug report. Please upgrade
your PHP 4.x version. Debian stable has 4.3.10.
Comment 13 Daniel Kinzler 2006-04-30 12:16:26 UTC
"Support for this was added in 4.3.3" - this is wrong. Support for this was
added in pcre-3.9-10.2, the version of PHP is not relevant (it was broken for me
in 4.3.8). Please update the notes accordingly.
