Last modified: 2014-04-29 17:28:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from those that display bug reports and their history, links might be broken. See T30427, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 28427 - rewrite quickIsNFCVerify() to use preg_match() with an offset to accommodate larger files


Summary:	rewrite quickIsNFCVerify() to use preg_match() with an offset to accommodate ...

Status:	NEW

Product:	MediaWiki
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	1.18.x
Hardware:	All All

Importance:	Lowest enhancement
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	i18n

Depends on:
Blocks:

Reported:	2011-04-05 01:31 UTC by Brion Vibber
Modified:	2014-04-29 17:28 UTC
CC List:	5 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Brion Vibber 2011-04-05 01:31:57 UTC

Broken out from bug 28146, which started with a narrower focus and was resolved by a narrower fix.

Per notes & patches on that bug, the preg_match_all() in UtfNormal::quickIsNFCVerify uses a lot of memory for mixed ASCII/non-ASCII strings such as one finds in languages using Latin scripts with accented or other non-ASCII letters.

This results in hitting memory limits on largeish input strings, much sooner than we really ought to.

Rewriting the function so that it works through the string in chunks as it splits should avoid that huge memory bump, but in my initial tests preg_match with an offset was too slow, and preg_replace_callback was still slowish.
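
To make the chunked idea concrete, here is a minimal sketch of "match at an offset and advance" in place of "split everything up front". It is written in Python rather than PHP and is not the UtfNormal code: the pattern and the names `iter_runs` and `count_non_ascii_runs` are hypothetical, chosen only for illustration. It shows the memory shape of the approach (one run held at a time) and says nothing about the speed problem described above.

```python
import re
from typing import Iterator

# Hypothetical pattern: one run of ASCII characters or one run of
# non-ASCII characters. This is NOT the regex used by UtfNormal.
_RUN = re.compile(r"[\x00-\x7f]+|[^\x00-\x7f]+")


def iter_runs(text: str) -> Iterator[str]:
    """Yield ASCII / non-ASCII runs one at a time.

    Matching at an explicit offset means only the current run is held,
    instead of a list of every run in the string (the findall /
    preg_match_all style that causes the memory spike).
    """
    pos = 0
    while pos < len(text):
        match = _RUN.match(text, pos)
        # The two alternatives cover every character, so a match at
        # `pos` always exists while pos < len(text).
        assert match is not None
        yield match.group()
        pos = match.end()


def count_non_ascii_runs(text: str) -> int:
    """Count runs that would need closer inspection by a normalizer."""
    return sum(1 for run in iter_runs(text) if not run.isascii())


if __name__ == "__main__":
    sample = "d\u00e9j\u00e0 vu"
    for run in iter_runs(sample):
        print(repr(run))
    print(count_non_ascii_runs(sample))
```

In PHP the analogous loop would pass a start offset as the fifth argument to preg_match(), which is the variant reported above as too slow in initial tests.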

includes/normal/UtfNormalMemStress.php can be used to stress-test this.

Comment 1 Yann Forget 2011-04-06 13:05:39 UTC

I suppose that this error is related to this bug?

PHP fatal error in
/usr/local/apache/common-local/php-1.17/includes/normal/UtfNormal.php line 285: 
Allowed memory size of 125829120 bytes exhausted (tried to allocate 71 bytes)

http://fr.wikisource.org/w/index.php?title=Fichier:Port_-_Dictionnaire_historique,_g%C3%A9ographique_et_biographique_du_Maine-et-Loire,_tome_1.djvu&action=purge

This is a big file: 882 pages (85.71 MB)

Comment 2 Brion Vibber 2011-04-06 15:48:14 UTC

That'll be another instance of bug 28146 with the djvu text extraction; merging the fix for that into 1.17 and deploying it should resolve it.

Comment 3 Quim Gil 2014-04-29 14:54:44 UTC

Hi veteran contributors. Is this problem still valid? Is General/Unknown its best location?

Marking as Lowest, since nobody seems to be working or planning to work on this currently.
