Last modified: 2014-04-29 17:28:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T30427, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 28427 - rewrite quickIsNFCVerify() to use preg_match() with an offset to accommodate larger files
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: 1.18.x
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n
Depends on:
Blocks:
Reported: 2011-04-05 01:31 UTC by Brion Vibber
Modified: 2014-04-29 17:28 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Brion Vibber 2011-04-05 01:31:57 UTC
Broken out from bug 28146, which started with a narrower focus and was solved by a narrower fix.

Per notes & patches on that bug, the preg_match_all() in UtfNormal::quickIsNFCVerify uses a lot of memory for mixed ASCII/non-ASCII strings such as one finds in languages using Latin scripts with accented or other non-ASCII letters.

This results in hitting memory limits on largeish input strings, much sooner than we really ought to.

Rewriting the function so that it works through the string in chunks as it's splitting should avoid that huge memory bump, but my initial tests were too slow using preg_match and an offset, and still slowish using preg_replace_callback.
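
To make the chunked approach concrete, here is a minimal sketch in Python rather than PHP. It is only an illustration: the pattern and the names `RUN`, `iter_runs` and `has_non_ascii` are hypothetical, not taken from UtfNormal. Python's `Pattern.match(string, pos)` plays the role of preg_match() with an offset, so each run is pulled out on demand instead of collecting every piece of the string up front as preg_match_all() does.

```python
import re

# Hypothetical splitting pattern: a run of ASCII bytes or a run of non-ASCII
# bytes. This is NOT the pattern UtfNormal::quickIsNFCVerify actually uses.
RUN = re.compile(rb"[\x00-\x7f]+|[\x80-\xff]+")


def iter_runs(data: bytes):
    """Yield (is_ascii, run) pairs one at a time, scanning from an offset."""
    pos = 0
    while pos < len(data):
        # Anchored match starting at pos; every byte belongs to one of the
        # two classes, so this always succeeds while pos < len(data).
        m = RUN.match(data, pos)
        run = m.group(0)
        yield run[0] < 0x80, run
        pos = m.end()


def has_non_ascii(data: bytes) -> bool:
    """Stand-in for the real per-run checks: is any non-ASCII run present?"""
    return any(not is_ascii for is_ascii, _ in iter_runs(data))
```

Only one run is held at a time here, which is the memory saving the rewrite is after. The cost is one regex call per run, which is consistent with the slowdown seen in the initial tests. In PHP, a pattern that must match exactly at the offset would use the `\G` assertion, since `^` does not anchor at the offset passed to preg_match().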

includes/normal/UtfNormalMemStress.php can be used to stress-test this.
Comment 1 Yann Forget 2011-04-06 13:05:39 UTC
I suppose that this error is related to this bug?

PHP fatal error in
/usr/local/apache/common-local/php-1.17/includes/normal/UtfNormal.php line 285: 
Allowed memory size of 125829120 bytes exhausted (tried to allocate 71 bytes)

http://fr.wikisource.org/w/index.php?title=Fichier:Port_-_Dictionnaire_historique,_g%C3%A9ographique_et_biographique_du_Maine-et-Loire,_tome_1.djvu&action=purge

This is a big file: 882 pages (85.71 MB)
Comment 2 Brion Vibber 2011-04-06 15:48:14 UTC
That'll be another instance of bug 28146 with the djvu text extraction; merging the fix for that to 1.17 and deployment should resolve it.
Comment 3 Quim Gil 2014-04-29 14:54:44 UTC
Hi veteran contributors. Is this problem still valid? Is General/Unknown its best location?

Marking as Lowest, since nobody seems to be working on this or planning to work on it currently.
