Last modified: 2014-04-29 17:28:35 UTC
Broken out from bug 28146, which started with a narrower focus and was resolved by a narrower fix. Per the notes and patches on that bug, the preg_match_all() call in UtfNormal::quickIsNFCVerify uses a lot of memory for mixed ASCII/non-ASCII strings, such as text in Latin-script languages with accented or other non-ASCII letters. As a result, largish input strings hit the memory limit much sooner than they really ought to. Rewriting the function so that it works through the string in chunks as it splits should avoid that huge memory bump, but my initial tests were too slow using preg_match with an offset, and still slowish using preg_replace_callback. includes/normal/UtfNormalMemStress.php can be used to stress-test this.
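For illustration only, here is a minimal sketch of the "work through the string in chunks" idea, written in Python rather than the PHP of UtfNormal. The pattern and the names iter_runs and non_ascii_stats are hypothetical and do not come from the MediaWiki code base. It shows only the memory-side technique: runs are produced lazily, one at a time, instead of being collected into one large match array. It does not perform an NFC check, since a base letter and a following combining mark can land in different runs, and it says nothing about the speed problem noted above.

```python
import re

# Assumed pattern: alternating runs of ASCII and non-ASCII characters.
RUN_PATTERN = re.compile(r"[\x00-\x7f]+|[^\x00-\x7f]+")


def iter_runs(text):
    """Lazily yield (is_ascii, run) pairs, one run at a time."""
    for match in RUN_PATTERN.finditer(text):
        run = match.group()
        # Every character in a run is on the same side of the ASCII
        # boundary, so checking the first character is enough.
        yield run[0] <= "\x7f", run


def non_ascii_stats(text):
    """Walk the string run by run, keeping only running totals in memory."""
    run_count = 0
    char_count = 0
    for is_ascii, run in iter_runs(text):
        if not is_ascii:
            run_count += 1
            char_count += len(run)
    return run_count, char_count


if __name__ == "__main__":
    sample = "Dictionnaire historique, géographique et biographique"
    print(non_ascii_stats(sample))
```

Only one run is alive at any point in the loop, so peak memory depends on the longest single run rather than on the total number of runs in the input.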
I suppose that this error is related to this bug? PHP fatal error in /usr/local/apache/common-local/php-1.17/includes/normal/UtfNormal.php line 285: Allowed memory size of 125829120 bytes exhausted (tried to allocate 71 bytes) http://fr.wikisource.org/w/index.php?title=Fichier:Port_-_Dictionnaire_historique,_g%C3%A9ographique_et_biographique_du_Maine-et-Loire,_tome_1.djvu&action=purge This is a big file: 882 pages (85.71 MB).
That'll be another instance of bug 28146 with the djvu text extraction; merging the fix for that to 1.17 and deployment should resolve it.
Hi veteran contributors. Is this problem still valid? Is General/Unknown its best location? Marking as Lowest, since nobody seems to be working on this, or planning to, at the moment.