Last modified: 2014-04-29 17:28:35 UTC

Bug 28427 - rewrite quickIsNFCVerify() to use preg_match() with an offset to accommodate larger files
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: 1.18.x
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody
Keywords: i18n

Reported: 2011-04-05 01:31 UTC by Brion Vibber
Modified: 2014-04-29 17:28 UTC (History)

Description Brion Vibber 2011-04-05 01:31:57 UTC
Broken out from bug 28146, which started with a narrower focus and was resolved by a correspondingly narrow fix.

Per notes & patches on that bug, the preg_match_all() in UtfNormal::quickIsNFCVerify uses a lot of memory for mixed ASCII/non-ASCII strings such as one finds in languages using Latin scripts with accented or other non-ASCII letters.

This results in hitting memory limits on largish input strings much sooner than we really ought to.

Rewriting the function so that it works through the string in chunks as it's splitting should avoid that huge memory bump, but my initial tests were too slow using preg_match and an offset, and still slowish using preg_replace_callback.
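
To illustrate the chunked approach described above, here is a minimal sketch of walking a string with preg_match() and an offset rather than collecting every match up front with preg_match_all(). The helper name iterateRuns() and the simplified pattern are assumptions made for illustration; this is not the code in UtfNormal.php, it assumes a PHP version with generators and scalar type declarations, and it does nothing to address the speed problem noted above.

```php
<?php
// Sketch only: iterateRuns() is a hypothetical helper, not MediaWiki code.
// It yields alternating runs of ASCII and non-ASCII bytes one at a time,
// so only the current run is held in memory instead of a full match array.
function iterateRuns(string $string): Generator
{
    // \G anchors each match at the offset passed to preg_match(), so the
    // engine does not scan ahead. Assumption: a simplified two-class
    // pattern, not the pattern used by quickIsNFCVerify().
    $pattern = '/\G(?:[\x00-\x7f]+|[\x80-\xff]+)/';
    $length = strlen($string);
    $offset = 0;

    while ($offset < $length) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $string, $match, 0, $offset) !== 1) {
            throw new RuntimeException('preg_match failed at byte ' . $offset);
        }
        yield $match[0];
        // Advance past the run that was just consumed.
        $offset += strlen($match[0]);
    }
}

// Usage: count the runs in a short mixed string.
// The runs are "caf", "\xc3\xa9" and " au lait".
$runs = 0;
foreach (iterateRuns("caf\xc3\xa9 au lait") as $run) {
    $runs++;
}
echo $runs, "\n";
```

Each iteration issues a separate preg_match() call, which is the per-call overhead that made this style slow in the tests mentioned above; the sketch only shows the memory-side trade-off.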

includes/normal/UtfNormalMemStress.php can be used to stress-test this.
Comment 1 Yann Forget 2011-04-06 13:05:39 UTC
I suppose that this error is related to this bug?

PHP fatal error in
/usr/local/apache/common-local/php-1.17/includes/normal/UtfNormal.php line 285: 
Allowed memory size of 125829120 bytes exhausted (tried to allocate 71 bytes)

http://fr.wikisource.org/w/index.php?title=Fichier:Port_-_Dictionnaire_historique,_g%C3%A9ographique_et_biographique_du_Maine-et-Loire,_tome_1.djvu&action=purge

This is a big file: 882 pages (85.71 MB)
Comment 2 Brion Vibber 2011-04-06 15:48:14 UTC
That'll be another instance of bug 28146 with the djvu text extraction; merging the fix for that to 1.17 and deployment should resolve it.
Comment 3 Quim Gil 2014-04-29 14:54:44 UTC
Hi veteran contributors. Is this problem still valid? Is General/Unknown its best location?

Marking as Lowest, since nobody seems to be working or planning to work on this currently.
