Last modified: 2010-05-15 15:28:11 UTC
$ php refreshLinks.php wikidb 50
..../....
7150
Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 224 bytes) in /Big/Wikipedia/mediawiki-1.3.8/includes/Parser.php on line 787

Hmm, that's 64M per PHP request, I think it should be enough ;) Then I tried to restart from there, having modified the .inc to use a reporting interval of 1 instead of 50 (to find the offending article) and to start at 7150, and it goes on... until it crashes again later:

17510
17511
Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 17833 bytes) in /Big/Wikipedia/mediawiki-1.3.8/includes/MagicWord.php on line 174

It looks like a memory "leak" in the script?
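One way to narrow down where the memory goes, if you are already editing the maintenance loop by hand as described above, is to print the interpreter's memory use after each article. This is only a sketch, not MediaWiki code: $titles and process() are placeholders for whatever the real loop iterates over, and memory_get_usage() is only available when PHP 4 is built with --enable-memory-limit (which it evidently is here, since the limit is being enforced).

<?php
// Hypothetical sketch: report memory use per article so a steady climb
// (or a single huge jump) can be spotted. $titles and process() stand in
// for the real refreshLinks loop; they are not part of MediaWiki.
foreach ( $titles as $i => $title ) {
	process( $title );                  // the real per-article work would go here
	$mem = memory_get_usage();          // needs PHP built with --enable-memory-limit
	print "$i\t$title\t$mem bytes\n";
}
?>

A roughly linear climb across articles would point at an unbounded cache rather than at one pathological page.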
Could be one of my memory-leak patches not backported from HEAD. I'll take a look over there.
Debian/Sarge, Athlon 2 GHz, 750 MB RAM, MySQL 4.0, PHP 4.3. PHP client: memory_limit = 64M !!! That should be enough ;)

I have written a small bash script which splits the task into several blocks, which can then be run simultaneously. For 5000 articles (the beginning of the French cur DB) it always takes 30 min (on a single-CPU machine):
- 1 x 5000
- 1 pipe with 10 x 500 articles
- 40 pipes in parallel with 5 x 25 articles
- 10 x (5 x 250)
etc.
It works fine and ALWAYS takes the same time, but the memory use varies.
-------
I think this has to do with the LinkCache or some other kind of cache. One of them seems to keep filling up as the run progresses, and it ends up needing a lot of space (this may be a feature, not a bug; if so it should be documented). This cache is probably useless in maintenance, because:
- we never do the same thing twice,
- the time needed is always the same: for 1 x 5000 articles, or for 50 consecutive runs of 100 articles.
That means the cache helps only within an article and is useless for the next one, so in maintenance (at least) it could be flushed after each article has been processed. Hmm, well! TODO: find the offending cache and kill it ;) It might take some time; for the moment I understand the cache code very little...
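A minimal sketch of that flush-per-article idea, assuming the global $wgLinkCache object exposes a clear() method (it does in later MediaWiki releases; check includes/LinkCache.php in 1.3 before relying on it) and with $titles and process() again standing in for the real loop:

<?php
// Not verified against 1.3.x: discard per-article cache state between
// iterations of a maintenance loop so it cannot accumulate across the run.
global $wgLinkCache;
foreach ( $titles as $title ) {
	process( $title );       // placeholder for the real refreshLinks work
	$wgLinkCache->clear();   // assumed method: drop the cached link lookups
}
?>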
This was probably mostly the ever-growing $wgLinkHolders array (bug 1132, now fixed). A smaller (and not data-destructive) leak would have been the cache for Title::newFromText(), which is now capped so it doesn't grow indefinitely in batch operations.
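For illustration, the "capped cache" pattern described there looks roughly like this; the names below (cachedNewFromText, $titleCache, MAX_CACHED) are made up for the sketch and are not the actual code in Title.php. The point is simply that once the cache reaches its limit it is emptied, so a long batch run can no longer grow it without bound.

<?php
// Sketch of a size-capped memoization cache, as described above.
// MAX_CACHED and cachedNewFromText() are illustrative names only.
define( 'MAX_CACHED', 1000 );

function cachedNewFromText( $text ) {
	static $titleCache = array();

	if ( isset( $titleCache[$text] ) ) {
		return $titleCache[$text];        // cache hit
	}
	if ( count( $titleCache ) >= MAX_CACHED ) {
		$titleCache = array();            // empty the cache rather than let it grow forever
	}
	$title = Title::newFromText( $text ); // delegate to the real constructor
	$titleCache[$text] = $title;
	return $title;
}
?>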