Last modified: 2014-02-12 08:32:27 UTC
At translatewiki.net we use SemanticMediaWiki. For a while, we've tried to rebuild all the semantic data using the "Data repair and upgrade" feature on Special:SMWAdmin. It always stopped updating at 50.01%.

Here's our cron job:

* * * * * betawiki nice php /www/translatewiki.net/w/maintenance/runJobs.php --exclusive --maxtime=50 --procs=1 --memory-limit=250M >> /www/translatewiki.net/w/logs/jobqueue 2> /dev/null

I think it stops working around 100% completion of SMW\RefreshJob:

2013-12-26 13:49:59 SMW\RefreshJob Special:SMWAdmin spos=3639900 prog=0.99999917580221 rc=2 run=1 t=5 good

I've now disabled the queue running via cron and am running it in interactive mode. It now gets beyond the point where it used to stop.
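For clarity, "interactive mode" here just means starting the same maintenance script from a shell instead of via cron; roughly this (same options as the cron line above; dropping --maxtime so it keeps running is my choice, not required):

php /www/translatewiki.net/w/maintenance/runJobs.php --exclusive --procs=1 --memory-limit=250M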
Any idea what the relevant difference between interactive and cron could be? Also, do you know if this is a new issue, or has it been present for a longer time?
(In reply to comment #1)
> Any idea what the relevant difference between interactive and cron could be?
> Also, do you know if this is a new issue, or has it been present for a longer
> time?

I couldn't say... We don't rebuild the semantic data that often. I don't recall any issues with it previously, but I couldn't say whether the last successful run was 6 or 12 months ago.

It could be an unnoticed memory limit issue. I hit this while running the job queue interactively. This one was logged to our error log file (also relayed to #mediawiki-i18n on Freenode). I'd assume that the cron job queue process running out of memory would have resulted in similar logging.

2013-12-26 15:54:09 SMW\RefreshJob Special:SMWAdmin spos=1243721 prog=0.34167539819333 rc=2 run=2 t=34 good
[a0b5f4f8] [no req] Exception from line 256 of /www/translatewiki.net/w/maintenance/runJobs.php: Detected excessive memory usage (149444096/157286400).
Backtrace:
#0 /www/translatewiki.net/w/maintenance/runJobs.php(156): RunJobs->assertMemoryOK()
#1 /www/translatewiki.net/w/maintenance/doMaintenance.php(119): RunJobs->execute()
#2 /www/translatewiki.net/w/maintenance/runJobs.php(271): require_once(string)
#3 {main}

We have this setting:

LocalSettings.php: ini_set( 'memory_limit', '175M' );

If I recall correctly, the command line has its own (hardcoded) memory limit somewhere.
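To check what limit the CLI is actually running under (the PHP CLI often reads its own php.ini, separate from the web server's), a quick one-liner works:

php -r 'echo ini_get( "memory_limit" ), "\n";'

Note that the 157286400 in the trace above is exactly 150M, i.e. lower than both our 175M ini_set and the 250M passed via --memory-limit, which is what points at a separate hard-coded ceiling inside runJobs.php.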
Memory limit issue seems plausible. Guess the actionable item would be cleaning up our job and maintenance script code, adding tests, and making the whole thing more robust and error tolerant.
(In reply to comment #3)
> Memory limit issue seems plausible. Guess the actionable item would be
> cleaning up our job and maintenance script code, adding tests, and making
> the whole thing more robust and error tolerant.

That seems to be the easy way out. Is that what you expect every SMW wiki admin to do? Do you think that's a reasonable expectation? If you think so, please close this issue as INVALID.
Working on getting "Data repair and upgrade" to complete for almost 24 hours now. I increased the max memory from the 150M hard-coded in runJobs.php to 250M. It failed fairly quickly:

2013-12-27 08:41:12 SMW\UpdateJob MediaWiki:Betafeatures-enable-all-desc/mr t=46 good
[341087c9] [no req] Exception from line 256 of /www/translatewiki.net/w/maintenance/runJobs.php: Detected excessive memory usage (249095784/262144000).
Backtrace:
#0 /www/translatewiki.net/w/maintenance/runJobs.php(156): RunJobs->assertMemoryOK()
#1 /www/translatewiki.net/w/maintenance/doMaintenance.php(119): RunJobs->execute()
#2 /www/translatewiki.net/w/maintenance/runJobs.php(271): require_once(string)
#3 {main}

It's running with 350M now. It looks like some of the SMW jobs may be using a huge amount of memory at times.
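For anyone wanting to reproduce the hack: judging by the backtrace, the ceiling lives in RunJobs::assertMemoryOK(). A sketch of the kind of check involved and the edit I made (illustrative only, not the actual MediaWiki code; the exact variable and line differ per version):

// Inside RunJobs::assertMemoryOK() in maintenance/runJobs.php -- sketch:
$maxBytes = 350 * 1024 * 1024; // was 150 * 1024 * 1024 (the 157286400 in the first trace)
$usedBytes = memory_get_usage( true );
if ( $usedBytes >= $maxBytes ) {
	throw new MWException( "Detected excessive memory usage ($usedBytes/$maxBytes)." );
}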
translatewiki.net has 3,538,755 pages. My current theory is that creating one job for each of these pages, as happens somewhere in this process, requires over 250M of memory and causes the process to fail.
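Back-of-envelope to show the theory is at least plausible (the per-title cost is a guess on my part, not measured): at roughly 75 bytes of PHP memory per queued title, 3,538,755 × 75 B ≈ 253 MB, which is right around the 250M ceiling we just hit.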
(In reply to comment #4)
> That seems to be the easy way out. Is that what you expect every SMW wiki
> admin to do? Do you think that's a reasonable expectation? If you think so,
> please close this issue as INVALID.

You misunderstand. These are action items for the devs. And this is not easy; it is quite some work.
I've had jobs exceed 550MB of memory now. I have no idea which job it is, or how to reproduce it. Once the job queue is empty (1,421,969 jobs to go), I'll restart the process with a single queue runner, so I'll have more detail. If you want any additional debug information, please submit a patch and let me know; I'll run the update with that code. I expect to be able to restart the run tomorrow morning (CET) if it doesn't fail during the night.
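In case it helps anyone drafting such a patch: the kind of instrumentation I have in mind is a per-job memory log around the point where the runner executes each job. A sketch (illustrative; where exactly this slots into runJobs.php depends on the MediaWiki version, but memory_get_usage(), memory_get_peak_usage() and wfDebugLog() are real APIs):

// Around the call that executes a single job in the runner loop -- sketch:
$before = memory_get_usage( true );
$status = $job->run();
$after = memory_get_usage( true );
wfDebugLog( 'jobmemory', get_class( $job ) . ': delta ' .
	( $after - $before ) . ' bytes, peak ' . memory_get_peak_usage( true ) . ' bytes' );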
(In reply to comment #8)
> I've had jobs exceed 550MB of memory now.

That's probably due to bug 60844 (part of the series on the catastrophic 1.22 changes to the job queue, <https://www.mediawiki.org/wiki/Manual:Job_queue#Changes_introduced_in_MediaWiki_1.22>). Siebrand managed to bring the script to completion with brute force and hacks which made it skip uninteresting namespaces, but nobody has been able to work on the SMW issues in the last month. We currently think they're unrelated to this bug; I wrote to the mailing list <http://sourceforge.net/mailarchive/forum.php?forum_name=semediawiki-user&max_rows=25&style=ultimate&viewmonth=201402> (ten minutes in and it's not in the archives yet; see <http://p.defau.lt/?0iJEtsTkjCpwDIWF1UivQQ>).
> <http://sourceforge.net/mailarchive/forum.php?forum_name=semediawiki-user&max_rows=25&style=ultimate&viewmonth=201402>
> (ten minutes in and it's not in the archives yet; see
> <http://p.defau.lt/?0iJEtsTkjCpwDIWF1UivQQ>)

See [0].

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/184
(In reply to comment #10)
> https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/184

Thanks for following up. That issue is now solved. As for this bug: yesterday I started the refresh from SMWAdmin and Nikerabbit ran the jobs manually (through HHVM), and it's now reaching completion (99.5%) even though the memory raise hack on runJobs.php has been removed. (A null-editing bot on the affected pages is much faster, though; see the sketch below.)

This bug is solved for us; we can't help with debugging any longer. Close it if you don't see actionable items.
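For anyone who prefers the null-edit route: a minimal sketch of such a bot as a MediaWiki maintenance script, under the assumption that re-saving a page with unchanged content (a "null edit") re-runs the secondary data updates, which is what rebuilds the SMW data. The script name and the unfiltered page query are mine; on a wiki this size you'd want batching and namespace filters.

<?php
// nullEditBot.php -- illustrative sketch, not a supported MediaWiki script.
require_once __DIR__ . '/Maintenance.php';

class NullEditBot extends Maintenance {
	public function execute() {
		$dbr = wfGetDB( DB_SLAVE );
		// Walk every page; a real bot would batch this and filter namespaces.
		$res = $dbr->select( 'page', array( 'page_namespace', 'page_title' ) );
		foreach ( $res as $row ) {
			$title = Title::makeTitle( $row->page_namespace, $row->page_title );
			$page = WikiPage::factory( $title );
			$content = $page->getContent();
			if ( $content ) {
				// Saving identical content creates no new revision, but
				// still triggers the data updates that refresh SMW's store.
				$page->doEditContent( $content, 'null edit', EDIT_UPDATE );
			}
		}
	}
}

$maintClass = 'NullEditBot';
require_once RUN_MAINTENANCE_IF_MAIN;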