Last modified: 2014-02-12 08:32:27 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T60969, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 58969 - "Data repair and upgrade" not completing using non-interactive job queue runner
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: Semantic MediaWiki
Version: master
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 39480
Reported: 2013-12-26 13:59 UTC by Siebrand Mazeland
Modified: 2014-02-12 08:32 UTC (History)
CC List: 6 users

Description Siebrand Mazeland 2013-12-26 13:59:05 UTC
At translatewiki.net we use SemanticMediaWiki. For a while, we've tried to rebuild all the semantic data using the feature "Data repair and upgrade" on Special:SMWAdmin. It always stopped updating at 50.01%.

Here's our cron job:

* * * * * betawiki nice php /www/translatewiki.net/w/maintenance/runJobs.php --exclusive --maxtime=50 --procs=1 --memory-limit=250M >> /www/translatewiki.net/w/logs/jobqueue 2> /dev/null

I think it stops working around 100% completion of SMW\RefreshJob.
2013-12-26 13:49:59 SMW\RefreshJob Special:SMWAdmin spos=3639900 prog=0.99999917580221 rc=2 run=1 t=5 good

I've now disabled the queue running via cron, and am running it in interactive mode. It now gets beyond the point where it used to stop.
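
To see which job ran last before a stall, the job log lines can be split into fields. This is a minimal sketch, assuming the layout visible in the sample line above (date, time, job class, title, key=value pairs, status word); that layout is inferred from the sample only and is not a documented format, and `parse_job_line` is a hypothetical helper name.

```python
# Sample line copied from the report above.
LINE = (r"2013-12-26 13:49:59 SMW\RefreshJob Special:SMWAdmin "
        "spos=3639900 prog=0.99999917580221 rc=2 run=1 t=5 good")


def parse_job_line(line):
    """Split one job log line into timestamp, job class, title, params, status.

    Assumes whitespace-separated fields in the order seen in the sample line.
    """
    parts = line.split()
    # Everything between the title and the trailing status word is key=value.
    params = dict(p.split("=", 1) for p in parts[4:-1] if "=" in p)
    return {
        "timestamp": " ".join(parts[0:2]),
        "job": parts[2],
        "title": parts[3],
        "params": params,
        "status": parts[-1],
    }


entry = parse_job_line(LINE)
progress = float(entry["params"]["prog"])
print(entry["job"], entry["params"]["spos"], f"{progress:.4%}")
```

Running this over the tail of the log file would show the last `spos` and `prog` values reached before the runner stopped.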
Comment 1 Jeroen De Dauw 2013-12-26 19:11:55 UTC
Any idea what the relevant difference between interactive and cron could be? Also, do you know if this is a new issue, or has it been present for a longer time?
Comment 2 Siebrand Mazeland 2013-12-26 20:11:45 UTC
(In reply to comment #1)
> Any idea what the relevant difference between interactive and cron could be?
> Also, do you know if this is a new issue, or has it been present for a longer
> time?

I couldn't say... We don't rebuild the semantic data that often. I don't recall any issues with it previously, but I couldn't say whether it last succeeded 6 or 12 months ago.

It could be an unnoticed memory limit issue. I hit the following while running the job queue interactively. It was logged to our error log file (also relayed to #mediawiki-i18n on Freenode). I'd assume that the cron job queue process running out of memory would have resulted in similar logging.

2013-12-26 15:54:09 SMW\RefreshJob Special:SMWAdmin spos=1243721 prog=0.34167539819333 rc=2 run=2 t=34 good
[a0b5f4f8] [no req]   Exception from line 256 of /www/translatewiki.net/w/maintenance/runJobs.php: Detected excessive memory usage (149444096/157286400).
Backtrace:
#0 /www/translatewiki.net/w/maintenance/runJobs.php(156): RunJobs->assertMemoryOK()
#1 /www/translatewiki.net/w/maintenance/doMaintenance.php(119): RunJobs->execute()
#2 /www/translatewiki.net/w/maintenance/runJobs.php(271): require_once(string)
#3 {main}

We have this setting:

LocalSettings.php:ini_set( 'memory_limit',      '175M' );

If I recall correctly, the command line has its own (hardcoded) memory limit somewhere.
Comment 3 Jeroen De Dauw 2013-12-26 20:24:12 UTC
Memory limit issue seems plausible. Guess the actionable item would be cleaning up our job and maintenance script code, adding tests, and making the whole thing more robust and error tolerant.
Comment 4 Siebrand Mazeland 2013-12-26 20:31:56 UTC
(In reply to comment #3)
> Memory limit issue seems plausible. Guess the actionable item would be
> cleaning up our job and maintenance script code, adding tests, and
> making the whole thing more robust and error tolerant.

That seems to be the easy way out. Is that what you expect every SMW wiki admin to do? Do you think that's a reasonable expectation? If you think so, please close this issue as INVALID.
Comment 5 Siebrand Mazeland 2013-12-27 08:45:08 UTC
I have been working on getting "Data repair and upgrade" to complete for almost 24 hours now.

I increased max memory from the 150M hard coded in runJobs.php to 250M. It failed fairly quickly.

2013-12-27 08:41:12 SMW\UpdateJob MediaWiki:Betafeatures-enable-all-desc/mr t=46 good
[341087c9] [no req]   Exception from line 256 of /www/translatewiki.net/w/maintenance/runJobs.php: Detected excessive memory usage (249095784/262144000).
Backtrace:
#0 /www/translatewiki.net/w/maintenance/runJobs.php(156): RunJobs->assertMemoryOK()
#1 /www/translatewiki.net/w/maintenance/doMaintenance.php(119): RunJobs->execute()
#2 /www/translatewiki.net/w/maintenance/runJobs.php(271): require_once(string)
#3 {main}


It's running with 350M now.

It looks like some of the SMW jobs may be using a huge amount of memory at times.
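
The two exception messages quoted in this report can be checked arithmetically. This is a minimal sketch, assuming the pair in parentheses is used bytes over limit bytes (the message itself does not label them); `parse_usage` is a hypothetical helper name.

```python
import re

MIB = 1024 * 1024

# Exception messages copied from the two backtraces in this report.
MESSAGES = [
    "Detected excessive memory usage (149444096/157286400).",
    "Detected excessive memory usage (249095784/262144000).",
]


def parse_usage(message):
    """Return (used_bytes, limit_bytes) from an 'excessive memory usage' message."""
    match = re.search(r"\((\d+)/(\d+)\)", message)
    if match is None:
        raise ValueError("no used/limit pair found")
    return int(match.group(1)), int(match.group(2))


for message in MESSAGES:
    used, limit = parse_usage(message)
    print(f"limit = {limit // MIB} MiB, used = {used / limit:.2%} of limit")
```

The limits come out as exactly 150 MiB and 250 MiB, so the first run was bound by the 150M value in runJobs.php rather than the 175M set in LocalSettings.php. In both cases the abort fires at just over 95% of the limit, which suggests (an inference from these two numbers, not something confirmed in this report) that the check trips at roughly 95% of the configured limit, so raising the limit only moves the point of failure.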
Comment 6 Siebrand Mazeland 2013-12-27 09:14:43 UTC
translatewiki.net has 3.538.755 pages. My current theory is that creating one job for each of these pages, as is done somewhere in this process, requires over 250M of memory, and causes the process to fail.
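
As a back-of-the-envelope check of this theory, the per-job memory budget can be computed from the two numbers in this comment. This is a sketch under the assumption that all jobs are held in memory at the same time, which is the theory being tested and not an established fact; the per-job sizes in the loop are hypothetical values for illustration, not measurements.

```python
PAGES = 3_538_755                  # page count quoted in this comment
LIMIT_BYTES = 250 * 1024 * 1024    # the 250M limit that was exceeded

# If every job object were held in memory at once, this is how many bytes
# each one could occupy before the limit is reached.
budget_per_job = LIMIT_BYTES / PAGES
print(f"budget: {budget_per_job:.1f} bytes per job")

# Hypothetical per-job sizes, for illustration only (not measured).
for size in (100, 500, 1000):
    total_mib = PAGES * size / (1024 * 1024)
    print(f"{size} bytes/job -> {total_mib:,.0f} MiB in total")
```

The budget works out to roughly 74 bytes per job, so if the jobs really are all created in one pass, even very small job objects would be enough to exceed 250M.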
Comment 7 Jeroen De Dauw 2013-12-27 17:21:49 UTC
(In reply to comment #4)
> That seems to be the easy way out. Is that what you expect every SMW wiki
> admin to do? Do you think that's a reasonable expectation? If you think so,
> please close this issue as INVALID.

You misunderstand. These are action items for the devs. And this is not easy; it is quite some work.
Comment 8 Siebrand Mazeland 2013-12-27 17:50:51 UTC
I've had jobs exceed 550MB of memory now. I have no idea which job it is, or how to reproduce it. Once the job queue is empty (1421969 to go), I'll restart the process again, with a single queue runner, so I'll have more detail.

If you want any additional debug information logged, please submit a patch, let me know which it is, and I'll run the update with that code. I expect to be able to restart the run tomorrow morning (CET) if it doesn't fail during the night.
Comment 9 Nemo 2014-02-10 09:19:09 UTC
(In reply to comment #8)
> I've had jobs exceed 550MB of memory now.

That's probably due to bug 60844 (part of the series on the catastrophic 1.22 changes to job queue <https://www.mediawiki.org/wiki/Manual:Job_queue#Changes_introduced_in_MediaWiki_1.22>).

Siebrand managed to bring the script to completion with brute force and hacks which made it skip uninteresting namespaces, but nobody has been able to work on SMW issues in the last month. We currently think they're unrelated to this bug; I wrote to the mailing list <http://sourceforge.net/mailarchive/forum.php?forum_name=semediawiki-user&max_rows=25&style=ultimate&viewmonth=201402> (ten minutes and it's not in archives yet, see <http://p.defau.lt/?0iJEtsTkjCpwDIWF1UivQQ>).
Comment 10 MWJames 2014-02-10 09:31:30 UTC
> <http://sourceforge.net/mailarchive/forum.php?forum_name=semediawiki-user&max_rows=25&style=ultimate&viewmonth=201402>
> (ten minutes and it's not in archives yet, see
> <http://p.defau.lt/?0iJEtsTkjCpwDIWF1UivQQ>).

See [0].

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/184
Comment 11 Nemo 2014-02-12 08:32:27 UTC
(In reply to comment #10)
> https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/184

Thanks for following up. That issue is now solved.

As for this bug: yesterday I started the refresh from SMWAdmin and Nikerabbit ran the jobs manually (through hhvm); it's now reaching completion (99.5%) even though the memory raise hack in runJobs.php has been removed. (A null-editing bot on the affected pages is much faster, though.) This bug is solved for us and we can't help with debugging any longer; close it if you don't see actionable items.
