Last modified: 2012-12-03 15:41:25 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T43656, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 41656 - JobQueue not working: no jobs run except for high-priority ones like enotif
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: JobQueue
Version: 1.21.x
Hardware: All All
Importance: Highest blocker
Target Milestone: ---
Assigned To: Aaron Schulz
URL: https://commons.wikimedia.org/wiki/Fi...
Depends on:
Blocks: 39480
Reported: 2012-11-01 19:48 UTC by Raimond Spekking
Modified: 2012-12-03 15:41 UTC

Description Raimond Spekking 2012-11-01 19:48:23 UTC
On Translatewiki.net we are using the ReplaceText extension for mass changes of message content or page moves of MediaWiki messages when their name/key changed.

Currently, page moves at least are no longer added to the JobQueue and are therefore not executed.
Comment 1 Nemo 2012-11-05 07:37:48 UTC
It works now after the code was updated, so I think it was the same as the complete breakage of the job queue on 1.21wmf2/1.21wmf3, fixed by Tim and Aaron with some investigations by Reedy, Ariel and others.
Cf. commits I92f538f4, I257d6809, I98533ed5, I326b767c, Iaea96ff8, If612f8e2, I09b3faa7, Ie2b4abab.
Comment 2 Raimond Spekking 2012-11-05 18:21:13 UTC
Yesterday it worked on translatewiki.net, today it does not work :-(
Comment 3 Rob Lanphier 2012-11-06 00:03:46 UTC
Raimond, do you all have a stack trace or any debugging hints?  We did coincidentally have a problem with our job queue today, which turned out to be a problem with our Puppet scripts and not with the MediaWiki core version of the job queue code.  So, this seems to be operating fine as of 1.21wmf3.
Comment 4 Niklas Laxström 2012-11-06 17:37:41 UTC
There were no notices or backtraces or anything.
Comment 5 Rob Lanphier 2012-11-06 18:42:40 UTC
You may need to manually run the job runner to get debugging information.  The WMF cluster almost certainly has different parameters and wrapper scripts than what you're running, but you may want to look at http://bots.wmflabs.org/~petrb/logs/%23wikimedia-tech/20121031.txt starting at [21:34:28] for an example of us debugging this.
Comment 6 Raimond Spekking 2012-11-07 19:33:47 UTC
I tried a mass message move again a few minutes ago. It seems that the replaces are not added to the job table.

I took a look at the bw_job table on translatewiki and at the error_php and job logs.

Manually running with "b php runJobs.php" has no effect either.
Comment 7 Rob Lanphier 2012-11-07 19:43:46 UTC
Assigning to Aaron and cc'ing Reedy.  We may just have to pay special attention to the test2/mediawiki.org job queues on Monday (Tuesday?...possible delay due to Veterans Day here in the US).
Comment 8 Andre Klapper 2012-11-13 13:32:53 UTC
(In reply to comment #7)
> Assigning to Aaron and cc'ing Reedy.  We may just have to pay special attention
> to the test2/mediawiki.org job queues on Monday

1.21wmf4 was deployed yesterday on test2/mw.org - any updates?
Comment 9 Yaron Koren 2012-11-13 13:43:49 UTC
I may have some insight into this: yesterday I was trying out my Data Transfer extension, which uses jobs, on my wiki with MW 1.21alpha, and discovered (like Raimond) that it wasn't adding anything to the "job" table. I looked into it, and found that the issue was the call to $dbw->onTransactionIdle() in the method JobQueueDB::doBatchPush(). Everything inside that set of code was never called. When I commented out the onTransactionIdle() line (and its closing tag about 30 lines down), everything worked perfectly again.

By the way, I have PHP 5.3.10 and MySQL 5.5.22 on that server.
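
For readers unfamiliar with the pattern: a call like onTransactionIdle() hands the database handle a callback that should run once no transaction is open. The following is only a toy sketch of that idea, not MediaWiki code. ToyDatabase, pushJob() and the rows property are hypothetical names invented for illustration, and the sketch does not claim to show the actual cause found in this bug. It just shows how a deferred callback can silently never run while a transaction stays open.

```php
<?php
// Hypothetical stand-in for a database handle; NOT MediaWiki's real class.
class ToyDatabase {
	private $inTransaction = false;
	private $idleCallbacks = array();
	public $rows = array(); // plays the role of the "job" table

	public function begin() {
		$this->inTransaction = true;
	}

	public function commit() {
		$this->inTransaction = false;
		// Run everything that was deferred while the transaction was open.
		$callbacks = $this->idleCallbacks;
		$this->idleCallbacks = array();
		foreach ( $callbacks as $callback ) {
			call_user_func( $callback );
		}
	}

	// Run $callback now if no transaction is open, otherwise defer it until commit().
	public function onTransactionIdle( $callback ) {
		if ( $this->inTransaction ) {
			$this->idleCallbacks[] = $callback;
		} else {
			call_user_func( $callback );
		}
	}
}

// Hypothetical job push: the insert only happens inside the deferred callback.
function pushJob( ToyDatabase $db, $job ) {
	$db->onTransactionIdle( function () use ( $db, $job ) {
		$db->rows[] = $job;
	} );
}

// Case 1: no transaction open, so the job is inserted immediately.
$idle = new ToyDatabase();
pushJob( $idle, 'moveJob' );
$idleCount = count( $idle->rows );

// Case 2: a transaction is open, so nothing is inserted until commit().
$stuck = new ToyDatabase();
$stuck->begin();
pushJob( $stuck, 'moveJob' );
$stuckCount = count( $stuck->rows );
$stuck->commit();
$committedCount = count( $stuck->rows );
```

In this toy model, a caller that never reaches commit() would see exactly the symptom described above: no error, no notice, and nothing in the table.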
Comment 10 Aaron Schulz 2012-11-14 20:04:23 UTC
See https://gerrit.wikimedia.org/r/#/c/33411/
Comment 11 Nemo 2012-11-15 07:30:38 UTC
Raising priority because the buggy code was deployed to most projects yesterday.
The number of queue jobs and the job runners' activity have both halved, starting a few minutes after the deployment.
https://gdash.wikimedia.org/dashboards/jobq/deploys
http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Jobrunners+pmtpa&m=cpu_report&s=by+name&mc=2&g=network_report
Comment 12 Ariel T. Glenn 2012-11-15 08:16:13 UTC
on a chosen random jobrunner, running

root@mw16:/usr/local/apache/common-local/multiversion# php MWScript.php nextJobDB.php --wiki=aawiki

gives the empty string, although eg enwiki has 18313 refreshLinks2 in the table, some from Nov 5.

running 

echo 'print_r ( $wgMemc->get( "jobqueue:dbs:v3" ) );' | php MWScript.php eval.php  --wiki=aawiki

gives output

Array
(
    [refreshLinks2] => Array
        (
            [0] => afwiki
            [1] => alswiki
            [2] => anwiki
            [3] => arwiki
...
    [createPdfThumbnailsJob] => Array
        (
            [0] => sqwiki
        )

)


There's no 'pendingDBs' key in there anywhere.

Here's the relevant code in nextJobDB.php:

		$pendingDbInfo = $wgMemc->get( $memcKey );
		if ( !$pendingDbInfo || mt_rand( 0, 100 ) == 0 ) {
			... (regenerate 1/100 of the time)
		}
		if ( !$pendingDbInfo || !$pendingDbInfo['pendingDBs'] ) {
			return; // no DBs with jobs or cache is both empty and locked
		}
		$pendingDBs = $pendingDbInfo['pendingDBs'];

So that's going to return empty-handed every time.

Those references to 'pendingDBs' come from the Nov 2 change.


I guess that we get some jobs run 1/100 of the time when we regenerate the memcache entry.
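
To make the mismatch concrete, here is a small sketch of the guard quoted above applied to the two cache shapes. It is an illustration under assumptions, not the real nextJobDB.php: getPendingDBs() is a hypothetical helper, the wiki lists are shortened from the output above, and the exact layout under 'pendingDBs' in the new format is assumed. An isset() check is added only so the sketch runs without an undefined-index notice; the outcome matches the quoted condition.

```php
<?php
// Hypothetical helper mirroring only the guard quoted above.
function getPendingDBs( $pendingDbInfo ) {
	if ( !$pendingDbInfo
		|| !isset( $pendingDbInfo['pendingDBs'] )
		|| !$pendingDbInfo['pendingDBs']
	) {
		return null; // same early return as "no DBs with jobs"
	}
	return $pendingDbInfo['pendingDBs'];
}

// Shape seen in memcached above: job types at the top level, no 'pendingDBs' key.
$oldShape = array(
	'refreshLinks2' => array( 'afwiki', 'alswiki', 'anwiki', 'arwiki' ),
	'createPdfThumbnailsJob' => array( 'sqwiki' ),
);

// ASSUMED new shape: the same data wrapped under a 'pendingDBs' key.
$newShape = array( 'pendingDBs' => $oldShape );

$fromOld = getPendingDBs( $oldShape ); // guard fails, nothing returned
$fromNew = getPendingDBs( $newShape ); // guard passes, job types returned
```

With the old-format entry still sitting in the cache, the guard bails out even though the entry lists wikis with pending jobs.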
Comment 13 Yaron Koren 2012-11-15 16:37:12 UTC
I just tried out Aaron's patch (https://gerrit.wikimedia.org/r/#/c/33411/) and it fixed this problem on my wiki.
Comment 14 Aaron Schulz 2012-11-16 03:37:29 UTC
(In reply to comment #12)
> on a chosen random jobrunner, running
> 
> root@mw16:/usr/local/apache/common-local/multiversion# php MWScript.php
> nextJobDB.php --wiki=aawiki
> 
> gives the empty string, although eg enwiki has 18313 refreshLinks2 in the
> table, some from Nov 5.
> 

https://gerrit.wikimedia.org/r/#/c/33537/
Comment 15 Nemo 2012-11-27 22:47:31 UTC
(In reply to comment #14)
> https://gerrit.wikimedia.org/r/#/c/33537/

(That was merged a while ago, and now I90911083 has been merged as well.)
