Last modified: 2014-09-23 21:32:55 UTC
translatewiki.net suffered degraded service today after changes to Template:Identical caused a huge number of jobs to be generated. The issue was that I had 10+ parallel runJobs.php processes running for many minutes.

At first I thought that --exclusive should have prevented this, but on further inspection that option never stayed in: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/73198

My second thought was that --maxtime=50 should prevent this, but it didn't. It looks like $sTime is correctly set at the beginning, but then it is overwritten again for each job, which means it actually functions like "stop if the last job took longer than maxtime":

    // Run the job...
    wfProfileIn( __METHOD__ . '-' . get_class( $job ) );
    $sTime = microtime( true );

My CRON entry is:

    * * * * * betawiki nice php /www/translatewiki.net/w/maintenance/runJobs.php --exclusive --maxtime=50 --procs=1 --memory-limit=250M >> /www/translatewiki.net/w/logs/jobqueue 2> /dev/null

For now I have disabled the CRON entry and am running the job queue manually until it drains.
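To illustrate the reported behaviour, here is a hypothetical PHP sketch. It is not MediaWiki's JobRunner code and not the patch from change 161618. It simulates a job loop with a fake clock in place of microtime( true ), so the outcome is deterministic. The function name runJobsSketch and its parameters are invented for illustration. Passing $buggy = true reproduces the per-job reset of $sTime described above, so the elapsed-time check only ever sees the duration of the last job.

```php
<?php
// Hypothetical sketch: contrasts the buggy and fixed --maxtime checks using a
// simulated clock. Names are illustrative, not MediaWiki's actual API.

/**
 * Runs simulated jobs and returns how many ran before the time limit stopped the loop.
 *
 * @param float[] $jobDurations Seconds each simulated job takes (hypothetical input).
 * @param float   $maxTime      Equivalent of --maxtime, in seconds.
 * @param bool    $buggy        true = overwrite $sTime for every job (the reported bug).
 */
function runJobsSketch( array $jobDurations, float $maxTime, bool $buggy ): int {
	$clock = 0.0;     // fake clock standing in for microtime( true )
	$sTime = $clock;  // intended to be the start time of the whole run
	$jobsRun = 0;
	foreach ( $jobDurations as $duration ) {
		if ( $buggy ) {
			// Bug: the run's start time is reused as the job's start time,
			// so the check below measures only the last job.
			$sTime = $clock;
		}
		$clock += $duration; // "run" the job
		$jobsRun++;
		// Stop once the elapsed time since $sTime exceeds the limit.
		if ( $maxTime && ( $clock - $sTime ) > $maxTime ) {
			break;
		}
	}
	return $jobsRun;
}

// 100 one-second jobs with a 50-second limit.
echo runJobsSketch( array_fill( 0, 100, 1.0 ), 50.0, true ), "\n";  // 100: the limit never triggers
echo runJobsSketch( array_fill( 0, 100, 1.0 ), 50.0, false ), "\n"; // 51: stops just past 50 seconds
```

With the buggy check, a long backlog of short jobs keeps each process alive indefinitely, so a cron entry that starts a new process every minute piles up parallel runners, which matches the 10+ processes observed above. Keeping the run's start time in a variable that the per-job timing never touches is what makes --maxtime bound the whole run.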
Change 161618 had a related patch set uploaded by Aaron Schulz: Fixed --maxtime handling by JobRunner https://gerrit.wikimedia.org/r/161618
Change 161621 had a related patch set uploaded by Aaron Schulz: Fixed --maxtime handling by JobRunner https://gerrit.wikimedia.org/r/161621
Change 161618 had a related patch set uploaded by Nikerabbit: Fixed --maxtime handling by JobRunner https://gerrit.wikimedia.org/r/161618
Change 161618 merged by jenkins-bot: Fixed --maxtime handling by JobRunner https://gerrit.wikimedia.org/r/161618
Change 161626 had a related patch set uploaded by Legoktm: Fixed --maxtime handling by JobRunner https://gerrit.wikimedia.org/r/161626
Marking as fixed, Niklas said it worked on twn. I backported it to REL1_24.
Change 161626 merged by jenkins-bot: Fixed --maxtime handling by JobRunner https://gerrit.wikimedia.org/r/161626
Change 161621 merged by jenkins-bot: Fixed --maxtime handling by JobRunner https://gerrit.wikimedia.org/r/161621