Last modified: 2014-06-21 19:57:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T48934, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 46934 - Job queue items from 1.20 get lost – need exit strategy for upgrade


Summary:	Job queue items from 1.20 get lost – need exit strategy for upgrade

Status:	REOPENED

Product:	MediaWiki
Classification:	Unclassified
Component:	JobQueue (Other open bugs)
Version:	1.21.x
Hardware:	All All

Importance:	High critical with 1 vote (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://lists.wikimedia.org/pipermail/...
Whiteboard:
Keywords:

Duplicates:	46971
Depends on:
Blocks:

Reported:	2013-04-05 21:53 UTC by Nemo
Modified:	2014-06-21 19:57 UTC
CC List:	10 users

See Also:	60719
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Nemo 2013-04-05 21:53:31 UTC

See http://lists.wikimedia.org/pipermail/mediawiki-l/2013-April/040970.html

I remember that WMF had a similar problem, is there a solution apart from dropping the old jobs?

Comment 1 Bartosz Dziewoński 2013-04-06 20:23:13 UTC

*** Bug 46971 has been marked as a duplicate of this bug. ***

Comment 2 Bartosz Dziewoński 2013-04-06 20:23:56 UTC

Quoting from the duped bug:

(Quoting bug 46971 comment #0)
> After I upgraded from 1.20.3, in my database (MySQL) some
> pre-upgrade jobs have job_random set to 0 and do not seem to be
> picked up -not even when I try to run them by providing their type
> as an option: php runJobs.php --type=replaceText.

Comment 3 Nemo 2013-04-30 12:37:17 UTC

Do we seriously have no way to fix this? Should we just tell people not to upgrade if they have something in the job queue?

Comment 4 Andre Klapper 2013-04-30 15:17:44 UTC

Aaron: Could you comment on this, please?

Comment 5 Andre Klapper 2013-05-14 08:33:13 UTC

Aaron: Could you comment on this please, as the 1.21 tarball release is imminent?

Comment 6 Mark A. Hershberger 2013-05-14 13:02:48 UTC

(In reply to comment #3)
> Do we seriously have no way to fix this? Should we just tell people not to
> upgrade if they have something in the job queue?

I'd like to include something in the installation information telling users to clear their job queue before upgrading, but I don't know a lot about this.  What would be appropriate?

I would like to get this fixed ASAP for a point release.

Comment 7 Tyler Romeo 2013-05-14 13:17:34 UTC

(In reply to comment #6)
> (In reply to comment #3)
> > Do we seriously have no way to fix this? Should we just tell people not to
> > upgrade if they have something in the job queue?
> 
> I'd like to include something in the installation information telling users
> to clear their job queue before upgrading, but I don't know a lot about this.
> What would be appropriate?
> 
> I would like to get this fixed ASAP for a point release.

Running maintenance/runJobs.php should clear the job queue. But depending on how long upgrading takes, one or two jobs might still be lost. Maybe put the wiki into read-only mode once the job queue is cleared?

Comment 8 Mark A. Hershberger 2013-05-14 13:26:07 UTC

Could maintenance/runJobs.php be run with the wiki in read-only mode?

Comment 9 Tyler Romeo 2013-05-14 13:35:20 UTC

(In reply to comment #8)
> Could maintenance/runJobs.php be run with the wiki in read-only mode?

Nope, I meant making it read-only after clearing the job queue. Still not a complete solution, but I can't think of anything else.
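
The order described above (drain the queue first, then lock the wiki) might look roughly like this in LocalSettings.php. This is an untested sketch, not a verified upgrade procedure: $wgReadOnly is a standard MediaWiki setting, but the message text is made up for illustration, and the sketch does not cover whether update.php should run before or after the line is removed. It also does not close the gap between the last runJobs.php run and the moment the lock takes effect.

```php
// LocalSettings.php (fragment), illustrative only.
// Before adding the line below, run `php maintenance/runJobs.php`
// until the job queue is empty.

// Lock the wiki so that no new edits can enqueue jobs while upgrading.
// The string is the reason shown to users; this wording is an example.
$wgReadOnly = 'Upgrade in progress; editing is temporarily disabled.';

// Remove the line above again once the upgrade is finished.
```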

Comment 10 Aaron Schulz 2013-05-14 16:32:05 UTC

(In reply to comment #0)
> See http://lists.wikimedia.org/pipermail/mediawiki-l/2013-April/040970.html
> 
> I remember that WMF had a similar problem, is there a solution apart from
> dropping the old jobs?

Did we? The only problem is that they have a harder time getting picked if there are always a bunch of other new jobs. In master and REL1_21 I've tried adding a bunch of jobs and setting the token to 0 for all of them, and runJobs.php works just fine. They weren't lost for me.

Comment 11 Aaron Schulz 2013-05-14 16:47:48 UTC

(In reply to comment #10)
> In master and REL1_21 I've tried adding a bunch of jobs and setting the
> token to 0 for all of them, and runJobs.php works just fine. They weren't
> lost for me.

I mean job_random of course, not the token.

Comment 12 Nemo 2013-05-14 16:51:13 UTC

(In reply to comment #11)
> I mean job_random of course, not the token.

Ah. :) Well, isn't setting job_random exactly what the user in comment 0 didn't do and should have done?

Comment 13 Aaron Schulz 2013-05-14 18:12:25 UTC

(In reply to comment #12)
> (In reply to comment #11)
> > I mean job_random of course, not the token.
> 
> Ah. :) Well, isn't setting job_random exactly what the user in comment 0
> didn't do and should have done?

The complaint was that 0 valued ones didn't work, so I set all mine to that value and they still worked.

Comment 14 Nemo 2013-06-02 18:35:48 UTC

hexmode said this wasn't deemed worth fixing for 1.21 release. I don't know the reasons, but better a partial update than nothing.

Comment 15 Mark A. Hershberger 2013-06-02 18:56:40 UTC

(In reply to comment #14)
> hexmode said this wasn't deemed worth fixing for 1.21 release. I don't know
> the reasons, but better a partial update than nothing.

What I meant is that this isn't going to stop 1.21.0 from being released.  It is still a valid bug that should be fixed at some point.  If we can get it fixed in a 1.21 point release that would be great.

I don't know enough about the problem to fix it, though.

Comment 16 Nemo 2014-02-03 07:51:11 UTC

WikiApiary still had hundreds of lingering jobs since October, till they dropped them from the DB yesterday because it was impossible to run them. We're still waiting for a general solution to the migration problems.
http://lists.thingelstad.com/pipermail/wikiapiary-l/2014-February/000104.html
http://lists.thingelstad.com/pipermail/wikiapiary-l/attachments/20140202/fd838fb7/attachment-0002.png
http://lists.thingelstad.com/pipermail/wikiapiary-l/attachments/20140202/fd838fb7/attachment-0003.png

Comment 17 Chris Koerner 2014-02-08 16:21:36 UTC

I had a similar thing happen. We were running 1.16.3 (yeah!) and upgraded to 1.23. I made a copy of the 1.16.3 database and pointed the new 1.23wmf11 installation to the new database. I ran update.php and everything was hunky-dory. The next day I noticed that the job queue was backed up. A few old jobs existed from the day of the upgrade (in 1.16.3). I tried clearing job_token and a few would run. I tried clearing out old pre-1.23 jobs, but the rest still did not run. showJobs.php spits out 0, while an API query and the database show jobs in the queue.

Comment 18 Aaron Schulz 2014-02-08 18:53:51 UTC

(In reply to comment #17)
> I had a similar thing happen. We were running 1.16.3 (yeah!) and upgraded to
> 1.23. Made a copy of the 1.16.3 database, pointed new 1.23wmf11 installation
> to new database. Ran update.php, everything is hunky-dory. Notice the next day
> that job queue is backed up. A few old jobs existed from the day of the
> upgrade (in 1.16.3). Tried clearing job_token and a few would run. Tried
> clearing out old pre 1.23 jobs, still not running. showJobs.php spits out 0,
> while api query and db shows jobs in the queue.

What types of jobs? What do some of the rows look like?

Comment 19 Chris Koerner 2014-02-09 03:04:16 UTC

Here's the job table from one of our wikis (total separate installations, but configured identically)

http://pastebin.com/i4Qatgpa

Comment 20 Chris Koerner 2014-02-09 03:04:42 UTC

Note, this is after I removed some of the older jobs in an attempt to 'kick start' the queue.

Comment 21 Aaron Schulz 2014-02-09 20:58:28 UTC

What does <<php showJobs.php --group>> show? What about <<php showJobs.php --list>>? Does <<php runJobs.php --type refreshLinks>> actually run anything?

Comment 22 Chris Koerner 2014-02-10 15:44:43 UTC

Nothing appears to run.

http://i.imgur.com/eypighU.jpg

Comment 23 Chris Koerner 2014-02-10 22:26:30 UTC

I should note that certain actions on the site, such as modifying a template or running refreshLinks.php, appear not only to add new jobs to the queue (as they should); afterwards I can also run "runJobs.php" and a number of jobs will process. I don't see a pattern or commonality in what is run, however.

Comment 24 Chris Koerner 2014-02-11 19:26:07 UTC

Another note (and someone tell me to shut up if this isn't proper etiquette) it appears that running php refreshLinks.php will queue up a number of jobs. Running runJobs.php afterward will kick off many jobs (in a queue of 2000, 1500 or so) but does not complete all jobs as a result of refreshLinks.php or any of the other jobs queued.

Comment 25 Aaron Schulz 2014-02-11 19:51:49 UTC

What is $wgJobTypeConf set to? All the jobs you posted had at least 1 attempt. They won't run again until the claim TTL is reached. I don't know what you set that to. By default, jobs that fail are never retried and get deleted after a week.

You can try using:
$wgJobTypeConf['default']['claimTTL'] = 3600; // 1 hour

...this will let the jobs be retried (after 1 hour of failure).

You can also set:
$wgDebugLogGroups['runJobs'] = "<some path>";

...this will log all jobs run, and may show some failures (fatal errors will not show here though).

Comment 26 Chris Koerner 2014-02-18 20:09:21 UTC

Aaron, can you explain a little more about the claim TTL? I don't recall setting that anywhere.

I'm trying to pinpoint the cause as much as I can without touching my prod environment. We didn't have this issue in test (berating me for not having more sophisticated QA is justified!) and in order to make changes to prod I have many hoops I must jump through now.

One interesting note is that trying to specify the --memory-limit when running runJobs.php throws an error. I'm beginning to think this might be related to available ram. (Server has 2gb, php.ini has 128mb)

php runJobs.php --memory-limit 1024
PHP Fatal error:  Allowed memory size of 262144 bytes exhausted (tried to allocate 122880 bytes) in /var/www/html/w/includes/AutoLoader.php on line 290
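
One possible reading of the error above, offered as an assumption rather than a confirmed diagnosis: the reported limit of 262144 bytes (256 KB) is far below the 128 MB in php.ini, which would be consistent with the bare value 1024 being taken as a byte count and raised to a small built-in minimum. PHP's memory_limit setting reads a number without a suffix as bytes, so a unit suffix is needed to mean megabytes. The fragment below illustrates this with plain PHP, independent of MediaWiki.

```php
<?php
// Illustration only: how PHP interprets memory_limit values.
// A bare integer is a byte count, so '1024' asks for 1024 bytes, not 1024 MB.
// A shorthand suffix (K, M or G) gives the intended unit.
ini_set( 'memory_limit', '1024M' );

// Show the limit now in effect for this script.
echo ini_get( 'memory_limit' ), "\n";
```

If runJobs.php hands the option value to PHP unchanged (an assumption here), then --memory-limit 1024M would be the form to try.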

Comment 27 Aaron Schulz 2014-02-18 20:30:51 UTC

If the only jobs that linger have job_attempts set to something other than zero, then this is just a problem of failed jobs. You'd want to set $wgDebugLogGroups as above to possibly get more insight into why jobs sometimes fail.

Jobs can fail for any number of reasons, mostly specific to the code the job classes run. That in itself doesn't indicate any problem with the job queue itself.

Comment 28 Chris Koerner 2014-02-27 16:29:20 UTC

After changing $wgPhpCli to be blank, my job queue now runs and clears out jobs without issue.

$wgPhpCli = "";

As discovered in this thread: https://www.mediawiki.org/w/index.php?title=Project:Support_desk&offset=20140227154556&lqt_mustshow=40130#Mediawiki_1.22.2B_Causes_two_of_my_servers_to_hang_indefinitely._37734

Comment 29 Mark A. Hershberger 2014-06-21 19:57:55 UTC

Removing target milestone that was in the past.

If you want this in a specific release, have a good reason AND you are willing to find resources to fix this bug, feel free to change it to something appropriate.
