Last modified: 2014-11-14 15:47:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T75418, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 73418 - Most raw webrequest partitions for 2014-10-13T20/1H not marked successful


Summary:	Most raw webrequest partitions for 2014-10-13T20/1H not marked successful

Status:	RESOLVED FIXED

Product:	Analytics
Classification:	Unclassified
Component:	Refinery
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	72300

Reported:	2014-11-14 14:55 UTC by christian
Modified:	2014-11-14 15:47 UTC
CC List:	7 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description christian 2014-11-14 14:55:27 UTC

Three of the four webrequest partitions [1] for 2014-11-13T20/1H (bits,
text, and upload) have not been marked successful.

What happened?


[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 14:37:13 // exit code: 0
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh 
  +------------------+--------+--------+--------+--------+
  | Date             |  bits  | mobile |  text  | upload |
  +------------------+--------+--------+--------+--------+
[...]
  | 2014-11-13T18/1H |    .   |    .   |    .   |    X   |
  | 2014-11-13T19/1H |    .   |    .   |    .   |    .   |
  | 2014-11-13T20/1H |    X   |    .   |    X   |    X   |
  | 2014-11-13T21/1H |    .   |    .   |    .   |    .   |
  | 2014-11-13T22/1H |    .   |    .   |    .   |    X   |
[...]
  +------------------+--------+--------+--------+--------+


Statuses:

  . --> Partition is ok
  M --> Partition manually marked ok
  X --> Partition is not ok (duplicates, missing, or nulls)
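
As a rough illustration of how the table above can be read, here is a minimal
Python sketch that lists the partitions marked X. The function name
failed_partitions and the STATUS_DUMP constant are hypothetical and not part of
dump_webrequest_status.sh; the rows are copied from the output above, with the
"[...]" elisions left out.

```python
# Rows copied from the status dump above (elided rows are not reconstructed).
STATUS_DUMP = """\
  +------------------+--------+--------+--------+--------+
  | Date             |  bits  | mobile |  text  | upload |
  +------------------+--------+--------+--------+--------+
  | 2014-11-13T18/1H |    .   |    .   |    .   |    X   |
  | 2014-11-13T19/1H |    .   |    .   |    .   |    .   |
  | 2014-11-13T20/1H |    X   |    .   |    X   |    X   |
  | 2014-11-13T21/1H |    .   |    .   |    .   |    .   |
  | 2014-11-13T22/1H |    .   |    .   |    .   |    X   |
  +------------------+--------+--------+--------+--------+
"""


def failed_partitions(dump):
    """Return {date: [sources marked 'X']} for rows with at least one 'X'."""
    header = None
    failed = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip table borders, "[...]" elisions, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row: "Date" plus the source names
            continue
        date, statuses = cells[0], cells[1:]
        bad = [src for src, status in zip(header[1:], statuses) if status == "X"]
        if bad:
            failed[date] = bad
    return failed


if __name__ == "__main__":
    for date, sources in failed_partitions(STATUS_DUMP).items():
        print(date, ", ".join(sources))
```

For the rows shown, this reports bits, text, and upload for 2014-11-13T20/1H,
and upload alone for 2014-11-13T18/1H and 2014-11-13T22/1H.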

Comment 1 christian 2014-11-14 14:58:31 UTC

The three jobs for 2014-11-13T20/1H were in SUSPENDED state.
Some internal workflows got stuck with an exception about RM
(resourcemanager) issues [1].

This nicely matches yesterday's restart of the resourcemanager after
upgrading the JVMs.
Resuming the 3 jobs did not work, so I killed and restarted them.




[1] JA009 JA009: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1409078537822_77051' doesn't exist in RM.
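
The resume-then-kill sequence described in this comment could be sketched
roughly as follows. This is only an illustration under assumptions: the helper
oozie_recovery_commands and the job ID below are hypothetical, the report does
not say which commands were actually used, and how the jobs were restarted
afterwards is not stated, so that step is left out. The sketch only builds
command lines for the standard `oozie job` options -info, -resume, and -kill;
it does not run them.

```python
def oozie_recovery_commands(job_ids, oozie_url=None):
    """Build (but do not run) `oozie job` command lines for each job ID.

    oozie_url is optional; when omitted, the Oozie CLI falls back to the
    OOZIE_URL environment variable.
    """
    base = ["oozie", "job"]
    if oozie_url:
        base += ["-oozie", oozie_url]
    return {
        job_id: {
            "info": base + ["-info", job_id],      # inspect the job state
            "resume": base + ["-resume", job_id],  # first attempt: resume
            "kill": base + ["-kill", job_id],      # fallback if resume fails
        }
        for job_id in job_ids
    }


if __name__ == "__main__":
    # Placeholder ID for illustration only; the real job IDs are not in the report.
    commands = oozie_recovery_commands(["<suspended-job-id>"])
    for job_id, steps in commands.items():
        for name, argv in steps.items():
            print(name + ":", " ".join(argv))
```

Building the commands as argument lists keeps the sketch free of side effects;
each list could be passed to subprocess.run on a host where the Oozie client is
installed.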

Comment 2 christian 2014-11-14 15:47:48 UTC

The restarted jobs have now succeeded, and the partitions got marked ok.
