Last modified: 2014-10-29 17:11:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages that display bug reports and their history, links might be broken. See T71854, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 69854 - Raw webrequest partition monitoring did not flag data for 2014-08-18T13:..:.. as valid for text caches


Summary:	Raw webrequest partition monitoring did not flag data for 2014-08-18T13:..:.....

Status:	RESOLVED WONTFIX

Product:	Analytics
Classification:	Unclassified
Component:	Refinery (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	69666

Reported:	2014-08-21 15:52 UTC by christian
Modified:	2014-10-29 17:11 UTC (History)
CC List:	7 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
kafka-requests-per-second-2014-08-17--2014-08-19 (23.14 KB, image/png) 2014-08-21 15:52 UTC, christian

Description christian 2014-08-21 15:52:11 UTC

The imported raw webrequests data from text caches for
2014-08-18T13:..:.. at

  hdfs://analytics-hadoop/wmf/data/raw/webrequest/webrequest_text/hourly/2014/08/18/13

was not marked as ok.

Is that valid?
What happened?

Comment 1 christian 2014-08-21 15:52:50 UTC

Created attachment 16256
kafka-requests-per-second-2014-08-17--2014-08-19

Comment 2 christian 2014-08-21 15:54:15 UTC

Monitoring worked as expected, as the data is missing sequence numbers:

  +-----------------------------+-----------+---------------------+---------------------+
  | Hostname                    | # missing | Start time          | End time            |
  +-----------------------------+-----------+---------------------+---------------------+
  | amssq37.esams.wmnet         |       155 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  | amssq47.esams.wmnet         |       125 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  | amssq48.esams.wikimedia.org |       149 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  | amssq59.esams.wikimedia.org |        74 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  | cp1052.eqiad.wmnet          |        96 | 2014-08-18T13:29:38 | 2014-08-18T13:29:39 |
  | cp4008.ulsfo.wmnet          |       173 | 2014-08-18T13:29:37 | 2014-08-18T13:29:38 |
  +-----------------------------+-----------+---------------------+---------------------+
  | Total                       |       772 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
  +-----------------------------+-----------+---------------------+---------------------+

Those hosts are all text caches, but they are not limited to a single datacenter.

The affected timespan matches a Kafka leader re-election.
See attachment kafka-requests-per-second-2014-08-17--2014-08-19.

There goes kafka's "at least once" guarantee :-D
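
For illustration, a gap count like the "# missing" column above could be computed roughly as follows. This is only a sketch, not the actual monitoring code, which is not shown in this report. It assumes each host stamps its requests with consecutive integer sequence numbers that do not reset within the partition, and the function name and sample host names are hypothetical.

```python
from collections import defaultdict


def count_missing_sequence_numbers(records):
    """Count missing sequence numbers per host.

    `records` is an iterable of (hostname, sequence) pairs. Assumption: each
    host emits consecutive integer sequence numbers, so any value absent
    between a host's lowest and highest observed number was lost.
    Returns {hostname: missing_count} for hosts that have gaps.
    """
    seen = defaultdict(set)
    for hostname, sequence in records:
        seen[hostname].add(sequence)

    missing = {}
    for hostname, sequences in seen.items():
        # Numbers we would expect if nothing had been dropped.
        expected = max(sequences) - min(sequences) + 1
        gap = expected - len(sequences)
        if gap > 0:
            missing[hostname] = gap
    return missing


# Hypothetical sample: host-a lost sequence numbers 3 and 4, host-b lost none.
records = [
    ("host-a", 1), ("host-a", 2), ("host-a", 5),
    ("host-b", 10), ("host-b", 11), ("host-b", 12),
]
print(count_missing_sequence_numbers(records))
```

Under these assumptions a partition would be flagged as not ok whenever the returned mapping is non-empty, which is consistent with the hour above not being marked as ok.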

Comment 3 Andrew Otto 2014-08-21 16:40:55 UTC

Ha, ah yes, ok, if this corresponds with an election, then this makes sense.  The producers themselves see errors during the time it takes for the partition leadership to change.  This shouldn't happen, and is something I need to look into for sure.
