Last modified: 2014-10-29 17:11:43 UTC
The imported raw webrequests data from text caches for 2014-08-18T13:..:.. at hdfs://analytics-hadoop/wmf/data/raw/webrequest/webrequest_text/hourly/2014/08/18/13 was not marked as ok. Is that valid? What happened?
Created attachment 16256 kafka-requests-per-second-2014-08-17--2014-08-19
Monitoring worked as expected, as the data is missing sequence numbers:

+-----------------------------+-----------+---------------------+---------------------+
| Hostname                    | # missing | Start time          | End time            |
+-----------------------------+-----------+---------------------+---------------------+
| amssq37.esams.wmnet         |       155 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
| amssq47.esams.wmnet         |       125 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
| amssq48.esams.wikimedia.org |       149 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
| amssq59.esams.wikimedia.org |        74 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
| cp1052.eqiad.wmnet          |        96 | 2014-08-18T13:29:38 | 2014-08-18T13:29:39 |
| cp4008.ulsfo.wmnet          |       173 | 2014-08-18T13:29:37 | 2014-08-18T13:29:38 |
+-----------------------------+-----------+---------------------+---------------------+
| Total                       |       772 | 2014-08-18T13:29:37 | 2014-08-18T13:29:39 |
+-----------------------------+-----------+---------------------+---------------------+

Those hosts are all text caches, but they are not limited to a single datacenter. The affected timespan matches a leader re-election. See attachment kafka-requests-per-second-2014-08-17--2014-08-19. There goes Kafka's "at least once" guarantee :-D
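For reference, a per-host "# missing" count like the one above can be derived from sequence numbers alone. The following is only a minimal sketch, assuming each request carries a per-host sequence number that increases by one; the function name `missing_per_host` and the sample hosts are hypothetical, and this is not the query the actual monitoring job runs.

```python
from collections import defaultdict


def missing_per_host(records):
    """Count gaps in per-host sequence numbers.

    `records` is an iterable of (hostname, sequence) pairs.
    Assumption: sequences increase by exactly 1 per request on each host,
    so expected count = max - min + 1 within the inspected window.
    """
    seen = defaultdict(set)
    for host, seq in records:
        seen[host].add(seq)  # a set also ignores duplicate deliveries
    return {
        host: (max(seqs) - min(seqs) + 1) - len(seqs)
        for host, seqs in seen.items()
    }


# Hypothetical sample: host-a is missing sequences 12 and 13, host-b has no gaps.
sample = [
    ("host-a", 10), ("host-a", 11), ("host-a", 14),
    ("host-b", 5), ("host-b", 6),
]
result = missing_per_host(sample)
```

Note that a check of this shape only sees gaps between the lowest and highest sequence number in the window, so losses at the very edge of an hour would show up in the neighbouring hour instead.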
Ha, ah yes, ok, if this corresponds with an election, then this makes sense. The producers themselves see errors for as long as it takes the partition leadership to change. This shouldn't happen, and is something I need to look into for sure.