Last modified: 2014-10-21 10:53:42 UTC
For the hour 2014-09-29T20:xx:xx, none of the four sources' buckets was marked successful [1]. What happened?

[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 11:19:13 // exit code: 0
cwd: ~/cluster-scripts
./dump_webrequest_status.sh
+---------------------+--------+--------+--------+--------+
| Date                |  bits  |  text  | mobile | upload |
+---------------------+--------+--------+--------+--------+
[...]
| 2014-09-29T18:xx:xx |    X   |    .   |    .   |    .   |
| 2014-09-29T19:xx:xx |    X   |    .   |    .   |    .   |
| 2014-09-29T20:xx:xx |    X   |    X   |    X   |    X   |
| 2014-09-29T21:xx:xx |    .   |    .   |    .   |    .   |
| 2014-09-29T22:xx:xx |    .   |    .   |    .   |    .   |
+---------------------+--------+--------+--------+--------+

Statuses:
  . --> Partition is ok
  X --> Partition is not ok (duplicates, missing, or nulls)

For the bits failures for 2014-09-29T18:xx:xx and 2014-09-29T19:xx:xx, see bug 71435.
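To pick out hours like this one from a longer status dump, a small filter along the following lines might help. It is only a sketch: it assumes the table layout shown in [1] (a date cell followed by one status cell per source), and the function name failed_everywhere is made up for illustration; it is not part of dump_webrequest_status.sh.

```python
def failed_everywhere(table_text):
    """Return the hours whose status is 'X' for every source column."""
    hours = []
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows start with a date cell; borders, the header row and
        # "[...]" markers do not, so they are skipped here.
        if len(cells) < 2 or not cells[0][:4].isdigit():
            continue
        if all(status == "X" for status in cells[1:]):
            hours.append(cells[0])
    return hours


sample = """\
+---------------------+--------+--------+--------+--------+
| Date                |  bits  |  text  | mobile | upload |
+---------------------+--------+--------+--------+--------+
| 2014-09-29T19:xx:xx |    X   |    .   |    .   |    .   |
| 2014-09-29T20:xx:xx |    X   |    X   |    X   |    X   |
| 2014-09-29T21:xx:xx |    .   |    .   |    .   |    .   |
+---------------------+--------+--------+--------+--------+
"""
print(failed_everywhere(sample))
```

On the sample rows, only the 20:xx:xx hour is reported, since 19:xx:xx failed for bits alone.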
The issue affected every cache. For each cache, at some point between 20:30:00 and 20:57:00, partition numbers reset, and in the minute before the reset we see missing lines (but no duplicates). This nicely matches the merge of the change resetting queue.buffering.max.ms (commit b62d61a0eda950f202484a4d27972405cb6f124d) and the subsequent scattered puppet runs.
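The pattern described above (a gap, then a reset, with no duplicates) can be illustrated with a short sketch. It assumes that each cache tags its lines with a number that increases by one per line and starts over after a restart, and that the numbers are examined in arrival order. The function name scan_counters and the sample values are hypothetical; this is not the check the status script actually runs.

```python
def scan_counters(values):
    """Walk per-cache numbers in arrival order and label anomalies."""
    events = []
    for prev, cur in zip(values, values[1:]):
        if cur == prev:
            events.append(("duplicate", prev, cur))
        elif cur < prev:
            # Assumption: a drop means the counter restarted,
            # not that lines arrived out of order.
            events.append(("reset", prev, cur))
        elif cur > prev + 1:
            events.append(("missing", prev, cur))
    return events


# Made-up numbers for one cache: a gap shortly before the counter restarts.
print(scan_counters([101, 102, 105, 0, 1, 2]))
```

For the made-up input, this reports one "missing" event (102 to 105) followed by one "reset" event (105 to 0), and no duplicates.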