Last modified: 2014-10-29 17:09:30 UTC
The bits and upload webrequest partitions [1] for 2014-10-28T19/1H have not been marked successful.

What happened?

[1] _________________________________________________________________

qchris@stat1002 // jobs: 0 // time: 15:32:08 // exit code: 0
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh
  +------------------+--------+--------+--------+--------+
  | Date             |  bits  | mobile |  text  | upload |
  +------------------+--------+--------+--------+--------+
[...]
  | 2014-10-28T17/1H |    .   |    .   |    .   |    .   |
  | 2014-10-28T18/1H |    .   |    .   |    .   |    .   |
  | 2014-10-28T19/1H |    X   |    .   |    .   |    X   |
  | 2014-10-28T20/1H |    .   |    .   |    .   |    .   |
  | 2014-10-28T21/1H |    .   |    .   |    .   |    .   |
[...]
  +------------------+--------+--------+--------+--------+

Statuses:

  . --> Partition is ok
  M --> Partition manually marked ok
  X --> Partition is not ok (duplicates, missing, or nulls)
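To pull the flagged partitions out of a status table like the one in [1], a minimal sketch follows. It assumes the table has been captured as text in the layout shown (pipe-separated cells, first cell is the date); the function name failed_partitions and the SAMPLE string are hypothetical and not part of dump_webrequest_status.sh.

```python
# Hypothetical helper: list (date, source) pairs whose status is "X".
# Assumes the pipe-separated layout shown in [1]; not part of the real script.

SAMPLE = """\
+------------------+--------+--------+--------+--------+
| Date             |  bits  | mobile |  text  | upload |
+------------------+--------+--------+--------+--------+
| 2014-10-28T18/1H |    .   |    .   |    .   |    .   |
| 2014-10-28T19/1H |    X   |    .   |    .   |    X   |
| 2014-10-28T20/1H |    .   |    .   |    .   |    .   |
+------------------+--------+--------+--------+--------+
"""


def failed_partitions(table_text):
    """Return (date, source) tuples for every cell marked 'X'."""
    header = None
    failed = []
    for line in table_text.splitlines():
        line = line.strip()
        # Skip borders, "[...]" elisions, and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            # The first row carries the column names (Date, bits, ...).
            header = cells
            continue
        date = cells[0]
        for source, status in zip(header[1:], cells[1:]):
            if status == "X":
                failed.append((date, source))
    return failed


if __name__ == "__main__":
    for date, source in failed_partitions(SAMPLE):
        print(date, source)
```

Rows marked "." or "M" are ignored, so only partitions that still need attention are reported.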
For bits, only cp1056 was affected. The affected period is the single second 2014-10-28T19:52:57. For that second, we saw 178 duplicates and no missing log lines. So <<1 second's worth of data is affected.

For upload, cp1049, cp1051, cp3003, cp3004, cp3006, cp3010, and cp3015 were affected. The affected period is the three seconds 2014-10-28T19:52:54/2014-10-28T19:52:57. There were no duplicates, but ~2K missing lines. Again, <<1 second's worth of data is affected.

Those affected time periods match the partition leader re-election for bug 72550. So the duplicates for bits and the missing log lines for upload are expected.
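As an illustration of how per-host duplicate and missing counts like those above could be derived, here is a rough sketch. It assumes each host stamps its log lines with a consecutive sequence number, which this report does not state; the function name sequence_stats and the sample records are hypothetical.

```python
# Sketch under an assumption: every host numbers its log lines consecutively.
# duplicates = lines received - distinct sequence numbers
# missing    = size of the observed sequence range - distinct sequence numbers
from collections import defaultdict


def sequence_stats(records):
    """records: iterable of (hostname, sequence_number) pairs."""
    by_host = defaultdict(list)
    for host, seq in records:
        by_host[host].append(seq)

    stats = {}
    for host, seqs in by_host.items():
        distinct = set(seqs)
        expected = max(distinct) - min(distinct) + 1
        stats[host] = {
            "duplicates": len(seqs) - len(distinct),
            "missing": expected - len(distinct),
        }
    return stats


if __name__ == "__main__":
    # Made-up sample data, not taken from the real logs.
    sample = [
        ("cp1056", 1), ("cp1056", 2), ("cp1056", 2), ("cp1056", 3),
        ("cp1049", 10), ("cp1049", 13),
    ]
    print(sequence_stats(sample))
```

With this approach, a host that re-sends lines shows up with duplicates only, while a host that drops lines shows up with missing only, which is the pattern described for bits and upload respectively.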