Last modified: 2014-11-19 13:06:05 UTC
6 webrequest partitions [1] for 2014-11-18T19/2H have not been marked successful. What happened?

[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 12:26:16 // exit code: 0
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh
+------------------+--------+--------+--------+--------+
| Date             | bits   | mobile | text   | upload |
+------------------+--------+--------+--------+--------+
  [...]
| 2014-11-18T17/1H |    .   |    .   |    .   |    .   |
| 2014-11-18T18/1H |    .   |    .   |    .   |    .   |
| 2014-11-18T19/1H |    X   |    .   |    X   |    .   |
| 2014-11-18T20/1H |    X   |    X   |    X   |    X   |
| 2014-11-18T21/1H |    .   |    .   |    .   |    .   |
| 2014-11-18T22/1H |    .   |    .   |    .   |    .   |
  [...]
+------------------+--------+--------+--------+--------+

Statuses:

  . --> Partition is ok
  M --> Partition manually marked ok
  X --> Partition is not ok (duplicates, missing, or nulls)
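The legend flags a partition as not ok when it contains duplicates, missing rows, or nulls. A minimal sketch of how such a check could be derived from one host's sequence numbers in one hour: compare the span max - min + 1 with the distinct values actually seen. This is an assumption for illustration only. The real check behind dump_webrequest_status.sh is not shown here and may work differently, and `naive_partition_stats` and `partition_status` are hypothetical names.

```python
from collections import Counter


def naive_partition_stats(seqs):
    """Hypothetical check: assume one host's sequence numbers for an hour
    form a single contiguous run from min(seqs) to max(seqs)."""
    if not seqs:
        return {"expected": 0, "actual": 0, "missing": 0, "duplicates": 0}
    counts = Counter(seqs)
    expected = max(seqs) - min(seqs) + 1  # size of the assumed contiguous run
    actual = len(seqs)                    # rows actually present
    distinct = len(counts)                # unique sequence numbers seen
    return {
        "expected": expected,
        "actual": actual,
        "missing": expected - distinct,   # gaps inside the run
        "duplicates": actual - distinct,  # repeated sequence numbers
    }


def partition_status(seqs):
    """Return '.' (ok) or 'X' (duplicates, missing, or nulls), mirroring the legend."""
    nulls = sum(1 for s in seqs if s is None)
    stats = naive_partition_stats([s for s in seqs if s is not None])
    ok = nulls == 0 and stats["missing"] == 0 and stats["duplicates"] == 0
    return "." if ok else "X"
```

For example, `naive_partition_stats([1, 2, 3, 5, 5])` reports one missing and one duplicate row, so that partition would be shown as X. Because the check only looks at min, max, and distinct values, row order does not matter.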
Merging the commits cdd19aed010ae5100feb907ddd49b44126465b9f, e40cfe942461f78b602c972e75dee9e654d120b2, 238f3e1bac5616c17a594efce6f393f7c533df8d, 4a97fd3f31211184eca06b9dd6b965bd0feb8d22, 4881baa833cf3a9b899f1fcb29d0663740e2350c, and 8db05f31802ebcbde44c3f11281a4a16da13692c (which implement running multiple varnishkafka instances per machine) caused an expected rewrite of the varnishkafka configurations, which in turn correctly triggered varnishkafka restarts. The restarts reset the sequence numbers. That explains why the six partitions were not marked successful; no data was lost.

The only outlier is cp1064, which also correctly reset its sequence number, but in addition lost a message at 2014-11-18T20:05:24. That loss happened half an hour before the sequence number reset and looks unrelated. Not a big deal, but since we're seeing isolated drops more often on upload, let's split that out to a separate bug.
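To illustrate why a restart alone can flag a partition, here is a hedged sketch. It assumes each host's sequence numbers are available in production order and that any decrease means varnishkafka restarted. Both are assumptions, and `naive_missing`, `split_on_resets`, and `missing_per_run` are hypothetical helpers, not part of any cluster script. Treating the whole hour as one run turns a reset into a large phantom gap. Splitting at the reset shows nothing missing unless a real message was dropped, as with cp1064.

```python
def naive_missing(seqs):
    """Missing count if all sequence numbers are assumed to form one run."""
    if not seqs:
        return 0
    return (max(seqs) - min(seqs) + 1) - len(set(seqs))


def split_on_resets(seqs):
    """Split sequence numbers (in production order) into runs.

    Assumption: any decrease marks a varnishkafka restart, so a new run
    starts there. Equal values stay in the current run as duplicates.
    """
    runs = []
    for s in seqs:
        if runs and s >= runs[-1][-1]:
            runs[-1].append(s)
        else:
            runs.append([s])
    return runs


def missing_per_run(seqs):
    """Count gaps separately inside each run between resets."""
    return [naive_missing(run) for run in split_on_resets(seqs)]
```

With a clean restart, `missing_per_run([1000, 1001, 1002, 1, 2, 3])` yields `[0, 0]`, while `naive_missing` on the same input reports 996 missing rows. With a genuine drop before the restart, `missing_per_run([1000, 1002, 1, 2])` yields `[1, 0]`. A real check would need a stronger reset signal than "any decrease", since out-of-order delivery could also lower the sequence number.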
(In reply to christian from comment #1)
> [...], let's split that out to a separate bug.

It's tracked in bug 73609.