Last modified: 2014-10-31 12:52:30 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T74028, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 72028 - Several raw webrequest partitions not marked successful between 2014-10-13T13:xx:xx and 2014-10-13T22:xx:xx
Status: NEW
Product: Analytics
Classification: Unclassified
Component: Refinery
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 69667 72809
Reported: 2014-10-14 11:12 UTC by christian
Modified: 2014-10-31 12:52 UTC
CC List: 7 users

Description christian 2014-10-14 11:12:37 UTC
Between 2014-10-13T13:xx:xx and 2014-10-13T22:xx:xx, several
partitions were not marked successful [1]. It seems bits was most
affected, followed by upload, and to a lesser extent text and mobile.

What happened?


[1]
_________________________________________________________________
qchris@stat1002 // jobs: 0 // time: 11:07:47 // exit code: 0
cwd: ~
cluster-scripts/dump_webrequest_status.sh 
  +---------------------+--------+--------+--------+--------+
  | Date                |  bits  |  text  | mobile | upload |
  +---------------------+--------+--------+--------+--------+
[...]
  | 2014-10-13T11:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-13T12:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-13T13:xx:xx |    X   |    X   |    X   |    X   |    
  | 2014-10-13T14:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-13T15:xx:xx |    X   |    .   |    .   |    .   |    
  | 2014-10-13T16:xx:xx |    X   |    .   |    .   |    .   |    
  | 2014-10-13T17:xx:xx |    X   |    .   |    .   |    .   |    
  | 2014-10-13T18:xx:xx |    X   |    .   |    .   |    .   |    
  | 2014-10-13T19:xx:xx |    X   |    .   |    .   |    X   |    
  | 2014-10-13T20:xx:xx |    X   |    .   |    .   |    X   |    
  | 2014-10-13T21:xx:xx |    X   |    .   |    .   |    X   |    
  | 2014-10-13T22:xx:xx |    .   |    .   |    .   |    .   |    
  | 2014-10-13T23:xx:xx |    .   |    .   |    .   |    .   |    
[...]
  +---------------------+--------+--------+--------+--------+


Statuses:

  . --> Partition is ok
  X --> Partition is not ok (duplicates, missing, or nulls)

pass cluster-scripts/dump_webrequest_status.sh
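
To see at a glance which sources were hit hardest, the status table above can be tallied mechanically. The following is only a minimal sketch: it assumes the table layout shown in [1], and the names STATUS_TABLE and failed_partitions are made up for illustration (they are not part of cluster-scripts).

```python
from collections import Counter

# Rows copied from the dump_webrequest_status.sh output in [1].
STATUS_TABLE = """
  +---------------------+--------+--------+--------+--------+
  | Date                |  bits  |  text  | mobile | upload |
  +---------------------+--------+--------+--------+--------+
[...]
  | 2014-10-13T11:xx:xx |    .   |    .   |    .   |    .   |
  | 2014-10-13T12:xx:xx |    .   |    .   |    .   |    .   |
  | 2014-10-13T13:xx:xx |    X   |    X   |    X   |    X   |
  | 2014-10-13T14:xx:xx |    .   |    .   |    .   |    .   |
  | 2014-10-13T15:xx:xx |    X   |    .   |    .   |    .   |
  | 2014-10-13T16:xx:xx |    X   |    .   |    .   |    .   |
  | 2014-10-13T17:xx:xx |    X   |    .   |    .   |    .   |
  | 2014-10-13T18:xx:xx |    X   |    .   |    .   |    .   |
  | 2014-10-13T19:xx:xx |    X   |    .   |    .   |    X   |
  | 2014-10-13T20:xx:xx |    X   |    .   |    .   |    X   |
  | 2014-10-13T21:xx:xx |    X   |    .   |    .   |    X   |
  | 2014-10-13T22:xx:xx |    .   |    .   |    .   |    .   |
  | 2014-10-13T23:xx:xx |    .   |    .   |    .   |    .   |
[...]
  +---------------------+--------+--------+--------+--------+
"""


def failed_partitions(table_text):
    """Return (hour, source) pairs for every cell marked 'X' (not ok)."""
    sources = []
    failed = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, "[...]" markers and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if cells[0] == "Date":
            sources = cells[1:]  # header row names the webrequest sources
            continue
        hour = cells[0]
        for source, status in zip(sources, cells[1:]):
            if status == "X":
                failed.append((hour, source))
    return failed


# Number of failed hourly partitions per webrequest source.
counts = Counter(source for _, source in failed_partitions(STATUS_TABLE))
print(dict(counts))
```

For the rows shown, this counts 8 failed hourly partitions for bits, 4 for upload, and 1 each for text and mobile.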
Comment 1 christian 2014-10-14 14:09:59 UTC
For 2014-10-13T13:xx:xx, all caches were affected, with the sole
exception of
  cp1056.eqiad.wmnet (bits)
  cp1057.eqiad.wmnet (bits)
  cp3019.esams.wikimedia.org (bits)
  cp3020.esams.wikimedia.org (bits)
(These are exactly the machines that saw the ACK experiments [1];
we did not see missing log lines for any of them.)

For that hour, we saw no duplicates, but intermittent loss between
2014-10-13T13:37:15 and 2014-10-13T13:38:16, amounting to
  bits <1 second
  text <2 seconds
  mobile <2 seconds
  upload <1 second

This nicely matches the dropout of analytics1021 from its partition leader role [2].

I marked the 2014-10-13T13:xx:xx partitions as ok.

[1] https://git.wikimedia.org/blob/operations%2Fpuppet.git/ccc17ce0780f6c56ddcac4f4dcd9f90b2dc0d346/manifests%2Frole%2Fcache.pp#L510
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=69667#c14
Comment 2 christian 2014-10-15 10:13:07 UTC
The failed partitions between 2014-10-13T15:xx:xx and 2014-10-13T21:xx:xx
all came exclusively from esams caches.
Hence, filing this under the esams bug.
Comment 3 christian 2014-10-20 12:29:45 UTC
(Since this is also about analytics1021 dropping out of its leader role,
also blocking on bug 69667.)
