Last modified: 2014-09-23 20:27:05 UTC
The OCG service on beta is crashing on deployment-pdf01 with: Host deployment-graphite.eqiad.wmflabs not found: 3(NXDOMAIN). Where did it go?
It was deleted on Sep. 11 at 16:31 UTC by YuviPanda: "Delete deployment-graphite instance". Running statsd/graphite on a labs instance did not fit the needs of graphite-based monitoring for the beta cluster. Instead, a real hardware box has been set up on the labs infrastructure and is maintained by ops, which is much more robust. One should thus use labmon1001.eqiad.wmnet (10.64.37.13). Since this broke, the host configuration should be in puppet and/or operations/mediawiki-config.git so it can be properly updated whenever it changes again.
It is, presumably; I just need to hunt down the puppet configuration for it. Mwalker set it up.
Ok, I've changed the statsd configuration from deployment-graphite.eqiad.wmflabs to labmon1001.eqiad.wmnet on both deployment-pdf01 and deployment-pdf02. Fingers crossed.
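For anyone double-checking a change like this, here is a minimal sketch of a DNS sanity check for the configured statsd host. It is not part of OCG or puppet; the function names `resolves` and `unresolved` are made up for illustration, and the resolver is passed in as a parameter only so the check can be exercised without network access.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return True if hostname resolves, False on a DNS failure such as NXDOMAIN."""
    try:
        resolver(hostname)
        return True
    except socket.gaierror:
        # socket.gethostbyname raises gaierror when the name cannot be resolved.
        return False


def unresolved(hostnames, resolver=socket.gethostbyname):
    """Return the subset of hostnames that fail to resolve, in input order."""
    return [h for h in hostnames if not resolves(h, resolver)]
```

Run on deployment-pdf01 or deployment-pdf02, `unresolved(["deployment-graphite.eqiad.wmflabs", "labmon1001.eqiad.wmnet"])` would be expected to list only the old host, assuming labmon1001.eqiad.wmnet is resolvable from those instances.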
Seems like it's working now.