Last modified: 2014-03-13 13:09:40 UTC
Move Elasticsearch "search groups" monitoring from cluster level to node level. The advantage is that we'd be able to see which server is actually doing the work, and Ganglia would still add the per-node numbers together to give a cluster total. Right now Ganglia also adds the numbers together, but the result isn't useful: every node reports the same cluster-level value, so the sum is just that value multiplied by the number of nodes.
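To make the arithmetic concrete, here is a minimal sketch in Python of what a summing frontend such as Ganglia ends up with in each case. The node names, the "full_text" group name, and the query counts are made up for illustration, and the nested nodes/indices/search/groups/query_total layout is an assumption about the stats response shape, not something taken from this report.

```python
from collections import defaultdict


def sum_group_queries(reported):
    """Sum per-group query counts over all reporting nodes,
    the way an aggregating frontend would."""
    totals = defaultdict(int)
    for node in reported["nodes"].values():
        groups = node.get("indices", {}).get("search", {}).get("groups", {})
        for name, stats in groups.items():
            totals[name] += stats["query_total"]
    return dict(totals)


def node_entry(query_total):
    """Build one node's (assumed) stats fragment for a hypothetical group."""
    return {"indices": {"search": {"groups": {"full_text": {"query_total": query_total}}}}}


# Cluster-level monitoring: every node reports the same cluster-wide count.
cluster_level = {"nodes": {
    "node-a": node_entry(1000),
    "node-b": node_entry(1000),
    "node-c": node_entry(1000),
}}

# Node-level monitoring: every node reports only the work it did itself.
node_level = {"nodes": {
    "node-a": node_entry(700),
    "node-b": node_entry(250),
    "node-c": node_entry(50),
}}

inflated = sum_group_queries(cluster_level)  # 3 nodes x 1000
actual = sum_group_queries(node_level)       # 700 + 250 + 50
```

With the cluster-level input the sum is the real total times the node count, while the node-level input sums to the real total and also shows that node-a is doing most of the work.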
This is fixed in Elasticsearch 1.0.
The monitoring we're doing now is actually showing up on the hot threads from time to time:

   30.9% (154.2ms out of 500ms) cpu usage by thread 'elasticsearch[elastic1012][management][T#3]'
     7/10 snapshots sharing following 10 elements
       org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.getIndices(IndicesStatsResponse.java:87)
       org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.toXContent(IndicesStatsResponse.java:156)
       org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:311)
       org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:303)
       org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.finishHim(TransportBroadcastOperationAction.java:321)
       org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.onOperation(TransportBroadcastOperationAction.java:273)
       org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(TransportBroadcastOperationAction.java:225)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:724)
     2/10 snapshots sharing following 18 elements
       org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator._writeFieldName(UTF8JsonGenerator.java:270)
       org.elasticsearch.common.jackson.core.json.UTF8JsonGenerator.writeFieldName(UTF8JsonGenerator.java:249)
       org.elasticsearch.common.xcontent.json.JsonXContentGenerator.writeFieldName(JsonXContentGenerator.java:86)
       org.elasticsearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:242)
       org.elasticsearch.common.xcontent.XContentBuilder.field(XContentBuilder.java:409)
       org.elasticsearch.common.xcontent.XContentBuilder.timeValueField(XContentBuilder.java:857)
       org.elasticsearch.index.search.stats.SearchStats$Stats.toXContent(SearchStats.java:140)
       org.elasticsearch.index.search.stats.SearchStats.toXContent(SearchStats.java:205)
       org.elasticsearch.action.admin.indices.stats.CommonStats.toXContent(CommonStats.java:555)
       org.elasticsearch.action.admin.indices.stats.IndicesStatsResponse.toXContent(IndicesStatsResponse.java:164)
       org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:311)
       org.elasticsearch.rest.action.admin.indices.stats.RestIndicesStatsAction$RestSearchStatsHandler$1.onResponse(RestIndicesStatsAction.java:303)
       org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.finishHim(TransportBroadcastOperationAction.java:321)
       org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.onOperation(TransportBroadcastOperationAction.java:273)
       org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$2.run(TransportBroadcastOperationAction.java:225)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:724)

I suspect this'll go away when we switch to node level. Thus, setting it to high because it will be easy to fix once we go to 1.0.
Change 117037 had a related patch set uploaded by Manybubbles: Update Elasticsearch monitoring for 1.0 https://gerrit.wikimedia.org/r/117037
Change 117037 merged by Ottomata: Update Elasticsearch monitoring for 1.0 https://gerrit.wikimedia.org/r/117037