I'm constantly getting these error messages in my logs:
[2015-11-10 13:52:03,037][WARN ][discovery.zen.ping.unicast] [ClusterUK Node 1] [11] failed send ping to [ClusterUK Node 1][x-eBYFoiRemOBK7egMHTRg][elasticuk1][inet[/172.24.32.10:9300]]{master=true}
org.elasticsearch.ElasticsearchIllegalStateException: can't add nodes to a stopped transport
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:746)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:731)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:216)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:376)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2015-11-10 13:52:03,038][WARN ][discovery.zen.ping.unicast] [ClusterUK Node 1] [12] failed send ping to [ClusterUK Node 1][x-eBYFoiRemOBK7egMHTRg][elasticuk1][inet[/172.24.32.10:9300]]{master=true}
org.elasticsearch.ElasticsearchIllegalStateException: can't add nodes to a stopped transport
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:746)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:731)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:216)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:376)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2015-11-10 13:52:03,038][WARN ][discovery.zen.ping.unicast] [ClusterUK Node 1] [12] failed send ping to [ClusterUK Node 1][x-eBYFoiRemOBK7egMHTRg][elasticuk1][inet[/172.24.32.10:9300]]{master=true}
org.elasticsearch.ElasticsearchIllegalStateException: can't add nodes to a stopped transport
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:746)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:731)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:216)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:376)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2015-11-10 13:52:11,378][INFO ][transport ] [ClusterUK Node 1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.24.32.10:9300]}
[2015-11-10 13:52:11,394][INFO ][discovery ] [ClusterUK Node 1] ClusterUK/FTiLxRmZQLyFtyap8JTj2w
[2015-11-10 13:52:14,498][INFO ][cluster.service ] [ClusterUK Node 1] detected_master [ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}, added {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:52:14,749][INFO ][http ] [ClusterUK Node 1] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.24.32.10:9200]}
[2015-11-10 13:52:14,750][INFO ][node ] [ClusterUK Node 1] started
[2015-11-10 13:52:44,994][INFO ][discovery.zen ] [ClusterUK Node 1] master_left [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}], reason [do not exists on master, act as master failure]
[2015-11-10 13:52:44,996][WARN ][discovery.zen ] [ClusterUK Node 1] master left (reason = do not exists on master, act as master failure), current nodes: {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 1][FTiLxRmZQLyFtyap8JTj2w][elasticuk1][inet[elasticuk1/172.24.32.10:9300]]{master=true},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}
[2015-11-10 13:52:44,996][INFO ][cluster.service ] [ClusterUK Node 1] removed {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-master_failed ([ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true})
[2015-11-10 13:52:48,047][INFO ][cluster.service ] [ClusterUK Node 1] detected_master [ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}, added {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:53:10,689][INFO ][cluster.service ] [ClusterUK Node 1] removed {[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:53:13,199][INFO ][cluster.service ] [ClusterUK Node 1] added {[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:53:35,963][INFO ][discovery.zen ] [ClusterUK Node 1] master_left [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}], reason [transport disconnected]
[2015-11-10 13:53:35,964][WARN ][discovery.zen ] [ClusterUK Node 1] master left (reason = transport disconnected), current nodes: {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 1][FTiLxRmZQLyFtyap8JTj2w][elasticuk1][inet[elasticuk1/172.24.32.10:9300]]{master=true},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}
[2015-11-10 13:53:35,965][INFO ][cluster.service ] [ClusterUK Node 1] removed {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-master_failed ([ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true})
[2015-11-10 13:53:39,018][INFO ][cluster.service ] [ClusterUK Node 1] detected_master [ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}, added {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:54:03,581][INFO ][discovery.zen ] [ClusterUK Node 1] master_left [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}], reason [transport disconnected]
[2015-11-10 13:54:03,581][WARN ][discovery.zen ] [ClusterUK Node 1] master left (reason = transport disconnected), current nodes: {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 1][FTiLxRmZQLyFtyap8JTj2w][elasticuk1][inet[elasticuk1/172.24.32.10:9300]]{master=true},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}
[2015-11-10 13:54:03,581][INFO ][cluster.service ] [ClusterUK Node 1] removed {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-master_failed ([ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true})
[2015-11-10 13:54:06,603][INFO ][cluster.service ] [ClusterUK Node 1] detected_master [ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}, added {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-receive(from master [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}])
[2015-11-10 13:54:39,790][INFO ][discovery.zen ] [ClusterUK Node 1] master_left [[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true}], reason [transport disconnected]
[2015-11-10 13:54:39,792][WARN ][discovery.zen ] [ClusterUK Node 1] master left (reason = transport disconnected), current nodes: {[ClusterUK Client Node STG1][_JfbrXjFTzGD7BL7OTqbVA][Staging1][inet[/192.168.100.248:9300]]{data=false, master=false},[ClusterUK Node 1][FTiLxRmZQLyFtyap8JTj2w][elasticuk1][inet[elasticuk1/172.24.32.10:9300]]{master=true},[ClusterUK Node 3][rHJ486YyQHqKytG44fmC7g][elasticuk3][inet[/172.24.32.8:9300]]{master=true},}
[2015-11-10 13:54:39,792][INFO ][cluster.service ] [ClusterUK Node 1] removed {[ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true},}, reason: zen-disco-master_failed ([ClusterUK Node 2][T5R_1SUwRu6Q4zZLMTbNlA][elasticuk2][inet[/172.24.32.5:9300]]{master=true})
[2015-11-10 13:54:42,366][ERROR][marvel.agent.exporter ] [ClusterUK Node 1] remote target didn't respond with 200 OK response code [503 Service Unavailable]. content: [error: ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];], status: 503]
This is my elasticsearch.yml file:
action.disable_delete_all_indices: true
cluster.name: ClusterUK
network.publish_host: "172.24.32.10"
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["172.24.32.10", "172.24.32.5", "172.24.32.8"]
indices.fielddata.cache.size: 25%
indices.cluster.send_refresh_mapping: false
node.name: "ClusterUK Node 1"
node.master: true
node.data: true
bootstrap.mlockall: true
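On the split-brain question: the guard against it in this config is discovery.zen.minimum_master_nodes. The Elasticsearch 1.x documentation recommends setting it to a quorum of the master-eligible nodes, (N / 2) + 1 with integer division. A small sketch of that arithmetic, assuming the three node.master: true nodes described in the question are the only master-eligible ones (the function name is illustrative):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


# ClusterUK Node 1, 2 and 3 have node.master: true; the client node has master=false.
print(minimum_master_nodes(3))  # 2
```

With N = 3 the quorum is 2, which matches the value already configured, so in principle a partitioned minority of one node cannot elect its own master. That points away from a classic split-brain and toward the connection drops recorded in the master_left lines.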
In some cases this leaves Elasticsearch not running as a service for a few seconds.
This is running on Rackspace, and I think network issues might be involved (although I bind to a specific IP address and use unicast discovery).
There are 4 nodes: 3 with master=true and data=true, plus one client node.
Can someone give me some insight into what's actually happening here? The nodes run Elasticsearch 1.7.3 (the client node runs 1.7.1) on Windows Server.
I suspect the issue comes from "master left (reason = transport disconnected)" and that it's a split-brain, but how do I fix it?
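To see how often node 1 lost its master, and for which reasons, the INFO-level master_left lines can be tallied. A minimal sketch, assuming the Elasticsearch 1.x log line format shown above; tally_master_left is a hypothetical helper, not part of any Elasticsearch tooling:

```python
import re
from collections import Counter

# Matches discovery.zen INFO lines such as:
#   ... master_left [[ClusterUK Node 2][...]...], reason [transport disconnected]
# The WARN lines read "master left (reason = ...)" with a space, so they are not double-counted.
MASTER_LEFT = re.compile(
    r"master_left \[\[(?P<master>[^\]]+)\].*reason \[(?P<reason>[^\]]+)\]"
)


def tally_master_left(lines):
    """Count master_left events per (master node name, reason) pair."""
    counts = Counter()
    for line in lines:
        match = MASTER_LEFT.search(line)
        if match:
            counts[(match.group("master"), match.group("reason"))] += 1
    return counts
```

Because the function only iterates over lines, an open log file can be passed directly, e.g. tally_master_left(open(path, encoding="utf-8", errors="replace")). For the excerpt in the question it reports Node 2 leaving once with "do not exists on master, act as master failure" and three times with "transport disconnected".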
I found the cause: Elasticsearch doesn't tolerate TCP offloading.
A TCP offload engine (TOE) is a feature of network interface cards (NICs) that offloads processing of the TCP/IP stack to the network controller. By moving some or all of this processing to dedicated hardware, it frees the system's main CPU for other tasks. However, TCP offloading has been known to cause problems, in this case the repeated "transport disconnected" errors, and disabling it avoids them.
Disabling TCP offloading solved my issue.
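The answer doesn't say exactly which offload settings were turned off, so the following is only a sketch under assumptions: on Windows Server 2012 or later, TCP Chimney offload can be disabled globally with netsh, and per-adapter checksum and large-send offload with the NetAdapter PowerShell cmdlets. The wrapper disable_tcp_offload is a hypothetical helper; it prints the commands by default and only runs them when dry_run=False, which requires an elevated prompt. Which of these settings actually matter depends on the NIC and its driver.

```python
import subprocess
from typing import List

# Assumption: Windows Server 2012+ (needed for the NetAdapter cmdlets).
# These are common candidates, not the original poster's exact steps.
COMMANDS = [
    # Global TCP Chimney (connection) offload switch.
    ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
    # Per-adapter checksum offload; '*' matches every adapter name.
    ["powershell", "-NoProfile", "-Command", "Disable-NetAdapterChecksumOffload -Name '*'"],
    # Per-adapter large send offload (LSO).
    ["powershell", "-NoProfile", "-Command", "Disable-NetAdapterLso -Name '*'"],
]


def disable_tcp_offload(dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    rendered = [" ".join(cmd) for cmd in COMMANDS]
    for cmd, text in zip(COMMANDS, rendered):
        print(text)
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)
    return rendered
```

Reviewing the printed commands first with the default dry run, then rerunning with dry_run=False and restarting the Elasticsearch service, keeps the change deliberate.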