I create a Curator client as follows:
// Retry up to 3 times, sleeping 1000 ms between attempts
RetryPolicy retryPolicy = new RetryNTimes(3, 1000);
CuratorFramework client = CuratorFrameworkFactory.newClient(zkConnectString,
        15000, // sessionTimeoutMs
        15000, // connectionTimeoutMs
        retryPolicy);
When running my client program I simulate a network partition by bringing down the NIC that Curator is using to communicate with Zookeeper. I have a few questions based on the behavior that I am seeing:
1. I see a "ConnectionStateManager - State change: SUSPENDED" message after 10 seconds. Is the time until Curator enters the SUSPENDED state configurable, derived as a percentage of the other timeout values, or always 10 seconds?
2. I see a "ZooKeeper - Session: 0x14adf3f01ef0001 closed" message in the log, but this does not appear to trickle up as an event that I can capture or listen for. Am I missing something here?
3. I see a "ConnectionStateManager - State change: LOST" message almost two minutes after the connection loss. Why so long?
4. My plan is to treat leadership as lost as soon as the SUSPENDED message is received, since it is entirely possible that Zookeeper has released the lock, unbeknownst to the application, on the other side of the network partition. Is this a typical/sane approach?

Correct. Assume leadership has been lost on both SUSPENDED and LOST. This is the way the Apache Curator recipes work. You may want to use the Apache Curator recipes rather than implementing your own algorithm: https://curator.apache.org/curator-recipes/index.html
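The rule in the answer (give up leadership on SUSPENDED and on LOST, and never get it back just because the connection returns) can be sketched as a small guard class. This is only an illustration, not how the Curator recipes are implemented: LeadershipGuard and its State enum are hypothetical names, and State is a local stand-in that mirrors the constant names of Curator's ConnectionState enum so the sketch runs without Curator on the classpath.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper that tracks whether this process may still act as leader.
 * State is a local stand-in mirroring the constant names of Curator's
 * org.apache.curator.framework.state.ConnectionState enum.
 */
class LeadershipGuard {
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST, READ_ONLY }

    private final AtomicBoolean leader = new AtomicBoolean(false);

    /** Call once the lock or leadership has actually been acquired. */
    void acquired() {
        leader.set(true);
    }

    /** Call from the connection state callback. */
    void onStateChange(State newState) {
        switch (newState) {
            case SUSPENDED:
            case LOST:
                // The server may already have released the lock: stop acting as leader.
                leader.set(false);
                break;
            default:
                // CONNECTED, RECONNECTED, READ_ONLY: leadership is not restored
                // automatically; the caller has to re-acquire it explicitly.
                break;
        }
    }

    boolean isLeader() {
        return leader.get();
    }

    public static void main(String[] args) {
        LeadershipGuard guard = new LeadershipGuard();
        guard.onStateChange(State.CONNECTED);
        guard.acquired();
        System.out.println("after acquire:   " + guard.isLeader());
        guard.onStateChange(State.SUSPENDED);
        System.out.println("after SUSPENDED: " + guard.isLeader());
        guard.onStateChange(State.RECONNECTED);
        System.out.println("after RECONNECTED: " + guard.isLeader());
    }
}
```

With a real client, the guard could be fed from a listener registered through client.getConnectionStateListenable().addListener(...), whose stateChanged(CuratorFramework, ConnectionState) callback would map the incoming state by name, for example LeadershipGuard.State.valueOf(newState.name()). Work that requires leadership would then check isLeader() before each step.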