Tags: akka, akka-cluster, splitbrain

Akka Cluster: Down all when unstable & SBR's role


I have a lot of worker nodes in my Akka cluster, and their instability causes the "down all when unstable" decision to be taken; however, these worker nodes don't have the SBR's role.

  1. Why is the "down all when unstable" decision not taken based on the SBR's role?
  2. To solve this problem, should I have distinct clusters or use a Multi-DC cluster?

Solution

  • The primary constraint a split-brain resolver has to meet is that every node in the cluster reaches the same decision about which nodes need to be downed (including downing themselves). In the presence of different decisions being made, the guarantees of Cluster Sharding and Cluster Singleton no longer apply: there may be two incarnations of the same sharded entity or the singleton might not be a singleton.

    Because there's latency inherent to disseminating reachability observations around the cluster, the less time has elapsed since a change in reachability observations, the more likely it is that there's a node in the cluster which would disagree with our node about which nodes are reachable. That disagreement opens the door for that node to make a different SBR decision than the one our node would make. The only strategy the SBR has which guarantees that every node makes the same decision, even if there's a disagreement about membership or reachability, is down-all.

    Accordingly, SBR delays making a decision until there's been a long enough time since a cluster membership or reachability change has happened. In a particularly unstable cluster, if too much time has passed without achieving stability, the SBR will then apply the down-all strategy, which does not take cluster roles into account.
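    As a sketch, the two settings that control this behavior are `stable-after` (how long the cluster must be stable before the configured strategy is applied) and `down-all-when-unstable` (the fallback). The strategy and duration below are illustrative values, not recommendations:

        akka.cluster.split-brain-resolver {
          # strategy applied once the cluster has been stable for stable-after
          active-strategy = keep-majority
          stable-after = 20s

          # fallback: down all nodes if instability lasts too long
          # (enabled by default; can be set to a duration or off)
          down-all-when-unstable = on
        }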

    If you're not using Cluster Sharding or Cluster Singleton (and haven't implemented something with similar constraints...), you might be able to get away with disabling this fallback to down-all. For instance, if every bit of distributed state in your system forms a CRDT, this can be safe (if you know what a CRDT is, you know; if you don't, that almost certainly means not all distributed state in your system is a CRDT). The configuration setting is

    akka.cluster.split-brain-resolver.down-all-when-unstable = off
    

    Think very carefully about this in the context of your application. I would suspect that at least 99.9% of Akka clusters out there would violate correctness guarantees with this setting.

    From your question about distinct clusters or Multi-DC, I take it you are spreading your cluster across multiple datacenters. In that case, note that inter-datacenter networking is typically less reliable than intra-datacenter networking, which means you basically have three options:

    1. have separate clusters for each datacenter and use "something(s) else" to coordinate between them

    2. use a Multi-DC cluster, which takes into account the difference between inter- and intra-datacenter networking (e.g. while it's possible for nodes A and B in some datacenter to disagree on the reachability of node C in that same datacenter, it's highly likely that A and B will agree on whether node D in a different datacenter is reachable)

    3. configure the failure detector for the reliability of the inter-datacenter link (this effectively treats even nodes in the same rack, or even on the same physical host or VM, as if they were in separate datacenters). This means being very slow to declare that a node has crashed (and giving that node a lot of time to say "no, I'm not dead, I was just being quiet/sleepy/etc."). For some applications, this might be a viable strategy.
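    For a rough sketch of what options 2 and 3 look like in configuration (the datacenter name and timing values below are placeholders you'd tune for your environment, not recommendations):

        # Option 2: Multi-DC - tell each node which datacenter it belongs to
        akka.cluster.multi-data-center {
          self-data-center = "dc-east"
        }

        # Option 3: a more lenient failure detector for less reliable links
        # (defaults are considerably stricter)
        akka.cluster.failure-detector {
          threshold = 12.0
          acceptable-heartbeat-pause = 10s
        }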

    Which of those 3 is the right option? I think completely separate clusters communicating and coordinating over some separate channel(s), with this modeled in the domain, is often useful (for instance, you might be able to balance traffic to the datacenters in such a way that it's highly unlikely you'd need your west coast datacenter to know what's happening on the east coast). Multi-DC might allow for more consistency than separate clusters. It's probably unlikely that your application requirements are such that multiple DCs within a vanilla single cluster will work well.