In a 3-node Akka cluster, remembered entities are not recreated properly during a rolling restart of all 3 nodes.
The shards are completely rebalanced, but some of the entities are not recreated.
akka.cluster.sharding.remember-entities = on
akka.cluster.sharding.remember-entities-store = ddata
akka.cluster.sharding.distributed-data.durable.keys = []
akka.remote.artery{
enabled = on
transport = tcp
}
At the start, each of the 3 nodes hosts 100 shards with 1000 actors, for a total of 300 shards and 3000 actors.
Node 1 -- 100 Shards \ 1000 Actors
Node 2 -- 100 Shards \ 1000 Actors
Node 3 -- 100 Shards \ 1000 Actors
1. When Node 1 goes down, its shards are rebalanced to Node 2 and Node 3, and all the remembered entities are recreated on those nodes.
Node 1 -- Down
Node 2 -- 150 Shards \ 1500 Actors
Node 3 -- 150 Shards \ 1500 Actors
2. A few moments after Node 1 is back up, Node 2 goes down. The shards and remembered entities from Node 2 are recreated on Node 1.
Node 1 -- 150 Shards \ 1500 Actors
Node 2 -- Down
Node 3 -- 150 Shards \ 1500 Actors
3. A few moments after Node 2 is back up, Node 3 goes down. The shards from Node 3 are rebalanced to Node 2, but some of the remembered entities from Node 3 are not recreated on Node 2. All the shards are rebalanced anyway.
Node 1 -- 150 Shards \ 1500 Actors
Node 2 -- 150 Shards \ 1423 Actors
Node 3 -- Down
When we restart Node 3 shortly after Node 2 has rejoined the cluster, the recreation of remembered entities is inconsistent.
In the meantime, messages are still being sent to the actors in the cluster.
1. If we do not restart Node 3, there is no issue with the entities.
2. If we restart Node 3 only after waiting some time during the rolling restart, there is no problem.
3. We tried increasing and decreasing the shard count.
4. We changed akka.cluster.distributed-data.majority-min-cap from the default 5 to 3.
The issue still persists.
While debugging further, we found that the issue lies in replicating the remembered entities across the nodes between the frequent restarts.
To inspect the remembered entities, we can send a Get message to the Replicator actor. The reply contains the value stored under the given key in the cluster's distributed data.
// orSetKey is the name of the ddata key that holds the remembered entity ids
Key<ORSet<String>> rememberEntitiesKey = ORSetKey.create(orSetKey);
// readLocal() returns only this node's replica, without contacting other nodes
Get<ORSet<String>> getCmd = new Get<>(rememberEntitiesKey, Replicator.readLocal());
Future<Object> ack = Patterns.ask(replicator, getCmd, timeout1).toCompletableFuture();
Object result = ack.get(5000, TimeUnit.MILLISECONDS);
// The reply can also be NotFound, so this cast assumes the key exists locally
GetSuccess<ORSet<String>> orSet = (GetSuccess<ORSet<String>>) result;
The entities in the local ddata are then available via orSet.dataValue().getElements().
To find out the current state of the remembered entities on each node, we can check that node's local data, which shows how the remembered entities are currently stored there.
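Once the entity ids have been read from each node's local data, the replicas can be compared to see which node is behind. The following is a minimal sketch of such a comparison in plain Java. It assumes the ids returned by getElements() for the same key were already collected into one set per node. RememberedEntityDiff and the node names are hypothetical and not part of Akka. Note that the union of all replicas is only a best-effort reference: an id that is missing on every node will not show up.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper (not part of Akka): compares entity-id sets read from each node's local ddata. */
final class RememberedEntityDiff {

    /**
     * For every node, returns the entity ids that at least one other node
     * knows about but that are absent from this node's local replica.
     * Nodes that are not missing anything are left out of the result.
     */
    static Map<String, Set<String>> missingPerNode(Map<String, Set<String>> idsByNode) {
        // Union of all ids seen on any node: the best available view of the full set.
        Set<String> union = new TreeSet<>();
        for (Set<String> ids : idsByNode.values()) {
            union.addAll(ids);
        }
        Map<String, Set<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : idsByNode.entrySet()) {
            Set<String> diff = new TreeSet<>(union);
            diff.removeAll(entry.getValue());
            if (!diff.isEmpty()) {
                missing.put(entry.getKey(), diff);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample data: node2 has not yet received entity "e2".
        Map<String, Set<String>> idsByNode = new TreeMap<>();
        idsByNode.put("node1", Set.of("e1", "e2", "e3"));
        idsByNode.put("node2", Set.of("e1", "e3"));
        System.out.println(missingPerNode(idsByNode)); // prints {node2=[e2]}
    }
}
```

Running this kind of comparison right before stopping the next node gives a quick indication of whether the replicas have converged.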
Due to the frequent restarts, the Replicator may fail to fully replicate the data, resulting in incomplete or inconsistent data in the cluster. We resolved this issue by tuning the properties below:
akka.cluster.sharding.distributed-data {
gossip-interval = 500 ms // default 2 s
notify-subscribers-interval = 100 ms // default 500 ms
}
After implementing these changes, the remembered entities data is fully replicated to all nodes even during frequent rolling restarts, and all the entities are recreated successfully.
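As a rough illustration of why the shorter gossip-interval may help, the sketch below counts how many gossip ticks fit into the gap between two consecutive node restarts. The 10 second gap is an assumed value for illustration only (the actual gap in our restarts was not measured), and GossipRounds is a hypothetical helper. It does not model how the Replicator actually selects peers or propagates deltas. It only shows that the same gap offers four times as many replication opportunities.

```java
import java.time.Duration;

/** Hypothetical helper: back-of-the-envelope count of gossip ticks in a time window. */
final class GossipRounds {

    /** Number of whole gossip ticks that fit into the given window. */
    static long ticksWithin(Duration window, Duration gossipInterval) {
        if (gossipInterval.isZero() || gossipInterval.isNegative()) {
            throw new IllegalArgumentException("gossipInterval must be positive");
        }
        return window.toMillis() / gossipInterval.toMillis();
    }

    public static void main(String[] args) {
        // Assumed gap between one node rejoining and the next node going down.
        Duration restartGap = Duration.ofSeconds(10);
        System.out.println(ticksWithin(restartGap, Duration.ofSeconds(2)));    // default 2 s -> prints 5
        System.out.println(ticksWithin(restartGap, Duration.ofMillis(500)));   // tuned 500 ms -> prints 20
    }
}
```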