I'm using Akka Cluster Sharding for a use case where I have a set of cluster nodes and a set of proxy nodes.
From a proxy node I forward messages to an EntityActor created on a cluster node.
To do this, the proxy node joins the cluster through a ShardRegion proxy.
I have to send messages from the proxy to the entity actors without data loss or duplication (at-most-once delivery).
I'm using Futures to get an acknowledgement from the entity actor when I send data. The Future waits at most 5 seconds (configurable) for the acknowledgement. In the ideal case the data is sent at most once.
When a node joins, leaves, or restarts, the shards in the shard region are rebalanced. During that time the shards are not responsive.
If we send data through the ShardRegion proxy during a rebalance, it is held in the proxy's buffer.
If my proxy server restarts at that point, the buffered data is lost (data loss).
If I don't get a reply within 5 seconds and retry, the message is duplicated if the buffered copy is delivered after the rebalance.
To handle this, I check for every message whether the EntityActor is available by invoking ActorSelection with the exact EntityActor path.
If the shard is rebalancing, I insert the message into another storage layer, and when the actor is started again after the rebalance, I fetch that data in preStart and process it, so nothing is lost.
But this ActorSelection takes too much time at high data rates.
How can I avoid data loss and data duplication without using ActorSelection?
"without data loss or duplication" is "exactly-once" delivery (i.e. both at-least-once and at-most-once).
In general in a distributed system exactly-once delivery is not possible.
The closest one can get is "effectively once". That means combining two things: the sender resends until it is sure that the message has been delivered, and the receiver is implemented so that it can recognize when a message it receives is one it has already processed. For such a duplicate, the receiver sends back the acknowledgement (in the hope that the sender eventually stops resending the message) but otherwise ignores the message. Techniques like including correlation IDs and sequence numbers in the messages being sent are often used to help the recipient determine that a message is being processed a second time.
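As a rough illustration of "effectively once", here is a minimal plain-Java sketch with no Akka involved. All names (`EffectivelyOnce`, `Envelope`, `Receiver`, `Link`, `sendWithRetry`) are hypothetical and made up for this example. It assumes the sender picks a unique correlation ID per logical message and reuses it on every resend, and it keeps the set of processed IDs in memory, whereas a real receiver would have to persist that set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these types are part of Akka.
final class EffectivelyOnce {

    /** A message tagged with a correlation ID chosen by the sender. */
    static final class Envelope {
        final String correlationId;
        final String payload;

        Envelope(String correlationId, String payload) {
            this.correlationId = correlationId;
            this.payload = payload;
        }
    }

    /** Receiver: processes each correlation ID once but acknowledges every delivery. */
    static final class Receiver {
        // Assumption: kept in memory here; a real receiver would persist this.
        private final Set<String> processed = new HashSet<>();
        private int processedCount = 0;

        /** Returns true as the acknowledgement, for first deliveries and duplicates alike. */
        boolean receive(Envelope msg) {
            if (processed.add(msg.correlationId)) {
                processedCount++; // first delivery: the business logic would run here
            }
            return true; // duplicates are acked so the sender can stop resending
        }

        int processedCount() {
            return processedCount;
        }
    }

    /** A possibly unreliable path to the receiver; returns true if an ack came back. */
    interface Link {
        boolean deliver(Envelope msg);
    }

    /** Sender: resends the same envelope until it is acked or attempts run out. */
    static boolean sendWithRetry(Link link, Envelope msg, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (link.deliver(msg)) {
                return true;
            }
        }
        return false;
    }
}
```

The point of the design is that a lost acknowledgement only causes a harmless resend: the receiver sees the same correlation ID again, skips the processing, and acknowledges anyway.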
Note that exactly-once delivery is generally not useful: what is really wanted is exactly-once processing, and that can only be defined by the application. Akka does provide support for reliable delivery, with the caveat that it imposes a substantial throughput penalty (both the sender and receiver have to be durable, and every message sent will entail four writes to a datastore). The supporting infrastructure will guarantee you hard at-least-once delivery (assuming a durable queue), and the persistent entity actor receiving the messages can use the seqNr in the received messages to recognize a duplicate.
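The duplicate check based on a sequence number could look roughly like the sketch below. This is plain Java, not Akka's reliable delivery API: `SeqNrDeduplicator`, `shouldProcess`, and the `producerId` parameter are hypothetical names. It assumes that each producer numbers its messages from 1 upwards and that messages from a single producer arrive in order, so a sequence number at or below the last one seen must be a redelivery. In a persistent entity, the map would be part of the persisted state so that it survives a restart or rebalance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Akka.
final class SeqNrDeduplicator {
    // Highest sequence number processed so far, per producer.
    // Assumption: in a persistent entity this would be persisted state.
    private final Map<String, Long> lastSeen = new HashMap<>();

    /**
     * Returns true if the message is new and should be processed,
     * false if it is a duplicate that should only be acknowledged.
     */
    boolean shouldProcess(String producerId, long seqNr) {
        long last = lastSeen.getOrDefault(producerId, 0L);
        if (seqNr <= last) {
            return false; // already processed: redelivery of an old message
        }
        lastSeen.put(producerId, seqNr);
        return true;
    }
}
```

Compared with remembering every correlation ID, this needs only one number per producer, at the cost of relying on in-order delivery from each producer.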