akka.netakka.net-cluster

Node in cluster stops listening for connections


We have two nodes with the same code, that are using akka.net in a cluster and send messages using remote between them.
Akka.Net version is 1.2.0 and we are using dot-netty for transport. This is the relevant configuration section:
actor { provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster" } remote { dot-netty.tcp { port = 34083 hostname = host_name } }

The two nodes run on different Windows servers (hosted on a Windows Service). Sometimes, a node stops listening to the assigned port (checked by using netstat -an) and all communication between them is lost until I restart the Windows Service.
This is all the information we get in the logs (the first 2 messages are from one host and the third one from the other):
60133 2017-08-11 10:09:11.993 Host1 Akka.Remote.Transport.ProtocolStateActor Error No response from remote. Handshake timed out or transport failure detector triggered. 60134 2017-08-11 10:09:12.040 Host1 Akka.Remote.ReliableDeliverySupervisor Warn Association with remote system akka.tcp://ProcesamientoActorSystem@warpacb004.nead.danet:34083 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow) at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction1 partialAction) at Akka.Actor.UntypedActor.Receive(Object message) at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message) at Akka.Actor.ActorCell.ReceiveMessage(Object message) at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope) at Akka.Actor.ActorCell.Invoke(Envelope envelope) --- End of stack trace from previous location where exception was thrown --- at Akka.Actor.ActorCell.HandleFailed(Failed f) at Akka.Actor.ActorCell.SysMsgInvokeAll(EarliestFirstSystemMessageList messages, Int32 currentState)]
60135 2017-08-11 10:09:14.313 Host2 Akka.Remote.ReliableDeliverySupervisor Warn Association with remote system akka.tcp://ProcesamientoActorSystem@warpacb005.nead.danet:34083 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level, Boolean needToThrow) at Akka.Actor.ReceiveActor.ExecutePartialMessageHandler(Object message, PartialAction
1 partialAction) at Akka.Actor.ActorCell.<>c__DisplayClass112_0.b__0(Object m) at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message) at Akka.Actor.ActorCell.ReceiveMessage(Object message) at Akka.Actor.ActorCell.AutoReceiveMessage(Envelope envelope) at Akka.Actor.ActorCell.Invoke(Envelope envelope)]
I guess something is failing at the transport layer, and dot-netty close the socket and stop listening.
Is there any way to stop this from happening or at least make it less frecuent? If not, can we hook to the failure event to start listening again?


Solution

  • We upgraded Akka to 1.2.3 and it started working correctly. We see the same errors in the log from time to time, but the connection is not dropped.