azure-service-fabric remoting service-fabric-stateless service-fabric-actor

Service Fabric remote calls to stateless service not returning, stuck


In our application, we have a stateful actor which calls another stateless service. The stateless service does some processing and returns the response back to the actor. The service can sometimes take 1-2 hours to complete the processing.

Intermittently, we see cases where the service has successfully completed processing but the response is never returned to the actor. Control does not come back to the actor, so the overall actor processing hangs and does not proceed any further. We could not find any exceptions on either the service side or the actor side.

Looking for pointers to further investigate the issue. Any help would be much appreciated.


Solution

  • I recommend changing your architecture to an event-driven model, for example by using this pub/sub library. This way, the service can respond to an event from the actor and start processing. When it's done, it fires another event, which can be received and processed by the actor.

    This way, the actor only needs to be active while it sends or receives an event, allowing your cluster to host more workloads. It also means your actor no longer has to wait for hours for a call to return, which would otherwise prevent other callers from being able to use it.
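
    The event-driven flow described above can be sketched roughly as follows. This is a minimal, in-memory illustration in Python, not Service Fabric code and not the API of the pub/sub library mentioned above: `EventBus`, `ProcessingActor`, `WorkerService`, and the topic names `work.requested` and `work.completed` are all hypothetical names chosen for the example, and the short `asyncio.sleep` stands in for the 1-2 hour processing step.

```python
import asyncio
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory stand-in for a pub/sub broker."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._tasks = set()  # keep references so tasks are not garbage collected

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    async def publish(self, topic, payload):
        # Schedule each handler as its own task: the publisher does not
        # wait for subscribers to finish their work.
        for handler in self._handlers[topic]:
            task = asyncio.create_task(handler(payload))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)


class ProcessingActor:
    """Hypothetical actor: fires a request event, then reacts to completion."""

    def __init__(self, bus):
        self.bus = bus
        self.state = "idle"
        self.result = None
        self.done = asyncio.Event()
        bus.subscribe("work.completed", self.on_completed)

    async def start(self, job_id):
        self.state = "waiting"
        # Returns as soon as the event is handed to the bus; no long-lived call.
        await self.bus.publish("work.requested", {"job_id": job_id})

    async def on_completed(self, event):
        self.result = event["result"]
        self.state = "done"
        self.done.set()


class WorkerService:
    """Hypothetical stateless service: processes a request, then fires an event."""

    def __init__(self, bus):
        self.bus = bus
        bus.subscribe("work.requested", self.on_requested)

    async def on_requested(self, event):
        await asyncio.sleep(0.01)  # placeholder for the long-running processing
        result = event["job_id"].upper()
        await self.bus.publish(
            "work.completed", {"job_id": event["job_id"], "result": result}
        )


async def run_demo():
    bus = EventBus()
    actor = ProcessingActor(bus)
    WorkerService(bus)  # the bus keeps the subscription alive
    await actor.start("job-1")
    state_after_start = actor.state  # actor is already free again here
    await asyncio.wait_for(actor.done.wait(), timeout=1)
    return state_after_start, actor.state, actor.result
```

    Calling `asyncio.run(run_demo())` drives one request through the sketch. The point of the design is that `start` returns immediately, so nothing holds a remote call open while the work runs; a real broker would also need durable delivery and retries, which this sketch leaves out.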

    To work around your current issue I'd recommend these steps: