I have an actor system with a persistent actor A. On receiving message M, A should spawn an instance of persistent actor B, which executes a dangerous, long-running process (involving message exchanges with other parties) and then sends message N back to A. Upon receiving N, A should terminate B.
Spawning is implemented this way: when A receives M, it validates the message, calculates the required data, and creates an event M', which is persisted. When the event is applied, A spawns the child with the precalculated information. If the system is restarted at this point, M' will be replayed to A, and it will create a new incarnation of the same child B.
What I am struggling with is handling the case of terminated children when recovering the system: I would like to not see any Bs that were terminated before restarting the system.
Initially I just sent PoisonPill messages from the parent. But persistent actors do not store any event about receiving such a command; they just die quietly. So when the system recovers, this last chapter of their story is not replayed to them, and they keep hanging around.
I took a different approach and tried to call Context.Stop(child) when processing the recovery messages of A, but that led to all these Bs being terminated before they could recover, causing the system to log issues such as recovery timing out.
So I guess I either have to let the B recover before being killed, or not recreate it in the first place.
What I am now trying to do is introduce a flag in the object representing the state of B and send a custom message instead of PoisonPill, so that B can persist an event recording its termination; when B is notified that recovery has completed, it checks this flag and terminates itself. But that looks like a hell of a lot of work for the simple requirement of not resurrecting dead actors upon restart, so I am wondering whether I am doing something completely wrong or trying to reinvent hot water.
While the callback passed to Persist(domainEvent, callback) will be called only once, the recovery handler may (and probably will) be called multiple times for the same event during the lifecycle of an actor. For this reason it's important to keep its behavior idempotent, which is not the case when you create a new child in the recovery handler.
For cases like yours, the best idea seems to be to only make a note of pending child processes during recovery and to postpone actually creating them until the recovery procedure finishes. This can be done by overriding the OnReplaySuccess method of the persistent actor. At that point you should be able to determine which child actors should remain alive and which have already finished their processing in the past, so it should be easy to resurrect only the necessary ones.
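The idea above can be sketched without any actor framework. The following is a minimal Python model of the replay logic, not Akka.NET code: the names ParentA, ChildSpawned, ChildFinished, apply and on_replay_success are hypothetical, and it assumes that A also persists an event when it receives N, so that the journal records which children have finished.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChildSpawned:
    """Stands in for the persisted event M' (hypothetical name)."""
    child_id: str
    info: str


@dataclass(frozen=True)
class ChildFinished:
    """Assumed event that A persists when it receives N (hypothetical name)."""
    child_id: str


@dataclass
class ParentA:
    pending: dict = field(default_factory=dict)  # child_id -> precalculated info
    running: dict = field(default_factory=dict)  # child_id -> spawned child
    spawn_count: int = 0

    def apply(self, event):
        """Recovery handler: only updates state, so replaying an event twice is harmless."""
        if isinstance(event, ChildSpawned):
            self.pending[event.child_id] = event.info
        elif isinstance(event, ChildFinished):
            self.pending.pop(event.child_id, None)

    def on_replay_success(self):
        """Plays the role of OnReplaySuccess: the only place children are created after replay."""
        for child_id, info in self.pending.items():
            if child_id not in self.running:
                self.running[child_id] = self._spawn(child_id, info)

    def _spawn(self, child_id, info):
        # A real actor would create the child here; the model just records it.
        self.spawn_count += 1
        return f"B({child_id}, {info})"


def recover(journal):
    """Replay every stored event, then run the post-replay hook once."""
    parent = ParentA()
    for event in journal:
        parent.apply(event)
    parent.on_replay_success()
    return parent


journal = [
    ChildSpawned("b-1", "job-1"),
    ChildSpawned("b-2", "job-2"),
    ChildFinished("b-1"),
    ChildSpawned("b-2", "job-2"),  # the same event applied a second time
]
parent = recover(journal)
print(sorted(parent.running))  # ['b-2']
```

Because the recovery handler only edits the pending map, applying the same event twice changes nothing, and the finished child b-1 is never recreated. In the live (non-recovery) path the parent would still spawn the child directly from the persist callback, which runs only once.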