scala, akka, actor, akka-supervision, akka-actor

Retry on minor exceptions for a long-lived Akka actor


I have an actor that is created at application startup as a child of another actor. Once per day it receives a message from the parent telling it to fetch some files from an SFTP server.

Now, there might be some minor temporary connection exceptions that cause the operation to fail. In this case, a retry is needed.

But there might also be cases in which an exception is thrown that a retry will not resolve (e.g., a file is not found, or some configuration is wrong).

So what would be an appropriate retry mechanism and supervision strategy here, considering that the actor receives messages only at long intervals (once a day)?

In this case, the message sent to the actor is not bad input - it is just a trigger. Example:

case object FileFetch

If I have a supervision strategy in the parent like this, it is going to restart the failing child on every minor/major exception without retries.

override val supervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = -1, withinTimeRange = Duration.Inf) {
    case _: Exception => Restart
  }

What I want to have is something like this:

override val supervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = -1, withinTimeRange = Duration.Inf) {
    case _: MinorException => Retry same message 2, 3 times and then Restart // pseudocode
    case _: Exception      => Restart
  }

Solution

  • "Retrying" or resending a message in the event of an exception is something that you have to implement yourself. From the documentation:

    If an exception is thrown while a message is being processed (i.e. taken out of its mailbox and handed over to the current behavior), then this message will be lost. It is important to understand that it is not put back on the mailbox. So if you want to retry processing of a message, you need to deal with it yourself by catching the exception and retry[ing] your flow. Make sure that you put a bound on the number of retries since you don’t want a system to livelock (so consuming a lot of cpu cycles without making progress).
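
    As a minimal sketch of the "catch the exception and retry your flow" advice, the child could wrap the fetch in a small bounded-retry helper. The names `BoundedRetry` and `retry` are hypothetical, and `MinorException` is defined here only for illustration because the question does not show its definition. Once the attempts are exhausted the exception is rethrown, so the parent's supervision strategy still applies.

    ```scala
    import scala.annotation.tailrec
    import scala.util.{Failure, Success, Try}

    // Hypothetical marker for transient failures (not defined in the original post).
    class MinorException(msg: String, cause: Throwable = null)
      extends RuntimeException(msg, cause)

    object BoundedRetry {
      /** Runs `op` at most `maxAttempts` times, retrying only on MinorException.
        * Any other exception, or the MinorException from the final attempt, is rethrown.
        */
      @tailrec
      def retry[A](maxAttempts: Int)(op: => A): A =
        Try(op) match {
          case Success(value) => value
          case Failure(_: MinorException) if maxAttempts > 1 =>
            retry(maxAttempts - 1)(op) // bounded: one fewer attempt remains
          case Failure(other) => throw other // escalates to the supervisor
        }
    }
    ```

    Inside the child's `receive`, the body of the `FileFetch` case could then be wrapped as `BoundedRetry.retry(3) { ... }`. Retrying inline keeps the actor busy until the attempts finish, which is probably acceptable for a once-a-day trigger but would not suit a high-traffic actor.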

    If you want to resend the FileFetch message to the child in the event of a MinorException without restarting the child, then you could catch the exception in the child to avoid triggering the supervision strategy. In the try-catch block, you could send a message to the parent and have the parent track the number of retries (and perhaps include a timestamp in this message, if you want the parent to enact some kind of backoff policy, for example). In the child:

    def receive = {
      case FileFetch =>
        try {
          ...
        } catch {
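          case _: MinorException =>
            // nanoTime is a monotonic clock: useful for measuring elapsed time
            // (e.g., for a backoff), but it is not a wall-clock timestamp
            val now = System.nanoTime
            context.parent ! MinorIncident(self, now)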
        }
      case ...
    } 
    

    In the parent:

    override val supervisorStrategy =
      OneForOneStrategy(maxNrOfRetries = -1, withinTimeRange = Duration.Inf) {
        case _: Exception => Restart
      }
    
    var numFetchRetries = 0
    
    def receive = {
      case MinorIncident(fetcherRef, time) =>
        log.error(s"${fetcherRef} threw a MinorException at ${time}")
        if (numFetchRetries < 3) { // possibly use the time in the retry logic; e.g., a backoff
          numFetchRetries = numFetchRetries + 1
          fetcherRef ! FileFetch
        } else {
          numFetchRetries = 0
          context.stop(fetcherRef)
          ... // recreate the child
        }
      case SomeMsgFromChildThatFetchSucceeded =>
        numFetchRetries = 0
      case ...
    }
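
    The backoff mentioned in the comment above could be computed by a small pure function. This is only a sketch: the `Backoff` object, its `delay` method, and the default values are hypothetical and not part of the original answer.

    ```scala
    import scala.concurrent.duration._

    object Backoff {
      /** Exponential backoff: base * 2^(attempt - 1), capped at `max`.
        * `attempt` is 1 for the first retry.
        */
      def delay(
          attempt: Int,
          base: FiniteDuration = 1.second,
          max: FiniteDuration = 1.minute
      ): FiniteDuration = {
        require(attempt >= 1, "attempt must be >= 1")
        // Cap the exponent so the multiplication stays in range for small bases.
        val factor = 1L << math.min(attempt - 1, 20)
        val candidate = base * factor
        if (candidate > max) max else candidate
      }
    }
    ```

    In the retry branch of the parent, instead of resending immediately with `fetcherRef ! FileFetch`, the message could be scheduled with something like `context.system.scheduler.scheduleOnce(Backoff.delay(numFetchRetries), fetcherRef, FileFetch)`, assuming `import context.dispatcher` is in scope to supply the execution context.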
    

    Alternatively, instead of catching the exception in the child, you could set the supervisor strategy to Resume the child in the event of a MinorException, while still having the parent handle the message retry logic:

    override val supervisorStrategy =
      OneForOneStrategy(maxNrOfRetries = -1, withinTimeRange = Duration.Inf) {
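        case _: MinorException =>
          // inside the decider, sender() is the child that failed
          val child = sender()
          val now = System.nanoTime
          self ! MinorIncident(child, now)
          Resume // the child keeps its state; the failed message is not redelivered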
        case _: Exception => Restart
      }
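
    Both variants assume that transient failures reach the actor as a `MinorException`. One possible way to arrange that, sketched here under assumptions, is to classify exceptions at the point where the fetch runs. The `FailureClassifier` object is hypothetical, `MinorException` is defined only for illustration, and the exception types treated as transient are examples: an actual SFTP client library will likely throw its own exception types.

    ```scala
    import java.io.FileNotFoundException
    import java.net.{ConnectException, SocketTimeoutException}

    // Hypothetical marker for transient failures (not defined in the original post).
    class MinorException(msg: String, cause: Throwable = null)
      extends RuntimeException(msg, cause)

    object FailureClassifier {
      /** Assumption: only connection-level problems are worth retrying. */
      def isTransient(t: Throwable): Boolean = t match {
        case _: SocketTimeoutException | _: ConnectException => true
        case _                                               => false
      }

      /** Runs `op`, rethrowing transient failures wrapped in MinorException.
        * Everything else (e.g., FileNotFoundException) propagates unchanged.
        */
      def classify[A](op: => A): A =
        try op
        catch {
          case e: Exception if isTransient(e) =>
            throw new MinorException(s"transient failure: ${e.getMessage}", e)
        }
    }
    ```

    With this in place, a missing file or a bad configuration still surfaces as an ordinary exception and is handled by the `case _: Exception => Restart` branch, while connection problems follow the retry path.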