akkaakka-typedconsistent-hashing

Akka Consistent-hashing routing


I have developed an application using Typed Akka 2.6.19.

I want to route events from a certain source to the SAME routee/worker based on IP address. So, I have planned to use Consistent-hashing routing.

I do not see much literature on this route type for Typed akka. Please give some pointers & example code.


Solution

  • You need only to initialize the router with the hash function to use.

    For example (in Scala, though the Java API will be similar):

    trait Command {
      // all commands are required to have an associated IP Address (here represented in string form)
      def ipAddr: String
    }
    
    // inside, e.g. the guardian actor, and using the actor context to spawn the router as a child
    val serviceKey = ServiceKey[Command]("router")
    val router = context.spawn(
      Routers.group(serviceKey)
        .withConsistentHashingRouting(
          virtualNodesFactor = 10,
          mapping = { msg: Command => msg.ipAddr }
        )
    
    // spawn the workers, who will register themselves with the router
    val workerBehavior =
      Behaviors.setup[Command] { ctx =>
        ctx.system.receptionist ! Receptionist.Register(serviceKey, context.self)
    
        Behaviors.receiveMessage { msg =>
          ??? // TODO
        }
      }
    
    (1 to 10).foreach { i =>
      context.spawn(workerBehavior, s"worker-$i")
    }
    

    Under the hood, for every worker that registers, the router will then generate 10 (the virtualNodesFactor) random numbers and associate them with that worker. The router will then execute the mapping function for every incoming message to get a string key for the message, which it will hash. If there is a worker with an associated random number less than or equal to that hash, the worker the greatest associated random number which is also less than or equal to that hash is selected; if the hash happens to be less than every random number associated with any worker, the worker with the greatest associated random number is selected.

    Note that this implies that a given worker may process messages for more than 1 ipAddr.

    Note that this algorithm does not make a strong guarantee that commands with the same ipAddr will always go to the same worker, even if the worker they were routed to is still active: if another worker registers and has a token generated which is greater than the previous worker's relevant token and that generated token is less than the hash of ipAddr, that new worker will effectively steal the messages for that ipAddr from the old worker.

    The absence of this guarantee in turn means that if you depend for correctness on all messages for a given ipAddr to go to the same worker, you'll want something like cluster sharding, which is higher overhead but allows something a guarantee that no worker will ever see messages for multiple ipAddrs and (especially with persistence) will guarantee that the same "logical actor"/entity handles messages for the same ipAddr.