We have our cluster running locally (for now) and everything seems to be configured correctly. Our prime calculation messages are distributed over our seednodes. However, we are intermittently losing messages. You can see the behaviour of two runs in the screenshot. Which messages are marked as dead letters isn't consistent at all.
Our messages are always sent the same way; they look like this. The last parameter is n, the index of the prime to find (the nth prime).
new PrimeCalculationEntry(id, 1, 100000),
new PrimeCalculationEntry(id, 2, 150000),
new PrimeCalculationEntry(id, 3, 200000),
new PrimeCalculationEntry(id, 4, 250000),
new PrimeCalculationEntry(id, 5, 300000),
new PrimeCalculationEntry(id, 6, 350000),
new PrimeCalculationEntry(id, 7, 400000),
new PrimeCalculationEntry(id, 8, 450000)
Our cluster is set up like this: one non-seednode, which hosts a group router, sends messages to two seednodes, each of which is configured with a pool router.
Non seednode: localhost:0 (random port)
akka {
  actor {
    provider = cluster
    deployment {
      /commander {
        router = round-robin-group # routing strategy
        routees.paths = ["/user/cluster"] # path of routee on each node
        cluster {
          enabled = on
          allow-local-routees = on
        }
      }
    }
  }
  remote {
    dot-netty.tcp {
      port = 0 #let os pick random port
      hostname = localhost
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081", "akka.tcp://ClusterSystem@localhost:8082"]
  }
}
Seednode 1: localhost:8081 (leader)
akka {
  actor {
    provider = cluster
    deployment {
      /cluster {
        router = round-robin-pool
        nr-of-instances = 10
      }
    }
  }
  remote {
    dot-netty.tcp {
      port = 8081
      hostname = localhost
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
  }
}
Seednode 2: localhost:8082
akka {
  actor {
    provider = cluster
    deployment {
      /cluster {
        router = round-robin-pool
        nr-of-instances = 10
      }
    }
  }
  remote {
    dot-netty.tcp {
      port = 8082
      hostname = localhost
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
  }
}
Can anyone point us in the right direction? Any issues with our configuration? Thank you in advance.
I think I know what the issue is here: you don't have any akka.cluster.roles defined, nor is your /commander router configured with the use-role setting. As a result, every Nth message is being dropped because the router is trying to route a message to its own node, which does not have a /user/cluster actor present to receive it.
To fix this properly, we should do the following:
First, have the nodes that process PrimeCalculationEntry messages declare akka.cluster.roles=[prime].
Then, have the node hosting the /commander router change its HOCON to:
/commander {
  router = round-robin-group # routing strategy
  routees.paths = ["/user/cluster"] # path of routee on each node
  cluster {
    enabled = on
    allow-local-routees = on
    use-role = "prime"
  }
}
This will eliminate the dead letters, as the /commander node will no longer be sending messages to itself every N iterations.
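As a rough sketch of the role declaration, seed node 1's config might look like the following. Everything except the roles line is copied from the question. I'm assuming the role is named prime so that it matches the use-role value on the /commander router, and that seed node 2 gets the same line in its own cluster section.

```hocon
akka {
  actor {
    provider = cluster
    deployment {
      /cluster {
        router = round-robin-pool
        nr-of-instances = 10
      }
    }
  }
  remote {
    dot-netty.tcp {
      port = 8081
      hostname = localhost
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
    # Assumed role name; it must match use-role on the /commander router.
    roles = ["prime"]
  }
}
```

The non-seednode should not declare this role. That way the group router only considers the two seednodes as routee hosts, regardless of the allow-local-routees setting.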