java kubernetes akka akka-http akka-cluster

Akka heartbeat delays


My Akka app on Kubernetes is seeing delayed heartbeats, even when there is no load.

The following warning is also logged constantly:

heartbeat interval is growing too large for address ...

I tried adding a custom dispatcher for the cluster, and even one for each specific actor, but it did not help. I am not doing any blocking operations, since it is just a simple HTTP server.

When the cluster is under load, the nodes become Unreachable.

I created a repository that can be used to reproduce the issue: https://github.com/CostasChaitas/Akka-Demo


Solution

  • First, thanks for the well-documented reproducer. I did find one minor glitch with a dependency you included, but it was easy to resolve.

    That said, I was unable to reproduce your errors. Everything worked fine on my local machine and on my dev cluster. You didn't include your load generator, so maybe I just wasn't generating as sustained a load, but I got no heartbeat delays at all.

    I suspect this is a duplicate of "Akka Cluster heartbeat delays on Kubernetes". If so, it sounds like you've already checked my usual suspects, GC and CFS. And if you are able to reproduce it locally, that also makes my other common culprit, badly configured Kubernetes networking, improbable. (I had one client that was having problems with Akka clustering on Kubernetes, and it turned out to be just a badly configured cluster: the network was dropping and delaying packets between pods.)

    Since you say this is load testing, perhaps you are just running out of sockets/files? You don't have much in the way of HTTP server configuration. (Nor any JVM options.)

    I think my next debugging step would be to connect to one of the running containers and try to test the network between the pods.
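
As a rough starting point for the checks mentioned above, here is a minimal sketch of a probe you could run from inside one container. It prints what the JVM sees (CPU count, file descriptor usage, GC totals) and times plain TCP connects to another pod. `PodProbe` and its methods are hypothetical names, not part of Akka or the linked repository, and the host and port are whatever address your remoting is actually bound to. The file descriptor numbers rely on `com.sun.management.UnixOperatingSystemMXBean`, which is only available on Unix-like JVMs.

```java
import java.io.IOException;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of Akka or the Akka-Demo repository.
final class PodProbe {

    // Times a plain TCP connect to host:port.
    // Returns elapsed milliseconds, or -1 if the connect fails or times out.
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            return -1;
        }
    }

    // Returns "open/max" file descriptors for this JVM,
    // or "unavailable" on platforms without the Unix MXBean.
    static String fileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return unix.getOpenFileDescriptorCount() + "/" + unix.getMaxFileDescriptorCount();
        }
        return "unavailable";
    }

    public static void main(String[] args) {
        // CPU count as seen by the JVM; a tight container CPU limit can make this small.
        System.out.println("availableProcessors = "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("file descriptors (open/max) = " + fileDescriptors());

        // Cumulative GC counts and times since JVM start, per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + " totalMs=" + gc.getCollectionTime());
        }

        // Optional arguments: <host> <port>, e.g. another pod's IP and its remoting port.
        if (args.length >= 2) {
            String host = args[0];
            int port = Integer.parseInt(args[1]);
            for (int i = 1; i <= 5; i++) {
                System.out.println("connect #" + i + " to " + host + ":" + port + " = "
                        + connectMillis(host, port, 2000) + " ms (-1 means failed)");
            }
        }
    }
}
```

Running it as `java PodProbe.java <other-pod-ip> <port>` from one pod while the cluster is idle, and again under load, should show whether connect times or file descriptor usage climb. Note that the GC and file descriptor numbers describe the probe's own JVM, so to inspect the application itself you would need to call the same MXBeans from inside the app. A connect probe only exercises TCP setup, so slow or failed connects point at the network, but fast connects do not rule out packet loss on established connections.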