scala, akka, akka-actor, kamon

Handling AskTimeoutException in an Akka-based application


I have the following HTTP-based application that routes every request to an Akka Actor which uses a long chain of Akka Actors to process the request.

   path("process-request") {
      post {
        val startedAtAsNano = System.nanoTime()
        NonFunctionalMetrics.requestsCounter.inc()
        NonFunctionalMetrics.requestsGauge.inc()
        entity(as[Request]) { request =>
          onComplete(distributor ? [Response](replyTo => Request(request, replyTo))) {
            case Success(response) =>
              NonFunctionalMetrics.requestsGauge.dec()
              NonFunctionalMetrics.responseHistogram.labels(HttpResponseStatus.OK.getCode.toString).observeAsMicroseconds(startedAtAsNano, System.nanoTime())
              complete(response)
            case Failure(ex) =>
              NonFunctionalMetrics.requestsGauge.dec()
              NonFunctionalMetrics.responseHistogram.labels(HttpResponseStatus.INTERNAL_SERVER_ERROR.getCode.toString).observeAsMicroseconds(startedAtAsNano, System.nanoTime())
              logger.warn(s"A general error occurred for request: $request, ex: ${ex.getMessage}")
              complete(InternalServerError, s"A general error occurred: ${ex.getMessage}")
          }
        }
      }
    }

As you can see, I'm sending the distributor an ask and waiting for its response.

The problem is that at really high RPS, the ask to the distributor sometimes fails with the following exception:

2022-04-16 00:36:26.498 WARN c.d.p.b.http.AkkaHttpServer - A general error occurred for request: Request(None,0,None,Some(EntitiesDataRequest(10606082,0,-1,818052,false))) with ex: Ask timed out on [Actor[akka://MyApp/user/response-aggregator-pool#1374579366]] after [5000 ms]. Message of type [com.dv.phoenix.common.pool.WorkerPool$Request]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.

This is a typically uninformative exception. The normal processing time is about 700 micros, so at 5 seconds the request must be stuck somewhere in the pipeline; the processing itself cannot be that slow.

I want to monitor this. I thought about adding the Kamon integration, which provides an Akka Actors module with mailbox metrics, etc.

I tried adding the configuration from https://kamon.io/docs/latest/instrumentation/akka/ask-pattern-timeout-warning/ but it didn't work for me (it showed no effect).

Are there any other suggestions for understanding the cause of this issue in a high-RPS system?

Thanks!


Solution

  • The Kamon instrumentation is useful for finding out how you got to the ask that timed out. That can help if you have a lot of places where an ask can time out, but otherwise it's not likely to tell you what the problem is.

    This is because an ask timeout is nearly always a symptom of some other problem (the lone exception is if many asks could plausibly be done in a stream (e.g. in a mapAsync or ask stage) but aren't; that doesn't apply in this code). Assuming that the timeouts aren't caused by (e.g.) a database being down so you're getting no reply or a cluster failing (both of these are fairly obvious, thus my assumption), the cause of a timeout (any timeout, generally) is often having too many elements in a queue ("saturation").

    But which queue? We'll start with the distributor, which is an actor processing messages one-at-a-time from its mailbox (which is a queue). When you say that the normal processing time is 700 micros, is that measuring the time the distributor spends handling a request (i.e. the time before it can handle the next request)? If so, and the distributor is taking 700 micros, but requests come in every 600 micros, this can happen:
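
    A rough sketch of that dynamic, using exactly 700 micros of processing per request and one arrival every 600 micros (the numbers are just the ones from above, assumed to be constant):

        t = 0:    request 1 arrives, starts processing immediately
        t = 600:  request 2 arrives, waits in the mailbox
        t = 700:  request 1 done, request 2 starts (waited 100 micros)
        t = 1200: request 3 arrives, waits behind request 2
        t = 1400: request 2 done, request 3 starts (waited 200 micros)
        t = 1800: request 4 arrives, waits behind request 3
        t = 2100: request 3 done, request 4 starts (waited 300 micros)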

    and so on: the latency and depth increase without bound. Eventually, the depth is such that requests spend 5 seconds (or more, even) in the mailbox.

    Kamon has the ability to track the number of messages in the mailbox of an actor (it's recommended to only do this on specific actors). Tracking the mailbox depth of the distributor in this case would show it growing without bound, confirming that this is what's happening.
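
    As a sketch of the relevant configuration (the exact keys here are an assumption based on the Kamon 2.x Akka instrumentation, and the actor path pattern is a guess from your logs, so double-check against the Kamon docs):

        kamon.instrumentation.akka.filters {
          actors.track {
            # Only track the specific actors you suspect; tracking everything is expensive.
            includes = [ "MyApp/user/response-aggregator-pool*" ]
          }
        }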

    If the distributor's mailbox is the queue that's getting too deep, first consider how request N can affect request N + 1. The one-at-a-time processing model of an actor is only strictly required when the response to a request can be affected by the request immediately prior to it. If a request only concerns some portion of the overall state of the system then that request can be handled in parallel with requests that do not concern any part of that portion. If there are distinct portions of the overall state such that no request is ever concerned with 2 or more portions, then responsibility for each portion of state can be offloaded to a specific actor and the distributor looks at each request only for long enough to determine which actor to forward the request to (note that this will typically not entail the distributor making an ask: it hands off the request and it's the responsibility of the actor it hands off to (or that actor's designee...) to reply). This is basically what Cluster Sharding does under the hood, and it's also noteworthy that doing this will probably increase the latency under low load (because you are doing more work), but it increases peak throughput by up to the number of portions of state.
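
    As a very rough Akka Typed sketch of that hand-off shape (all names here are hypothetical, and partitioning by entityId % 16 is just an assumed scheme; the HTTP route keeps asking the distributor, but the distributor itself never asks anyone):

        import akka.actor.typed.{ActorRef, Behavior}
        import akka.actor.typed.scaladsl.Behaviors

        // Hypothetical protocol: the request carries the original replyTo so that
        // whichever actor ends up handling it can answer the HTTP layer directly.
        final case class WorkRequest(entityId: Long, payload: String, replyTo: ActorRef[WorkResponse])
        final case class WorkResponse(entityId: Long, result: String)

        object Distributor {
          def apply(): Behavior[WorkRequest] =
            Behaviors.setup { context =>
              // One child per portion of the state, spawned lazily.
              var partitions = Map.empty[Long, ActorRef[WorkRequest]]

              Behaviors.receiveMessage { request =>
                val key = request.entityId % 16 // assumed partitioning scheme
                val worker = partitions.getOrElse(key, {
                  val ref = context.spawn(PartitionWorker(), s"partition-$key")
                  partitions += key -> ref
                  ref
                })
                // Hand off and move on: the partition worker, not the distributor, replies.
                worker ! request
                Behaviors.same
              }
            }
        }

        object PartitionWorker {
          def apply(): Behavior[WorkRequest] =
            Behaviors.receiveMessage { request =>
              request.replyTo ! WorkResponse(request.entityId, s"processed ${request.payload}")
              Behaviors.same
            }
        }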

    If that's not a workable way to address the distributor's mailbox being saturated (viz. there's no good way to partition the state), then you can at least limit the time requests spend in the mailbox by including a "respond-by" field in the request message (e.g. for a 5 second ask timeout, you might require a response by 4900 millis after constructing the ask). When the distributor starts processing a message and the respond-by time has passed, it moves on to the next request: doing this effectively means that when the mailbox starts to saturate, the message processing rate increases.
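
    A minimal sketch of that check (the names and the 4900 ms budget are assumptions, and a real distributor would forward the work onwards rather than reply directly):

        import akka.actor.typed.{ActorRef, Behavior}
        import akka.actor.typed.scaladsl.Behaviors

        // Hypothetical protocol: the asker stamps a respond-by deadline (in nanos) on each request.
        final case class TimedRequest(payload: String, respondByNanos: Long, replyTo: ActorRef[TimedResponse])
        final case class TimedResponse(result: String)

        object DeadlineAwareDistributor {
          def apply(): Behavior[TimedRequest] =
            Behaviors.receiveMessage { request =>
              if (System.nanoTime() > request.respondByNanos) {
                // The asker has almost certainly timed out already: don't spend time on it.
                Behaviors.same
              } else {
                request.replyTo ! TimedResponse(s"processed ${request.payload}")
                Behaviors.same
              }
            }
        }

        // At the HTTP layer, with a 5 second ask timeout, the ask might look like:
        //   distributor.ask[TimedResponse] { replyTo =>
        //     TimedRequest(payload, System.nanoTime() + 4900L * 1000 * 1000, replyTo)
        //   }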

    Of course, it's possible that your distributor's mailbox isn't the queue that's getting saturated, or that if it is, it's not because the actor is spending too much time processing messages. It's possible that the distributor (or other actors needed for a response) isn't getting a chance to process messages at all.

    Actors run inside a dispatcher which has the ability to have some number of actors (or Future callbacks or other tasks, each of which can be viewed as equivalent to an actor which is spawned for processing a single message) processing a message at a given time. If there are more actors which have a message in their respective mailboxes than the number that can be processing a message, those actors are in a queue to be scheduled (note that this applies even if you happen to have a dispatcher which will spawn as many threads as it needs to process a message: since there are a limited number of CPU cores, the OS kernel scheduler's queue will take the role of the dispatcher queue). Kamon can track the depth of this queue. In my experience, it's more valuable to detect whether thread starvation (basically, the time between a task being submitted and that task starting to execute exceeding some threshold) is occurring. Lightbend's package of commercial tooling for use with Akka (disclaimer: I am employed by Lightbend) provides tools for detecting, with minimal overhead, whether starvation is occurring and providing other diagnostic information.
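
    If you don't have access to such tooling, a crude, roll-your-own approximation (purely a sketch, and not how the commercial tooling works) is to measure the lag between submitting a task and that task actually starting to run, for example by wrapping an ExecutionContext that is backed by the dispatcher you care about:

        import scala.concurrent.ExecutionContext

        // Hypothetical starvation probe: delegates to an underlying ExecutionContext and
        // logs whenever the gap between submission and execution exceeds a threshold.
        final class StarvationProbe(underlying: ExecutionContext, thresholdNanos: Long)
            extends ExecutionContext {

          override def execute(runnable: Runnable): Unit = {
            val submittedAt = System.nanoTime()
            underlying.execute { () =>
              val lagNanos = System.nanoTime() - submittedAt
              if (lagNanos > thresholdNanos)
                println(s"dispatcher lag: ${lagNanos / 1000000} ms") // swap in your logger/metrics
              runnable.run()
            }
          }

          override def reportFailure(cause: Throwable): Unit =
            underlying.reportFailure(cause)
        }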

    If thread starvation is being observed, and things like garbage collection pauses, or CPU throttling (e.g. due to running in a container) are ruled out, the primary cause of starvation is actors (or actor-like things) taking too long to process a message either because they are executing blocking I/O or are doing too much in the processing of a single message. If blocking I/O is the culprit, try to move the I/O to actors or futures running in a thread pool with far more threads than the number of CPU cores (some even advocate for an unbounded thread pool for this purpose). If it's a case of doing too much computation in processing a single message, look for spots in the processing where it makes sense to capture the state needed for the remainder of the computation in a message and send that message to yourself (this is basically equivalent to a coroutine yielding).
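
    For the "send a message to yourself" approach, a minimal sketch (the summing task and all names are made up purely for illustration):

        import akka.actor.typed.Behavior
        import akka.actor.typed.scaladsl.Behaviors

        object ChunkedSummer {
          final case class Sum(remaining: List[Int], acc: Long)

          private val ChunkSize = 1000

          def apply(): Behavior[Sum] =
            Behaviors.receive { (context, message) =>
              val (chunk, rest) = message.remaining.splitAt(ChunkSize)
              val newAcc = message.acc + chunk.map(_.toLong).sum
              if (rest.isEmpty) {
                context.log.info(s"total = $newAcc")
              } else {
                // Capture the remaining work in a message to self; the dispatcher can then
                // schedule other actors before this one processes the next chunk.
                context.self ! Sum(rest, newAcc)
              }
              Behaviors.same
            }
        }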