scala anorm

What's the advantage of the streaming support in Anorm (Play Scala)?


I have been reading the section Streaming results in the Play docs. What I expected to find is a way to create a Scala Stream based on the results, so if I run a query that returns 10,000 rows that need to be parsed, it would parse them in batches (e.g. 100 at a time) or would just parse the first one and parse the rest as they're needed (so, a Stream).

What I found (from my understanding, I might be completely wrong) is basically a way to parse the results one by one, but at the end it creates a list with all the parsed results (with an arbitrary limit if you like, in this case 100 books). Let's take this example from the docs:

val books: Either[List[Throwable], List[String]] = 
  SQL("Select name from Books").foldWhile(List[String]()) { (list, row) => 
    if (list.size == 100) (list -> false) // stop with `list`
    else (list :+ row[String]("name")) -> true // continue with one more name
  }

What advantages does that provide over a basic implementation such as:

val books: List[String] = SQL("Select name from Books").as(str("name").*)  // please ignore possible syntax errors, hopefully understandable still

Solution

  • Parsing a very large number of rows in a single pass is inefficient. You might not notice it with a simple class, but once you add a few joins and a more complex parser, you will start to see a huge performance hit when the row count gets into the thousands.

    From my personal experience, when a query returns 5,000 to 10,000 rows (or more) and the parser tries to handle them all at once, it consumes so much CPU time that the program effectively hangs indefinitely.

    Streaming avoids the problem of trying to parse everything all at once, or even waiting for all the results to make it back to the server over the wire.
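
To make the difference concrete, here is a minimal sketch that does not use Anorm at all. It simulates a 10,000-row cursor with a plain Scala Iterator and counts how many rows actually get parsed. The names `Row`, `foldWhile`, `firstNames`, and `allNames` are hypothetical stand-ins invented for this illustration; only the general shape (fold over rows, return a flag to stop) mirrors the docs example, and real Anorm behaviour may differ in detail.

```scala
object StreamingSketch {
  // Hypothetical stand-in for a database row; not an Anorm type.
  final case class Row(name: String)

  // Fold over the rows until `f` returns false. This is an assumed
  // simplification of the fold-while idea, not Anorm's implementation.
  def foldWhile[A](rows: Iterator[Row], zero: A)(f: (A, Row) => (A, Boolean)): A = {
    var acc = zero
    var keepGoing = true
    while (keepGoing && rows.hasNext) {
      val (next, go) = f(acc, rows.next())
      acc = next
      keepGoing = go
    }
    acc
  }

  // Streaming style: stop as soon as `limit` names are collected.
  // Returns the names and the number of rows that were parsed.
  def firstNames(limit: Int): (Vector[String], Int) = {
    var parsed = 0
    // Iterator.map is lazy, so a row is "parsed" only when it is pulled.
    val cursor = Iterator.range(0, 10000).map { i => parsed += 1; Row(s"book-$i") }
    val names = foldWhile(cursor, Vector.empty[String]) { (acc, row) =>
      val next = acc :+ row.name
      (next, next.size < limit)
    }
    (names, parsed)
  }

  // Eager style: every row is parsed before anything is returned.
  def allNames(): (Vector[String], Int) = {
    var parsed = 0
    val cursor = Iterator.range(0, 10000).map { i => parsed += 1; Row(s"book-$i") }
    (cursor.map(_.name).toVector, parsed)
  }
}
```

Calling `StreamingSketch.firstNames(100)` pulls only the rows it needs from the simulated cursor, while `StreamingSketch.allNames()` walks all 10,000 before returning. With a real parser involving joins, that gap is where the CPU time goes.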