scalaprofilingyourkit

Surprisingly slow of mutable.array.drop


I am a newbie in Scala, and when I am trying to profile my Scala code with YourKit, I have some surprising finding regarding the usage of array.drop.

Here is what I write:

...
val items = s.split(" +")   // s is a string
...
val s1 = items.drop(2).mkString(" ")
...

In a 1 mins run of my code, YourKit told me that function call items.drop(2) takes around 11% of the total execution time..

Lexer.scala:33 scala.collection.mutable.ArrayOps$ofRef.drop(int) 1054 11%

This is really surprising to me, is there any internal memory copy that slow down the processing? If so, what is the best practice to optimize my simple code snippet? Thank you.


Solution

  • This is really surprising to me, is there any internal memory copy that slow down the processing?

    ArrayOps.drop internally calls IterableLike.slice, which allocates a builder that produces a new Array for each call:

    override def slice(from: Int, until: Int): Repr = {
      val lo    = math.max(from, 0)
      val hi    = math.min(math.max(until, 0), length)
      val elems = math.max(hi - lo, 0)
      val b     = newBuilder
      b.sizeHint(elems)
    
      var i = lo
      while (i < hi) {
        b += self(i)
        i += 1
      }
      b.result()
    }
    

    You're seeing the cost of the iteration + allocation. You didn't specify how many times this happens and what's the size of the collection, but if it's large this could be time consuming.

    One way of optimizing this is to generate a List[String] instead which simply iterates the collection and drops it's head element. Note this will occur an additional traversal of the Array[T] to create the list, so make sure to benchmark this to see you actually gain anything:

    val items = s.split(" +").toList
    val afterDrop = items.drop(2).mkString(" ")
    

    Another possibility is to enrich Array[T] to include your own version of mkString which manually populates a StringBuilder:

    object RichOps {
      implicit class RichArray[T](val arr: Array[T]) extends AnyVal {
        def mkStringWithIndex(start: Int, end: Int, separator: String): String = {
          var idx = start
          val stringBuilder = new StringBuilder(end - start)
    
          while (idx < end) {
            stringBuilder.append(arr(idx))
            if (idx != end - 1) {
              stringBuilder.append(separator)
            }
            idx += 1
          }
    
          stringBuilder.toString()
        }
      }
    }
    

    And now we have:

    object Test {
      def main(args: Array[String]): Unit = {
        import RichOps._
        val items = "hello everyone and welcome".split(" ")
        println(items.mkStringWithIndex(2, items.length, " "))
      }
    

    Yields:

    and welcome