
Understanding the Scala Map object when mapping to the head of a list


Hi, I have the following data and want to map each entry to the first item of its value (the list). So for:

11 -> List((1,11))
1 -> List((1,1), (1,111))

I want:

(1,11)
(1,1)

When this data is in an RDD I can do the following:

scala> val m = sc.parallelize(Seq(11 -> List((1,11)), 1 -> List((1,1),(1,111))))
m: org.apache.spark.rdd.RDD[(Int, List[(Int, Int)])] = ParallelCollectionRDD[198] at parallelize at <console>:47

scala> m.map(_._2.head).collect.foreach(println)
(1,11)
(1,1)

However, when it is in a Map object (the result of a groupBy) I get the following:

scala> val m = Map(11 -> List((1,11)), 1 -> List((1,1),(1,111)))
m: scala.collection.immutable.Map[Int,List[(Int, Int)]] = Map(11 -> List((1,11)), 1 -> List((1,1), (1,111)))

scala> m.map(_._2.head)
res1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1)

When I map to the whole list I get what I would expect, but not when I call head on it:

scala> m.map(_._2)
res2: scala.collection.immutable.Iterable[List[(Int, Int)]] = List(List((1,11)), List((1,1), (1,111)))

I can also get the result I want if I do either of the following:

scala> m.map(_._2).map(_.head)
res4: scala.collection.immutable.Iterable[(Int, Int)] = List((1,11), (1,1))

scala> m.values.map(_.head)
res5: Iterable[(Int, Int)] = List((1,11), (1,1))

Could someone please explain what is going on here?


Solution

  • The map operation on a scala.collection.immutable.Map behaves differently depending on the type returned by the function passed to map.

    When the return type is Tuple2[T,P]:

    The result of the map operation is another Map, with the first element of each tuple (_1) as the key and the second element (_2) as the value.

    For example:

    scala> m.map(_ => 10 -> 1)
    res14: scala.collection.immutable.Map[Int,Int] = Map(10 -> 1) // note the return type is Map.
    

    When the return type is anything other than Tuple2:

    The result of the map operation is an Iterable (printed here as a List) rather than a Map.

    scala> m.map(_ => 10 )
    res15: scala.collection.immutable.Iterable[Int] = List(10, 10) // note that the result is now an Iterable (a List here), not a Map.
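
The two cases can be checked at compile time by annotating the expected result type. The following is a minimal sketch; the object name MapResultTypes and the sample data are made up for illustration. How the compiler picks the result type depends on the Scala version (an implicit CanBuildFrom up to 2.12, an overloaded map from 2.13 on), but the visible behaviour should be the same.

```scala
object MapResultTypes {
  // Hypothetical sample data: two keys share the value "a".
  val letters: Map[Int, String] = Map(1 -> "a", 2 -> "b", 3 -> "a")

  // The function returns a pair, so the result is itself a Map.
  // Keys must be unique: "a" is produced twice, so one entry is dropped.
  val swapped: Map[String, Int] = letters.map { case (k, v) => v -> k }

  // The function returns a plain String, so the result is only an Iterable
  // and the duplicate "a" is kept.
  val values: Iterable[String] = letters.map { case (_, v) => v }

  def main(args: Array[String]): Unit = {
    println(swapped.size) // 2
    println(values.size)  // 3
  }
}
```

Swapping the two type annotations would not compile, which is a quick way to confirm which kind of collection a given map call produces.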
    

    With that established, for the Map Map(11 -> List((1,11)), 1 -> List((1,1), (1,111))) the operation m.map(_._2.head) produces the Tuple2 values (1,11) and (1,1). Because the function returns a Tuple2, the result is built as a Map, and the first element (_1) of each tuple becomes its key. Both tuples have the key 1, so (1,1) overwrites (1,11) and we end up with the single entry 1 -> 1.

    In the other cases the function passed to map does not return a Tuple2, so the result is an Iterable (a List) rather than a Map, hence the difference in results.
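
Applied to the data from the question, the collapse and two ways around it can be sketched as follows. The object name HeadOfEachValue and the val names are made up for illustration; m.values comes from the question, and m.toList is an additional option that works on the same principle of leaving the Map type before mapping.

```scala
object HeadOfEachValue {
  val m: Map[Int, List[(Int, Int)]] =
    Map(11 -> List((1, 11)), 1 -> List((1, 1), (1, 111)))

  // Each head is a Tuple2, so map rebuilds a Map keyed on the tuple's first
  // element. Both heads start with 1, so only one entry survives.
  val collapsed: Map[Int, Int] = m.map(_._2.head)

  // Leaving the Map type before mapping keeps every head.
  val viaValues: List[(Int, Int)] = m.values.map(_.head).toList
  val viaToList: List[(Int, Int)] = m.toList.map(_._2.head)

  def main(args: Array[String]): Unit = {
    println(collapsed) // Map(1 -> 1), as in the question
    println(viaValues) // List((1,11), (1,1)), as in the question
    println(viaToList)
  }
}
```

Note that head throws on an empty list; values produced by groupBy are never empty, but for other data headOption may be the safer choice.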