scala, apache-spark, rdd, spark-core, bigdata

How to merge Arrays in an RDD


I'm a newbie to Spark. I have the following RDD[Array[(String, String, String)]]:

val r1 = sc.parallelize(Array(Array(("123","456","789"),("AAA","BBB","CCC")),Array(("DDD","EEE","FFF"),("E1","E2","E3"))))

I want to merge the inner Arrays into a single flat collection, like this:

Array((123,456,789), (AAA,BBB,CCC), (DDD,EEE,FFF), (E1,E2,E3))

I can do this with r1.reduce(_ ++ _). However, I want to use a transformation such as map rather than an action. Is that possible? I'm using Spark 1.3.1.

Thank you


Solution

  • You can flatten the nested Arrays with flatMap, which is a transformation:

    val res: RDD[(String, String, String)] = r1.flatMap(identity)
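
    The same approach in a fuller form, as a minimal sketch (the SparkConf/SparkContext setup is only needed outside spark-shell, where sc already exists; collect() is used here purely to inspect the result on the driver):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    // Only needed in a standalone script; in spark-shell, `sc` is already available.
    val conf = new SparkConf().setAppName("merge-arrays").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val r1: RDD[Array[(String, String, String)]] = sc.parallelize(Array(
      Array(("123", "456", "789"), ("AAA", "BBB", "CCC")),
      Array(("DDD", "EEE", "FFF"), ("E1", "E2", "E3"))))

    // flatMap flattens each inner Array into the resulting RDD (a lazy transformation)
    val res: RDD[(String, String, String)] = r1.flatMap(identity)

    // collect() is an action, used here only to look at the flattened result
    res.collect().foreach(println)
    // (123,456,789)
    // (AAA,BBB,CCC)
    // (DDD,EEE,FFF)
    // (E1,E2,E3)

    sc.stop()

    Because flatMap is a transformation, res stays lazy and distributed as an RDD[(String, String, String)], whereas reduce(_ ++ _) is an action that pulls everything back to the driver as a plain Array.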