scala apache-spark rdd

Can't Zip RDDs with unequal number of partitions. What can I use as an alternative to zip?


I have three RDDs of the same size: rdd1 contains a String identifier, rdd2 contains a vector, and rdd3 contains an integer value.

Essentially, I want to zip those three together to get an RDD[(String, Vector, Int)], but I keep getting "can't zip RDDs with unequal number of partitions". How can I bypass zip entirely to achieve this?


Solution

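  • Try keying each RDD by its element index with zipWithIndex and joining on that key. For the first two RDDs: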

    rdd1.zipWithIndex.map(_.swap).join(rdd2.zipWithIndex.map(_.swap)).values
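
The one-liner above pairs only rdd1 and rdd2, so here is a minimal sketch of how the same idea might extend to all three RDDs. The helper name zip3ByIndex, the ZipByIndex object, and the sample data are hypothetical and not part of Spark; Seq[Double] stands in for the vector type, which the question does not specify. Note that join is an inner join, so if the RDDs actually differ in size the unmatched trailing elements are dropped silently.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.reflect.ClassTag

object ZipByIndex {
  // Hypothetical helper (not a Spark API): pairs elements of three RDDs by position.
  def zip3ByIndex[A: ClassTag, B: ClassTag, C: ClassTag](
      a: RDD[A], b: RDD[B], c: RDD[C]): RDD[(A, B, C)] = {
    // zipWithIndex yields (element, index); swap so the index becomes the join key.
    val ia = a.zipWithIndex().map(_.swap)
    val ib = b.zipWithIndex().map(_.swap)
    val ic = c.zipWithIndex().map(_.swap)

    ia.join(ib).join(ic)
      .sortByKey() // optional: restores the original order, at the cost of another shuffle
      .values
      .map { case ((x, y), z) => (x, y, z) }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("zip-by-index")
      .getOrCreate()
    val sc = spark.sparkContext

    // Deliberately different partition counts, which a plain zip would reject.
    val rdd1 = sc.parallelize(Seq("a", "b", "c"), 2)
    val rdd2 = sc.parallelize(Seq(Seq(1.0), Seq(2.0), Seq(3.0)), 3)
    val rdd3 = sc.parallelize(Seq(10, 20, 30), 1)

    zip3ByIndex(rdd1, rdd2, rdd3).collect().foreach(println)
    spark.stop()
  }
}
```

Joining on the index shuffles all three RDDs, so this is likely slower than zip. If the three RDDs are known to have the same number of elements in each partition once the partition counts match, repartitioning and using zip may be cheaper, but that assumption has to be checked against your data.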