I have three RDDs of the same size: rdd1 contains a String identifier, rdd2 contains a vector, and rdd3 contains an integer value.
Essentially, I want to zip those three together to get an RDD[(String, Vector, Int)], but I keep getting the error "can't zip RDDs with unequal number of partitions". How can I bypass zip entirely to do this?
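Try indexing each RDD with zipWithIndex and joining on the index:
rdd1.zipWithIndex.map(_.swap)
  .join(rdd2.zipWithIndex.map(_.swap))
  .join(rdd3.zipWithIndex.map(_.swap))
  .values
  .map { case ((id, vec), n) => (id, vec, n) }
Note that join does not preserve the original order; add sortByKey() before .values if the order matters.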
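
For reference, here is a minimal self-contained sketch of the index-and-join approach for all three RDDs. It is only an illustration under some assumptions: `Zip3ByIndex` and `zip3` are hypothetical names, `Array[Double]` stands in for your vector type, and the local master and partition counts are chosen just to show that the inputs may be partitioned differently.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object Zip3ByIndex {
  // Hypothetical helper: combines three RDDs element by element,
  // using each element's position (from zipWithIndex) as the join key.
  def zip3[A: ClassTag, B: ClassTag, C: ClassTag](
      a: RDD[A], b: RDD[B], c: RDD[C]): RDD[(A, B, C)] = {
    // Turn RDD[T] into RDD[(index, T)] so it can be joined by position.
    def indexed[T: ClassTag](rdd: RDD[T]): RDD[(Long, T)] =
      rdd.zipWithIndex.map(_.swap)

    indexed(a)
      .join(indexed(b))   // RDD[(Long, (A, B))]
      .join(indexed(c))   // RDD[(Long, ((A, B), C))]
      .sortByKey()        // join shuffles the data; restore the original order
      .values
      .map { case ((x, y), z) => (x, y, z) }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for demonstration.
    val conf = new SparkConf().setMaster("local[2]").setAppName("zip3")
    val sc = new SparkContext(conf)
    try {
      // Different partition counts on purpose: plain zip would fail here.
      val rdd1 = sc.parallelize(Seq("a", "b", "c"), 2)
      val rdd2 = sc.parallelize(Seq(Array(1.0), Array(2.0), Array(3.0)), 3)
      val rdd3 = sc.parallelize(Seq(10, 20, 30), 1)

      zip3(rdd1, rdd2, rdd3).collect().foreach { case (id, vec, n) =>
        println(s"$id [${vec.mkString(",")}] $n")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Unlike zip, this does not require the same number of partitions or the same number of elements per partition, but it does shuffle the data, so it is likely to be slower on large RDDs. Elements whose index is missing from any of the three RDDs are dropped by the inner join.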