I have a DataFrame with two columns of array values, like below:
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark, as in spark-shell
val ds = Seq((Array("a","b"),Array("1","2")),(Array("p","q"),Array("3","4")))
val df = ds.toDF("col1", "col2")
+------+------+
|  col1|  col2|
+------+------+
|[a, b]|[1, 2]|
|[p, q]|[3, 4]|
+------+------+
I want to transform this into an array of pairs, like below:
+------+------+---------------+
|  col1|  col2|           col3|
+------+------+---------------+
|[a, b]|[1, 2]|[[a, 1],[b, 2]]|
|[p, q]|[3, 4]|[[p, 3],[q, 4]]|
+------+------+---------------+
I guess I could use struct and then some UDF, but I wanted to know whether there is a built-in higher-order function that does this efficiently.
For Spark 2.3 or below, I found the zip method on Scala collections really handy for this use case (I was unaware of it when posting the question). I can define a small UDF:
import org.apache.spark.sql.functions.udf
val zip = udf((xs: Seq[String], ys: Seq[String]) => xs.zip(ys))
and use it as:
var out = df.withColumn("col3", zip(df("col1"), df("col2")))
This gives me the desired result.
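To make it clearer what the UDF body does, here is a minimal plain-Scala sketch of the same zip call, with no Spark involved. The names ZipSketch and zipPairs are made up for illustration and are not part of any library.

```scala
// Hypothetical helper object, only for illustrating Seq.zip outside Spark.
object ZipSketch {
  // Same logic as the UDF body: pair elements of xs and ys by position.
  def zipPairs(xs: Seq[String], ys: Seq[String]): Seq[(String, String)] =
    xs.zip(ys)

  def main(args: Array[String]): Unit = {
    // Equal lengths: every element gets a partner.
    println(zipPairs(Seq("a", "b"), Seq("1", "2"))) // List((a,1), (b,2))

    // Unequal lengths: zip stops at the shorter input and drops the extra "r".
    println(zipPairs(Seq("p", "q", "r"), Seq("3", "4"))) // List((p,3), (q,4))
  }
}
```

Two things worth keeping in mind when this runs inside the UDF: Spark stores each tuple as a struct with fields named _1 and _2, and a null array in either column reaches the function as null, so xs.zip(ys) would throw unless nulls are handled first. On Spark 2.4 and later, the built-in arrays_zip function in org.apache.spark.sql.functions covers this use case without a UDF.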