I have a dataframe with two array columns:
+---------+-----------------------+
|itemval |fruit |
+---------+-----------------------+
|[1, 2, 3]|[apple, banana, orange]|
+---------+-----------------------+
I am trying to zip them and create a name value pair
+---------+-----------------------+--------------------------------------+
|itemval |fruit |ziped |
+---------+-----------------------+--------------------------------------+
|[1, 2, 3]|[apple, banana, orange]|[[1, apple], [2, banana], [3, orange]]|
+---------+-----------------------+--------------------------------------+
and then make it to JSON, the to_json output is formatted like this
+---------------------------------------------------------------------------+
|zipped |
+---------------------------------------------------------------------------+
|[{"_1":"1","_2":"apple"},{"_1":"2","_2":"banana"},{"_1":"3","_2":"orange"}]|
+---------------------------------------------------------------------------+
The format, I am expecting is like this
+---------------------------------------------------------------------------+
|zipped |
+---------------------------------------------------------------------------+
|[{"itemval":"1","name":"apple"},{"itemval":"2","name":"banana"},{"itemval":"3","name":"orange"}]|
+---------------------------------------------------------------------------+
Here is my implementation:
val df1 = Seq((Array(1,2,3),Array("apple","banana","orange"))).toDF("itemval","fruit")
df1.show(false)
def zipper=udf((list1:Seq[String],list2:Seq[String]) => {
val zipList = list2 zip list1
zipList
)
df1.withColumn("ziped",to_json(zipper($"fruit",$"itemval"))).drop("itemval","fruit").show(false)
This is the solution which worked for me. Create a schema with new value and cast it to the column
val schema = ArrayType(
StructType(
Array(
StructField("itemval",StringType),
StructField("name",StringType)
)
)
)
val casted =zival.withColumn("result",$"ziped".cast(schema))
casted.show(false)
casted.select(to_json($"result")).show(false)
and the out put will be
casted:org.apache.spark.sql.DataFrame
ziped:array
element:struct
_1:string
_2:string
result:array
element:struct
itemval:string
name:string
+-----------------------------------------------------------------+
|structstojson(result) |
+-----------------------------------------------------------------+
|[{"itemval":"3","name":"orange"},{"itemval":"2","name":"banana"}]|
+-----------------------------------------------------------------+