I have a JSON array in the format below:
{
  "marks": [
    {
      "subject": "Maths",
      "mark": "80"
    },
    {
      "subject": "Physics",
      "mark": "70"
    },
    {
      "subject": "Chemistry",
      "mark": "60"
    }
  ]
}
I need to split each array element into a separate JSON file. Is there any way to do this in spark-shell?
You can explode the marks array of structs with inline, add a unique ID column, and write the JSON output partitioned by that ID so that each element ends up in its own file.
import org.apache.spark.sql.functions._

df.selectExpr("inline(marks)")                       // explode the marks array of structs into subject/mark columns
  .withColumn("id", monotonically_increasing_id())   // unique id per row
  .repartition(col("id"))                            // shuffle so each id lands in its own partition (one file each)
  .write
  .partitionBy("id")                                 // one output directory per id, e.g. output/id=0/
  .json("output")