I have a DataFrame as follows:
f1 |f2
=========
test | [{"f3": 1, "f4": "f4_1" }, {"f3": 2, "f4": "f4_2" }]
f2 is a list of objects.
I want to get a DataFrame like the one below:
f3|f4 | temp_col
=========================
1 |"f4_1"| {"f1": "test"}
2 |"f4_2"| {"f1": "test"}
temp_col is a name I provide.
How do I do that with PySpark?
I tried converting to a pandas DataFrame and using json_normalize, but it didn't work.
If you have already loaded your JSON into a Spark DataFrame, here is one way to do it:
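from pyspark.sql.functions import col, explode, struct

# explode yields one row per element of f2; struct wraps f1 into temp_col
result_df = df.withColumn("f2", explode(df.f2)).select(
    "f2.f3",
    "f2.f4",
    struct(col("f1")).alias("temp_col"),
)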
Output:
f3 f4 temp_col
1 f4_1 {"f1":"test"}
2 f4_2 {"f1":"test"}
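To make it clearer what the explode and struct steps do, here is a minimal plain-Python sketch of the same reshaping on a list of dicts. It does not use Spark at all; the helper name explode_and_wrap and its parameters are made up for illustration and are not part of any library. It only mirrors the logic: one output row per element of f2, with f1 wrapped into a nested record under the name you choose.

```python
# Plain-Python illustration of the explode + struct reshaping (no Spark needed).
# explode_and_wrap is a hypothetical helper, not a PySpark function.

rows = [
    {"f1": "test", "f2": [{"f3": 1, "f4": "f4_1"}, {"f3": 2, "f4": "f4_2"}]},
]


def explode_and_wrap(rows, list_col, keep_cols, wrapped_name):
    """Emit one row per element of row[list_col]; nest keep_cols under wrapped_name."""
    out = []
    for row in rows:
        # Corresponds to struct(col("f1")): pack the kept columns into one record.
        wrapped = {c: row[c] for c in keep_cols}
        # Corresponds to explode: a missing or empty list produces no output rows.
        for item in row.get(list_col) or []:
            out.append({**item, wrapped_name: dict(wrapped)})
    return out


result = explode_and_wrap(rows, "f2", ["f1"], "temp_col")
for r in result:
    print(r)
```

As with explode in Spark, a row whose f2 is empty or null contributes nothing to the result. If you need to keep such rows, PySpark also provides explode_outer.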