[SOLVED] Explode JSON array into rows

Explode JSON array into rows

I have a dataframe which has 2 columns" "ID" and "input_array" (values are JSON arrays).

ID   input_array
1    [ {“A”:300, “B”:400}, { “A”:500,”B”: 600} ]
2    [ {“A”: 800, “B”: 900} ]

Output that I need:

ID A      B
1  300    400
1  500    600
2  800    900

I tried from_json, explode functions. But data type mismatch error is coming for array columns.

Solution

I have 2 interpretations of what input (column "input_array") data types you have.

If the type of input column is a string...

df = spark.createDataFrame(
    [(1, '[ {"A":300, "B":400}, { "A":500,"B": 600} ]'),
     (2, '[ {"A": 800, "B": 900} ]')],
    ['ID', 'input_array'])
df.printSchema()
# root
#  |-- ID: long (nullable = true)
#  |-- input_array: string (nullable = true)

...you can use from_json to extract Spark structure from JSON string and then inline to explode the resulting array of structs into columns.

df = df.selectExpr(
    "ID",
    "inline(from_json(input_array, 'array<struct<A:long,B:long>>'))"
)
df.show()
# +---+---+---+
# | ID|  A|  B|
# +---+---+---+
# |  1|300|400|
# |  1|500|600|
# |  2|800|900|
# +---+---+---+

If the type of input column is an array of strings...

df = spark.createDataFrame(
    [(1, [ '{"A":300, "B":400}', '{ "A":500,"B": 600}' ]),
     (2, [ '{"A": 800, "B": 900}' ])],
    ['ID', 'input_array'])
df.printSchema()
# root
#  |-- ID: long (nullable = true)
#  |-- input_array: array (nullable = true)
#  |    |-- element: string (containsNull = true)

...you can first use explode to move every array's element into rows thus resulting in a column of string type, then use from_json to create Spark data types from the strings and finally expand * the structs into columns.

from pyspark.sql import functions as F

df = df.withColumn('input_array', F.explode('input_array'))
df = df.withColumn('input_array', F.from_json('input_array', 'struct<A:long,B:long>'))
df = df.select('ID', 'input_array.*')

df.show()
# +---+---+---+
# | ID|  A|  B|
# +---+---+---+
# |  1|300|400|
# |  1|500|600|
# |  2|800|900|
# +---+---+---+