apache-spark, pyspark

Explode a null column in PySpark that can be of type struct of struct


I have a dataframe and I am trying to do the following transformation:

from pyspark.sql import functions as F

exploded_prod_df = exploded_hits_product_df.withColumn("product_cd", F.explode(F.col("products.customdimensions")))

My issue is that products is currently NULL, which may not be the case in the future. I tried many things, but they didn't work. For example, I tried the code below:

from pyspark.sql import types as T

exploded_prod_df = (
    exploded_hits_product_df
    .withColumn('products_customdimensions',
                F.when(F.col('products').isNull(), F.lit(None).cast(T.StructType()))
                 .otherwise('products.customdimensions'))
    .withColumn("product_cd", F.explode_outer(F.col("products_customdimensions")))
)

but this gives an error. Please help. This is Google Analytics data. The error is:

Cannot resolve "CASE WHEN (products IS NULL) THEN NULL ELSE products.customdimensions END" due to data type mismatch: Input to casewhen should all be the same type, but it's ["STRUCT<>", "STRING"].;


Solution

  • If you could provide the output of print(product_df.dtypes), I would be happy to add more context to this answer.

    These are my thoughts so far:

    explode and explode_outer only work on Map and Array types.

    If products were NULL and products.customdimensions were one of those types, you would be unlikely to get errors from these explode functions.

    Since you are getting errors, I think the issue is that products.customdimensions is of type Struct.
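
    As a quick, standalone illustration (made-up data, not your schema), explode accepts an Array column but rejects a Struct:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # explode works on an Array column...
    arr_df = spark.createDataFrame([(["red", "apple"],)], ["arr"])
    arr_df.select(F.explode("arr")).show()

    # ...but the same call on a Struct column raises an AnalysisException,
    # because explode is only defined for Array and Map input.
    struct_df = spark.createDataFrame([(("red", "apple"),)], ["struct_col"])
    # struct_df.select(F.explode("struct_col"))  # AnalysisException: input should be array or map type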


    Option 1. You know the fields in products.customdimensions.

    If so, you can access each field you need using .withColumn.

    df = product_df.withColumn("product_type", F.col("products.customdimensions.type"))\
            .withColumn("product_color", F.col("products.customdimensions.color"))
    
    df.select('products','products.customdimensions', 'product_type', 'product_color').show()
    
    +--------------+----------------+------------+-------------+
    |      products|customdimensions|product_type|product_color|
    +--------------+----------------+------------+-------------+
    |{{red, apple}}|    {red, apple}|       apple|          red|
    |          null|            null|        null|         null|
    +--------------+----------------+------------+-------------+

    If you instead need this information in a vertical format, you could create an Array with the fields you need and then explode it.

    df = product_df.withColumn('products_customdimensions',
                        F.when(F.col('products').isNull(), F.array(F.lit(None)))\
                        .otherwise(F.array("products.customdimensions.type",
                                           "products.customdimensions.color"))\
                       )
    
    df = df.withColumn("product_cd", F.explode('products_customdimensions'))
    
    df.select('products', 'products.customdimensions', 'product_cd').show()
    
    +--------------+----------------+----------+
    |      products|customdimensions|product_cd|
    +--------------+----------------+----------+
    |{{red, apple}}|    {red, apple}|     apple|
    |{{red, apple}}|    {red, apple}|       red|
    |          null|            null|      null|
    +--------------+----------------+----------+

    Option 2. You do not know the fields in products.customdimensions.

    If this is the case, you may want to convert the Struct to a Map. Once it is a Map, you can pass it to explode or explode_outer.

    Here is an example of converting a Struct to a Map: Convert struct to map in Spark SQL.
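
    For instance, here is a minimal sketch of that idea (assuming the dataframe is named product_df as above, and casting every value to string, since create_map needs all values to share one type):

    from pyspark.sql import functions as F

    # Discover the Struct's fields from the schema instead of hard-coding them.
    cd_type = product_df.schema["products"].dataType["customdimensions"].dataType

    # Build alternating key/value arguments for create_map.
    pairs = []
    for field in cd_type.fields:
        pairs.append(F.lit(field.name))
        pairs.append(F.col(f"products.customdimensions.{field.name}").cast("string"))

    df = product_df.withColumn("cd_map", F.create_map(*pairs))

    # explode_outer on a Map yields key and value columns, and keeps rows
    # even when the map values are NULL.
    df = df.select("*", F.explode_outer("cd_map").alias("cd_key", "cd_value"))
    df.select("products", "cd_key", "cd_value").show()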