jsondataframeapache-sparkpysparknested-json

Building nested Json from dataframe using pyspark


I have a dataframe and build a nested json object from this dataframe to represent the hieraical data, i am stuck where the json sub column is aded but its comming as string not as json.

**Code:**

from pyspark.sql.functions import *

#sample data
df=spark.createDataFrame([('1234567','123 Main St','10SjtT','idk@gmail.com','ecom','direct')],['cust_id','address','store_id','email','sales_channel','category'])

i want to represent this dataframe into this format

{
    "store_id": "10SjtT",
    "category": "direct",
    "sales_channel": "ecom",
    "email": "idk@gmail.com",
    "c_email": {
        "category": "direct",
        "email": "idk@gmail.com"
    }
}

i trid to add column but my sample code adds the nested json as a string with quotations

{
"store_id":"10SjtT","category":"direct","sales_channel":"ecom" 
,"c_email":"{\"category\":\"direct\",\"email\":\"idk@gmail.com\"}"

}

result like this

**Code used to build this **

dff = df.select("cust_id","address",to_json(struct("store_id","category","sales_channel","email",to_json(struct( "category" ,"email")).alias("c_email"))).alias("metadata")) 

dff.select("metadata").show(10,False) 

Please let me know if anyone faced the similar issue and able to build nested json and carrying the json format across.

Thanks in advance.

Manoj.


Solution

  • Remove the internal nested to_json() call.

    from pyspark.sql.functions import *
    
    #sample data
    df=sqlContext.createDataFrame([('1234567','123 Main St','10SjtT','idk@gmail.com','ecom','direct')],['cust_id','address','store_id','email','sales_channel','category'])
    
    dff = df.select("cust_id","address",to_json(struct("store_id","category","sales_channel","email",struct( "category" ,"email").alias("c_email"))).alias("metadata"))
    
    dff.select("metadata").show(10,False)
    

    Output:

    +------------------------------------------------------------------------------------------------------------------------------------------------+
    |metadata                                                                                                                                        |
    +------------------------------------------------------------------------------------------------------------------------------------------------+
    |{"store_id":"10SjtT","category":"direct","sales_channel":"ecom","email":"idk@gmail.com","c_email":{"category":"direct","email":"idk@gmail.com"}}|
    +------------------------------------------------------------------------------------------------------------------------------------------------+