python json pandas ndjson

De-normalize JSON object into flat objects


I have a JSON object like this:

{
    "id": 3590403096656,
    "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
    "tags": [
        "1ST THE WORLD FOR YOU <3",
        "apparel"
    ],
    "props": [
        {
            "id": 28310659235920,
            "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        },
        {
            "id": 444444444444,
            "title": "number 2",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        }
    ]
}

I want to flatten it so that the desired output is one object per element of props, like this:

{"id": 3590403096656, "title": "Romania Special Zip Hoodie Blue - Version 02 A5", "tags": ["1ST THE WORLD FOR YOU <3", "apparel"], "props.id": 28310659235920, "props.title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)", "props.position": 1, "props.product_id": 3590403096656, "props.created_at": "2019-05-22T00:46:19+07:00", "props.updated_at": "2019-05-22T01:03:29+07:00"}
{"id": 3590403096656, "title": "Romania Special Zip Hoodie Blue - Version 02 A5", "tags": ["1ST THE WORLD FOR YOU <3", "apparel"], "props.id": 444444444444, "props.title": "number 2", "props.position": 1, "props.product_id": 3590403096656, "props.created_at": "2019-05-22T00:46:19+07:00", "props.updated_at": "2019-05-22T01:03:29+07:00"}

So far I have tried:

import pandas as pd
pd.json_normalize(sample_object)

where sample_object holds one JSON object. I am looping through a large file of such objects, all of which I want to flatten into the desired format.

json_normalize is not giving me the desired output. I want to keep tags as it is, but flatten props and repeat the parent object's fields for every element of props.


Solution

  • Please try this:

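    import copy
    import json

    obj = {
        "id": 3590403096656,
        "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
        "tags": [
            "1ST THE WORLD FOR YOU <3",
            "apparel"
        ],
        "props": [
            {
                "id": 28310659235920,
                "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            },
            {
                "id": 444444444444,
                "title": "number 2",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            }
        ]
    }

    # Remove "props" from obj; what remains is the part shared by every output object.
    props = obj.pop("props")

    for p in props:
        # Deep-copy the shared part so the result objects do not share the same "tags" list.
        res = copy.deepcopy(obj)
        # Add every field of this prop under a "props."-prefixed key.
        for k in p:
            res["props." + k] = p[k]
        # json.dumps prints real JSON (double quotes), one object per line, as in the desired output.
        print(json.dumps(res))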
    

    Basically, it uses pop("props") to take props out of obj, so what is left of obj is the part common to all result objects.

    Then we iterate through props and, for each prop, create a new object that starts as a deep copy of that base object and then gets a "props.<key>" entry for every key in the prop.
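A variant of the same idea, sketched under a few assumptions. It leaves the input untouched instead of calling obj.pop("props"), which removes props from obj in place. It also streams a newline-delimited JSON file (one object per line, as the ndjson tag suggests) and writes one flat object per line. The function names explode and flatten_ndjson are hypothetical, and parents with a missing or empty props list produce no rows.

```python
import copy
import io
import json
from typing import Any, Dict, List, TextIO


def explode(obj: Dict[str, Any], key: str = "props") -> List[Dict[str, Any]]:
    """Return one flat dict per element of obj[key]; obj itself is not modified.

    Each row is a deep copy of the parent's other fields plus "<key>.<field>"
    entries taken from one child. Assumption: a missing or empty list yields
    no rows (return [base] instead if such parents must be kept).
    """
    # Everything except the list being exploded is shared by all rows.
    base = {k: v for k, v in obj.items() if k != key}
    rows = []
    for child in obj.get(key, []):
        row = copy.deepcopy(base)  # independent copy of "tags" etc. per row
        row.update({f"{key}.{k}": v for k, v in child.items()})
        rows.append(row)
    return rows


def flatten_ndjson(src: TextIO, dst: TextIO, key: str = "props") -> int:
    """Assumes src is NDJSON (one JSON object per line); writes flat rows to dst.

    Returns the number of rows written; blank lines in src are skipped.
    """
    written = 0
    for line in src:  # iterating a text stream yields one line at a time
        line = line.strip()
        if not line:
            continue
        for row in explode(json.loads(line), key):
            dst.write(json.dumps(row, ensure_ascii=False) + "\n")
            written += 1
    return written


# Usage with in-memory streams (a shortened copy of the question's object).
sample = {
    "id": 3590403096656,
    "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
    "tags": ["1ST THE WORLD FOR YOU <3", "apparel"],
    "props": [
        {"id": 28310659235920, "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)", "position": 1},
        {"id": 444444444444, "title": "number 2", "position": 1},
    ],
}
rows = explode(sample)
src = io.StringIO(json.dumps(sample) + "\n")
dst = io.StringIO()
count = flatten_ndjson(src, dst)
print(dst.getvalue(), end="")
```

For real files, the same function can take open file handles, e.g. `with open("in.ndjson", encoding="utf-8") as src, open("out.ndjson", "w", encoding="utf-8") as dst: flatten_ndjson(src, dst)`; the file names are placeholders. Only one input line is held in memory at a time, which suits a large file.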