pythonjsonnested

How to extract nested json using json_normalize?


I have a nested json, but I can't understand how to work with them.

{
    "return": {
        "status_processing": "3",
        "status": "OK",
        "order": {
            "id": "872102042",
            "number": "123831",
            "date_order": "dd/mm/yyyy",
            "items": [
                {
                    "item": {
                        "id_product": "684451795",
                        "code": "VPOR",
                        "description": "Product 1",
                        "unit": "Un",
                        "quantity": "1.00",
                        "value": "31.76"
                    }
                },
                {
                    "item": {
                        "id_product": "684451091",
                        "code": "VSAP",
                        "description": "Product 2",
                        "unit": "Un",
                        "quantity": "1.00",
                        "value": "31.76"
                    }
                }
            ]
        }
    }
}

I searched on stackoverflow questions, and try some resolutions that people passed, but don't work for me.

Here an sample that I used to accessing the data from json:

df = pd.json_normalize(
    order_list,
    record_path=["return", "order", "itens"],
    meta=[
        ["return", "order", "id"],
        ["return", "order", "date_order"],
        ["return", "order", "number"],
    ],
)

But don't work, they duplicating the data when I send to dataframe.

Anyone can help me?

EDIT

Here an example that I used:

Convert nested JSON to pandas DataFrame

And what I expected:

enter image description here


Solution

  • I don't know what exactly you expect in output but if you want every item in new row then you could use normal code with for-loop for this.

    order_list = {
        "return": {
            "status_processing": "3",
            "status": "OK",
            "order": {
                "id": "872102042",
                "number": "123831",
                "date_order": "dd/mm/yyyy",
                 "itens": [
                    {
                        "item": {
                            "id_product": "684451795",
                            "code": "VPOR",
                            "description": "Product 1",
                            "unit": "Un",
                            "quantity": "1.00",
                            "value": "31.76"
                        }
                    },
                    {
                        "item": {
                            "id_product": "684451091",
                            "code": "VSAP",
                            "description": "Product 2",
                            "unit": "Un",
                            "quantity": "1.00",
                            "value": "31.76"
                        }
                    }
                ]
            }
        }
    }
    
    import pandas as pd
    
    data = []
    
    order = order_list['return']['order']
    
    for iten in order['itens']:
        for key, val in iten.items():
            row = {
                #'key': key, 
                'id': order['id'], 
                'date_order': order['date_order'], 
                'number': order['number'], 
                'id_product': val['id_product'],
                #'code': val['code'],
                #'description': val['description'],
                #'quantity': val['quantity'],
                #'value': val['value'],
            }
            data.append(row)
    
    df = pd.DataFrame(data)
    print(df)
    

    Result:

              id  date_order  number id_product
    0  872102042  dd/mm/yyyy  123831  684451795
    1  872102042  dd/mm/yyyy  123831  684451091
    

    If you need other information in rows then you should show it in question.