pythonjsonpandas

How to flatten multilevel/nested JSON?


I am trying to convert JSON to CSV file, that I can use for further analysis. Issue with my structure is that I have quite some nested dict/lists when I convert my JSON file.

I tried to use pandas json_normalize(), but it only flattens first level.

import json
import pandas as pd
from pandas.io.json import json_normalize
from cs import CloudStack

api_key = xxxx
secret = xxxx
endpoint = xxxx

cs = CloudStack(endpoint=endpoint,
                key=api_key,
                secret=secret)

virtual_machines = cs.virtMach()

test = json_normalize(virtual_machines["virtualmachine"])

test.to_csv("test.csv", sep="|", index=False)

Any idea how to flatter whole JSON file, so I can create single line input to CSV file for single (in this case virtual machine) entry? I have tried couple of solutions posted here, but my result was always only first level was flattened.

This is sample JSON (in this case, I still get "securitygroup" and "nic" output as JSON format:

{
    "count": 13,
    "virtualmachine": [
        {
            "id": "1082e2ed-ff66-40b1-a41b-26061afd4a0b",
            "name": "test-2",
            "displayname": "test-2",
            "securitygroup": [
                {
                    "id": "9e649fbc-3e64-4395-9629-5e1215b34e58",
                    "name": "test",
                    "tags": []
                }
            ],
            "nic": [
                {
                    "id": "79568b14-b377-4d4f-b024-87dc22492b8e",
                    "networkid": "05c0e278-7ab4-4a6d-aa9c-3158620b6471"
                },
                {
                    "id": "3d7f2818-1f19-46e7-aa98-956526c5b1ad",
                    "networkid": "b4648cfd-0795-43fc-9e50-6ee9ddefc5bd",
                    "traffictype": "Guest"
                }
            ],
            "hypervisor": "KVM",
            "affinitygroup": [],
            "isdynamicallyscalable": false
        }
    ]
}

Solution

  • I used the following function (details can be found here):

    def flatten_data(y):
        out = {}
    
        def flatten(x, name=''):
            if type(x) is dict:
                for a in x:
                    flatten(x[a], name + a + '_')
            elif type(x) is list:
                i = 0
                for a in x:
                    flatten(a, name + str(i) + '_')
                    i += 1
            else:
                out[name[:-1]] = x
    
        flatten(y)
        return out
    

    This unfortunately completely flattens whole JSON, meaning that if you have multi-level JSON (many nested dictionaries), it might flatten everything into single line with tons of columns.

    What I used, in the end, was json_normalize() and specified structure that I required. A nice example of how to do it that way can be found here.