pythonjsonpandasdataframejson-flattener

Flatting a JSON file into Pandas Dataframe in Python


I have the json in this format:

{
    "fields": {
        "tcidte": {
            "mode": "required",
            "type": "date",
            "format": "%Y%m%d"
        },
        "tcmcid": {
            "mode": "required",
            "type": "string"
        },
        "tcacbr": {
            "mode": "required",
            "type": "string"
        }
    }
}

I want it to be in a dataframe format where each of the three field names are separate rows. Where one row has a column(e.g "format") where others are blank should be assumed to be NULL.

I have tried to use the flatten_json function which I found on here, but doesn't work as expected but will still include here:

def flatten_json(nested_json, exclude=['']):
    """Flatten json object with nested keys into a single level.
        Args:
            nested_json: A nested json object.
            exclude: Keys to exclude from output.
        Returns:
            The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude: flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

flatten_json_file = pd.DataFrame(flatten_json(nested_json))
pprint.pprint(flatten_json_file)

Additional Complexity JSON:

{
    "fields": {
        "action": {
            "type": {
                "field_type": "string"
            },
            "mode": "required"
        },
        "upi": {
            "type": {
                "field_type": "string"
            },
            "regex": "^[0-9]{9}$",
            "mode": "required"
        },
        "firstname": {
            "type": {
                "field_type": "string"
            },
            "mode": "required"
        }
    }
}

Solution

  • With

    data = {
        "fields": {
            "tcidte": {
                "mode": "required",
                "type": "date",
                "format": "%Y%m%d"
            },
            "tcmcid": {
                "mode": "required",
                "type": "string"
            },
            "tcacbr": {
                "mode": "required",
                "type": "string"
            }
        }
    }
    

    this

    df = pd.DataFrame(data["fields"].values())
    

    results in

           mode    type  format
    0  required    date  %Y%m%d
    1  required  string     NaN
    2  required  string     NaN
    

    Is that your goal?

    If you want the keys of data["fields"] as index:

    df = pd.DataFrame(data["fields"]).T
    

    or

    df = pd.DataFrame.from_dict(data["fields"], orient="index")
    

    both result in

                mode    type  format
    tcidte  required    date  %Y%m%d
    tcmcid  required  string     NaN
    tcacbr  required  string     NaN
    

    With

    data = {
        "fields": {
            "action": {
                "type": {
                    "field_type": "string"
                },
                "mode": "required"
            },
            "upi": {
                "type": {
                    "field_type": "string"
                },
                "regex": "^[0-9]{9}$",
                "mode": "required"
            },
            "firstname": {
                "type": {
                    "field_type": "string"
                },
                "mode": "required"
            }
        }
    }
    

    you could either do

    data = {key: {**d, **d["type"]} for key, d in data["fields"].items()}
    df = pd.DataFrame.from_dict(data, orient="index").drop(columns="type")
    

    or

    df = pd.DataFrame.from_dict(data["fields"], orient="index")
    df = pd.concat(
        [df, pd.DataFrame(df.type.to_list(), index=df.index)], axis=1
    ).drop(columns="type")
    

    with a result like (column positions may differ)

                   mode field_type       regex
    action     required     string         NaN
    upi        required     string  ^[0-9]{9}$
    firstname  required     string         NaN