pythonpandasdataframenumpynumpy-ndarray

PandasData Frame Handeling Of Nested List Within Json In Pandas Table For Subsequent Numpy Array


So I've been working with a json.loads() of blockdevices for a small project that i'm working on. Presently, I'm trying to feed json data into a Pandas dataframe, then convert that dataframe into a numpy array to be processed by a third party project called CurseXcel, but I digress.

What i'm presently having trouble with is correctly handling children within a list of dictionaries of json data. What code i've been able to work out is essentially

def return_pandas():
    process = subprocess.run("lsblk --json -o NAME,SIZE,UUID,MOUNTPOINT,PATH,FSTYPE ".split(), capture_output=True, text=True)
    data = json.loads(process.stdout)
    return pd.json_normalize(data['blockdevices'])

then

json_pandas = return_pandas()

which returns

  name   size  uuid mountpoint      path fstype                                           children
0  sda    25G  None       None  /dev/sda   None                                                NaN
1  sdb    25G  None       None  /dev/sdb   None  [{'name': 'sdb1', 'size': '512M', 'uuid': '008...
2  sr0  1024M  None       None  /dev/sr0   None                                                NaN

as a table. Again, what i'm having trouble with is getting the 'children' list within said json and pandas dataframe table to be potentially listed alongside regular devices or listed in a more normalized fashion within the children column comprising name size uuid etc. This would involve potentially using recursion to flatten all data or something else.

Numpy data looks like this

[['sda' '25G' None None '/dev/sda' None nan]
 ['sdb' '25G' None None '/dev/sdb' None
  list([{'name': 'sdb1', 'size': '512M', 'uuid': '0087-DF28', 'mountpoint': '/boot/efi', 'path': '/dev/sdb1', 'fstype': 'vfat'}, {'name': 'sdb2', 'size': '488M', 'uuid': 'bd51147c-8f8c-4d3a-afde-4ebf67ae4558', 'mountpoint': '/boot', 'path': '/dev/sdb2', 'fstype': 'ext2'}, {'name': 'sdb3', 'size': '24G', 'uuid': 'd07d17bf-ebbd-491a-b57c-f9d43b7e6be5', 'mountpoint': None, 'path': '/dev/sdb3', 'fstype': 'crypto_LUKS', 'children': [{'name': 'sda3_crypt', 'size': '24G', 'uuid': '3NbbDi-BtQ4-PXzK-NVBO-cR2d-aLzZ-pzh0A5', 'mountpoint': None, 'path': '/dev/mapper/sda3_crypt', 'fstype': 'LVM2_member', 'children': [{'name': 'kali--vg-root', 'size': '23G', 'uuid': '6bc26692-9b22-4d38-b1c0-53e99326e9d5', 'mountpoint': '/', 'path': '/dev/mapper/kali--vg-root', 'fstype': 'ext4'}, {'name': 'kali--vg-swap_1', 'size': '980M', 'uuid': '3eb8e4cc-56bb-4b5b-b2a8-bc7ac016df67', 'mountpoint': '[SWAP]', 'path': '/dev/mapper/kali--vg-swap_1', 'fstype': 'swap'}]}]}])]
 ['sr0' '1024M' None None '/dev/sr0' None nan]]

If i could get that 'list' after rendering the table to a numpy array as it's own unnested somehow that'd be great. Again i'm at a loss, yet i'm sure it's something simple i'm missing.

this could involve flattening into a 1 dimensional array however i still want to maintain the ability to use numpyarrays. i've thought about using recursion however dont know specifically how to grab a nested list without recreating the entire structure, nor a conditional statement that applies to nested list.

data output per request

{'blockdevices': [{'name': 'sda', 'size': '25G', 'uuid': None, 'mountpoint': None, 'path': '/dev/sda', 'fstype': None}, {'name': 'sdb', 'size': '25G', 'uuid': None, 'mountpoint': None, 'path': '/dev/sdb', 'fstype': None, 'children': [{'name': 'sdb1', 'size': '512M', 'uuid': '0087-DF28', 'mountpoint': '/boot/efi', 'path': '/dev/sdb1', 'fstype': 'vfat'}, {'name': 'sdb2', 'size': '488M', 'uuid': 'bd51147c-8f8c-4d3a-afde-4ebf67ae4558', 'mountpoint': '/boot', 'path': '/dev/sdb2', 'fstype': 'ext2'}, {'name': 'sdb3', 'size': '24G', 'uuid': 'd07d17bf-ebbd-491a-b57c-f9d43b7e6be5', 'mountpoint': None, 'path': '/dev/sdb3', 'fstype': 'crypto_LUKS', 'children': [{'name': 'sda3_crypt', 'size': '24G', 'uuid': '3NbbDi-BtQ4-PXzK-NVBO-cR2d-aLzZ-pzh0A5', 'mountpoint': None, 'path': '/dev/mapper/sda3_crypt', 'fstype': 'LVM2_member', 'children': [{'name': 'kali--vg-root', 'size': '23G', 'uuid': '6bc26692-9b22-4d38-b1c0-53e99326e9d5', 'mountpoint': '/', 'path': '/dev/mapper/kali--vg-root', 'fstype': 'ext4'}, {'name': 'kali--vg-swap_1', 'size': '980M', 'uuid': '3eb8e4cc-56bb-4b5b-b2a8-bc7ac016df67', 'mountpoint': '[SWAP]', 'path': '/dev/mapper/kali--vg-swap_1', 'fstype': 'swap'}]}]}]}, {'name': 'sr0', 'size': '1024M', 'uuid': None, 'mountpoint': None, 'path': '/dev/sr0', 'fstype': None}]}
                                                                                                                                                                     

Solution

  • Is this your expected output?

    code:

    df = pd.json_normalize(data=data.get("blockdevices")).explode(column="children")
    df = (pd
          .concat(objs=[df, df.children.apply(func=pd.Series)], axis=1)
          .drop(columns=[0, "children"])
          .fillna("")
          .reset_index(drop=True)
          )
    print(df)
    

    result:

      name   size uuid mountpoint      path fstype  name  size                                  uuid mountpoint       path       fstype
    0  sda    25G                  /dev/sda                                                                                            
    1  sdb    25G                  /dev/sdb         sdb1  512M                             0087-DF28  /boot/efi  /dev/sdb1         vfat
    2  sdb    25G                  /dev/sdb         sdb2  488M  bd51147c-8f8c-4d3a-afde-4ebf67ae4558      /boot  /dev/sdb2         ext2
    3  sdb    25G                  /dev/sdb         sdb3   24G  d07d17bf-ebbd-491a-b57c-f9d43b7e6be5             /dev/sdb3  crypto_LUKS
    4  sr0  1024M                  /dev/sr0