So I've been working with a json.loads() of blockdevices for a small project that i'm working on. Presently, I'm trying to feed json data into a Pandas dataframe, then convert that dataframe into a numpy array to be processed by a third party project called CurseXcel, but I digress.
What i'm presently having trouble with is correctly handling children within a list of dictionaries of json data. What code i've been able to work out is essentially
def return_pandas():
process = subprocess.run("lsblk --json -o NAME,SIZE,UUID,MOUNTPOINT,PATH,FSTYPE ".split(), capture_output=True, text=True)
data = json.loads(process.stdout)
return pd.json_normalize(data['blockdevices'])
then
json_pandas = return_pandas()
which returns
name size uuid mountpoint path fstype children
0 sda 25G None None /dev/sda None NaN
1 sdb 25G None None /dev/sdb None [{'name': 'sdb1', 'size': '512M', 'uuid': '008...
2 sr0 1024M None None /dev/sr0 None NaN
as a table. Again, what i'm having trouble with is getting the 'children' list within said json and pandas dataframe table to be potentially listed alongside regular devices or listed in a more normalized fashion within the children column comprising name size uuid etc. This would involve potentially using recursion to flatten all data or something else.
Numpy data looks like this
[['sda' '25G' None None '/dev/sda' None nan]
['sdb' '25G' None None '/dev/sdb' None
list([{'name': 'sdb1', 'size': '512M', 'uuid': '0087-DF28', 'mountpoint': '/boot/efi', 'path': '/dev/sdb1', 'fstype': 'vfat'}, {'name': 'sdb2', 'size': '488M', 'uuid': 'bd51147c-8f8c-4d3a-afde-4ebf67ae4558', 'mountpoint': '/boot', 'path': '/dev/sdb2', 'fstype': 'ext2'}, {'name': 'sdb3', 'size': '24G', 'uuid': 'd07d17bf-ebbd-491a-b57c-f9d43b7e6be5', 'mountpoint': None, 'path': '/dev/sdb3', 'fstype': 'crypto_LUKS', 'children': [{'name': 'sda3_crypt', 'size': '24G', 'uuid': '3NbbDi-BtQ4-PXzK-NVBO-cR2d-aLzZ-pzh0A5', 'mountpoint': None, 'path': '/dev/mapper/sda3_crypt', 'fstype': 'LVM2_member', 'children': [{'name': 'kali--vg-root', 'size': '23G', 'uuid': '6bc26692-9b22-4d38-b1c0-53e99326e9d5', 'mountpoint': '/', 'path': '/dev/mapper/kali--vg-root', 'fstype': 'ext4'}, {'name': 'kali--vg-swap_1', 'size': '980M', 'uuid': '3eb8e4cc-56bb-4b5b-b2a8-bc7ac016df67', 'mountpoint': '[SWAP]', 'path': '/dev/mapper/kali--vg-swap_1', 'fstype': 'swap'}]}]}])]
['sr0' '1024M' None None '/dev/sr0' None nan]]
If i could get that 'list' after rendering the table to a numpy array as it's own unnested somehow that'd be great. Again i'm at a loss, yet i'm sure it's something simple i'm missing.
this could involve flattening into a 1 dimensional array however i still want to maintain the ability to use numpyarrays. i've thought about using recursion however dont know specifically how to grab a nested list without recreating the entire structure, nor a conditional statement that applies to nested list.
data output per request
{'blockdevices': [{'name': 'sda', 'size': '25G', 'uuid': None, 'mountpoint': None, 'path': '/dev/sda', 'fstype': None}, {'name': 'sdb', 'size': '25G', 'uuid': None, 'mountpoint': None, 'path': '/dev/sdb', 'fstype': None, 'children': [{'name': 'sdb1', 'size': '512M', 'uuid': '0087-DF28', 'mountpoint': '/boot/efi', 'path': '/dev/sdb1', 'fstype': 'vfat'}, {'name': 'sdb2', 'size': '488M', 'uuid': 'bd51147c-8f8c-4d3a-afde-4ebf67ae4558', 'mountpoint': '/boot', 'path': '/dev/sdb2', 'fstype': 'ext2'}, {'name': 'sdb3', 'size': '24G', 'uuid': 'd07d17bf-ebbd-491a-b57c-f9d43b7e6be5', 'mountpoint': None, 'path': '/dev/sdb3', 'fstype': 'crypto_LUKS', 'children': [{'name': 'sda3_crypt', 'size': '24G', 'uuid': '3NbbDi-BtQ4-PXzK-NVBO-cR2d-aLzZ-pzh0A5', 'mountpoint': None, 'path': '/dev/mapper/sda3_crypt', 'fstype': 'LVM2_member', 'children': [{'name': 'kali--vg-root', 'size': '23G', 'uuid': '6bc26692-9b22-4d38-b1c0-53e99326e9d5', 'mountpoint': '/', 'path': '/dev/mapper/kali--vg-root', 'fstype': 'ext4'}, {'name': 'kali--vg-swap_1', 'size': '980M', 'uuid': '3eb8e4cc-56bb-4b5b-b2a8-bc7ac016df67', 'mountpoint': '[SWAP]', 'path': '/dev/mapper/kali--vg-swap_1', 'fstype': 'swap'}]}]}]}, {'name': 'sr0', 'size': '1024M', 'uuid': None, 'mountpoint': None, 'path': '/dev/sr0', 'fstype': None}]}
Is this your expected output?
code:
df = pd.json_normalize(data=data.get("blockdevices")).explode(column="children")
df = (pd
.concat(objs=[df, df.children.apply(func=pd.Series)], axis=1)
.drop(columns=[0, "children"])
.fillna("")
.reset_index(drop=True)
)
print(df)
result:
name size uuid mountpoint path fstype name size uuid mountpoint path fstype
0 sda 25G /dev/sda
1 sdb 25G /dev/sdb sdb1 512M 0087-DF28 /boot/efi /dev/sdb1 vfat
2 sdb 25G /dev/sdb sdb2 488M bd51147c-8f8c-4d3a-afde-4ebf67ae4558 /boot /dev/sdb2 ext2
3 sdb 25G /dev/sdb sdb3 24G d07d17bf-ebbd-491a-b57c-f9d43b7e6be5 /dev/sdb3 crypto_LUKS
4 sr0 1024M /dev/sr0