python pandas dataframe split infobox

Expand a dataframe column to many


I have read several posts but have not been able to get what I want. I have a dataframe with ~4k rows and a few columns, which I exported from Infoblox (a DNS server). One of the columns holds the DHCP attributes, and I would like to expand it into separate columns. This is my df (screenshot from Excel): [excel screenshot]

One of the columns holds a list of dictionaries, one per option. This is a sanitized example:

[
    {"name": "tftp-server-name", "num": 66, "value": "10.70.0.27", "vendor_class": "DHCP"},
    {"name": "bootfile-name", "num": 67, "value": "pxelinux.0", "vendor_class": "DHCP"},
    {"name": "dhcp-lease-time", "num": 51, "use_option": False, "value": "21600", "vendor_class": "DHCP"},
    {"name": "domain-name-servers", "num": 6, "use_option": False, "value": "10.71.73.143,10.71.74.163", "vendor_class": "DHCP"},
    {"name": "domain-name", "num": 15, "use_option": False, "value": "example.com", "vendor_class": "DHCP"},
    {"name": "routers", "num": 3, "use_option": True, "value": "10.70.1.200", "vendor_class": "DHCP"},
]

I would like to expand this column into several columns on the same row, using "name" as the df column name and "value" as the cell value. This would be the goal:

      tftp-server-name  voip-tftp-server         dhcp-lease-time  domain-name-servers        domain-name  routers
0     10.71.69.58       10.71.69.58,10.71.69.59  86400            10.71.73.143,10.71.74.163  example.com  10.70.12.254

To end up with a single df containing all the information, I guess I should build a new df that keeps the index so I can merge it with the original one, but I wasn't able to do it. I have tried expand, append, explode... Could you please help me?

Thank you both so much for your solutions. I got it working. Here is my complete final code, in case someone needs it (there may be a more pythonic way, but it works):

import pandas as pd

def formato(df):
    # Each cell of 'options' is a list of dicts; turn it into a one-row frame
    # whose columns are the option names and whose values are the option values.
    filas = [pd.DataFrame(i).set_index("name")[["value"]].T for i in df["options"]]
    # DataFrame.append was removed in pandas 2.0, so concatenate once instead.
    df_int = pd.concat(filas, ignore_index=True)
    # Merge by position; this assumes df has the default 0..n-1 index.
    df_global = pd.merge(df, df_int, left_index=True, right_index=True, how="inner")
    # Rename on the merged frame (calling df.rename here would discard the merge).
    df_global = df_global.rename(columns={"comment": "Comentario", "end_addr": "IP Fin", "network": "Red",
                    "start_addr": "IP Inicio", "disable": "Deshabilitado"})
    df_global = df_global[["Red", "Comentario", "IP Inicio", "IP Fin", "dhcp-lease-time",
            "domain-name-servers", "domain-name", "routers", "tftp-server-name", "bootfile-name",
            "voip-tftp-server", "wdm-server-ip-address", "ftp-file-server", "vendor-encapsulated-options"]]
    return df_global
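
A loop-free variant of the function above could be sketched with explode and pivot. This is only a sketch under assumptions: the two-row frame and its "network" values are made up for illustration, the column is assumed to be called "options", and each option name is assumed to appear at most once per row (pivot raises an error on duplicates).

```python
import pandas as pd

# Hypothetical two-row stand-in for the Infoblox export.
df = pd.DataFrame({
    "network": ["10.70.0.0/23", "10.70.12.0/24"],
    "options": [
        [
            {"name": "tftp-server-name", "num": 66, "value": "10.70.0.27", "vendor_class": "DHCP"},
            {"name": "routers", "num": 3, "use_option": True, "value": "10.70.1.200", "vendor_class": "DHCP"},
        ],
        [
            {"name": "routers", "num": 3, "use_option": True, "value": "10.70.12.254", "vendor_class": "DHCP"},
        ],
    ],
})

# explode() yields one row per option dict and repeats the original index label;
# dropna() discards rows whose option list was empty.
long = df["options"].explode().dropna()
flat = pd.DataFrame(long.tolist(), index=long.index)

# pivot() without `index=` keeps the existing index, so every original row
# collapses back to a single line with one column per option name.
wide = flat.pivot(columns="name", values="value")

# Both frames share the original index, so join aligns them row by row.
result = df.drop(columns="options").join(wide)
print(result)
```

Options missing from a row come out as NaN, and the new columns are ordered alphabetically by option name.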

Solution

  • Here is one solution:

    import pandas as pd

    data = [
        {'name': 'tftp-server-name', 'num': 66, 'value': '10.70.0.27', 'vendor_class': 'DHCP'},
        {'name': 'bootfile-name', 'num': 67, 'value': 'pxelinux.0', 'vendor_class': 'DHCP'},
        {'name': 'dhcp-lease-time', 'num': 51, 'use_option': False, 'value': '21600', 'vendor_class': 'DHCP'},
        {'name': 'domain-name-servers', 'num': 6, 'use_option': False, 'value': '10.71.73.143,10.71.74.163', 'vendor_class': 'DHCP'},
        {'name': 'domain-name', 'num': 15, 'use_option': False, 'value': 'example.com', 'vendor_class': 'DHCP'},
        {'name': 'routers', 'num': 3, 'use_option': True, 'value': '10.70.1.200', 'vendor_class': 'DHCP'},
    ]

    # Index by option name, keep only "value", then transpose so names become columns.
    df = pd.DataFrame(data).set_index("name")[["value"]].T.reset_index(drop=True)


    output:

    name tftp-server-name bootfile-name dhcp-lease-time        domain-name-servers  domain-name      routers
    0          10.70.0.27    pxelinux.0           21600  10.71.73.143,10.71.74.163  example.com  10.70.1.200
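
The snippet above handles a single cell. To cover the whole column and keep the other columns, one possible sketch is to build a {name: value} dict per row and join the result back on the index. The sample frame, its "network" values, and the helper name options_to_row are hypothetical; the column is assumed to be called "options".

```python
import pandas as pd

# Hypothetical two-row stand-in for the exported dataframe.
df = pd.DataFrame({
    "network": ["10.70.0.0/23", "10.70.12.0/24"],
    "options": [
        [
            {"name": "tftp-server-name", "num": 66, "value": "10.70.0.27", "vendor_class": "DHCP"},
            {"name": "routers", "num": 3, "use_option": True, "value": "10.70.1.200", "vendor_class": "DHCP"},
        ],
        [
            {"name": "routers", "num": 3, "use_option": True, "value": "10.70.12.254", "vendor_class": "DHCP"},
        ],
    ],
})

def options_to_row(options):
    """Map one list of option dicts to {option name: option value}."""
    return {opt["name"]: opt["value"] for opt in options}

# One dict per row; pandas lines the keys up as columns and fills gaps with NaN.
expanded = pd.DataFrame([options_to_row(o) for o in df["options"]], index=df.index)

# Reusing df.index keeps every expanded row attached to its source row.
result = df.drop(columns="options").join(expanded)
print(result)
```

Because the expanded frame reuses df.index, no separate merge key is needed, and rows that lack a given option simply get NaN in that column.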