pythonpandasdataframe

Add columns to dataframe from a dictionary


There are many answers out there to this question, but I couldn't find one that applies to my case.

I have a dataframe that contains ID's:

df = pd.DataFrame({"id": [0, 1, 2, 3, 4]})

Now, I query a REST API for each ID's to get additional attributes that are returned to me as a dictionary:

{"id": 0, "values": {"first_name": "Bob", "last_name": "Smith"}}

What I want is to add the content of values as additional columns to the matching row of the dataframe.

An important point is that, at each iteration, I may get different attributes, so I don't know how many columns will be added in the end, or even their names. So, sometimes I need to add a column (which I would do with pd.concat), but sometimes I need to set the value to an existing one.

id first_name last_name something something_else
0 Bob Smith
4

Any thought?


Solution

  • I agree there are many ways out there to do this. You can do it this way with List Comprehension approach which will be Faster I guess, and will handle dynamic columns well.

    import pandas as pd
    
    df = pd.DataFrame({"id": [0, 1, 2, 3, 4]})
    
    # Simulated API response
    def get_api_data(id):
        data = {
            0: {"first_name": "Bob", "last_name": "Smith"},
            1: {"first_name": "Alice", "something": "extra"},
            2: {"last_name": "Jones", "something_else": "value"},
            3: {"first_name": "Charlie", "age": 30},
            4: {}  # No data available for this ID
        }
        return data.get(id, {})
    
    def update_dataframe(df):
        all_columns = set()
        data_list = []
        
        for _, row in df.iterrows():
            api_data = get_api_data(row['id'])
            all_columns.update(api_data.keys())
            row_data = row.to_dict()
            row_data.update(api_data)
            data_list.append(row_data)
        
        result_df = pd.DataFrame(data_list, columns=list(all_columns) + list(df.columns))
        
        return result_df
    
    updated_df = update_dataframe(df)
    print(updated_df)
    

    Output:

       id first_name last_name something something_else   age
    0   0        Bob     Smith       NaN            NaN   NaN
    1   1      Alice       NaN     extra            NaN   NaN
    2   2        NaN     Jones       NaN          value   NaN
    3   3    Charlie       NaN       NaN            NaN  30.0
    4   4        NaN       NaN       NaN            NaN   NaN