There are many answers out there to this question, but I couldn't find one that applies to my case.
I have a dataframe that contains IDs:
df = pd.DataFrame({"id": [0, 1, 2, 3, 4]})
Now, I query a REST API for each ID to get additional attributes, which are returned to me as a dictionary:
{"id": 0, "values": {"first_name": "Bob", "last_name": "Smith"}}
What I want is to add the content of "values" as additional columns to the matching row of the dataframe.
An important point is that, at each iteration, I may get different attributes, so I don't know how many columns will be added in the end, or even their names. So sometimes I need to add a column (which I would do with pd.concat), but sometimes I need to set a value in an existing one.
id | first_name | last_name | something | something_else |
---|---|---|---|---|
0 | Bob | Smith | | |
… | | | | |
4 | | | | |
Any thoughts?
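On the add-or-update concern specifically: label-based assignment with .loc should cover both cases, since it creates a missing column (filled with NaN) and overwrites the cell when the column already exists. Below is a minimal sketch under that assumption; apply_response is a hypothetical helper name, and the two responses are made-up examples in the shape shown in the question.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2, 3, 4]})

def apply_response(frame, response):
    """Write each attribute of one response into the row with the matching id."""
    mask = frame["id"] == response["id"]
    for name, value in response["values"].items():
        # .loc creates the column (NaN elsewhere) if it is missing,
        # otherwise it overwrites the existing cell.
        frame.loc[mask, name] = value
    return frame

# Made-up responses in the shape shown in the question
apply_response(df, {"id": 0, "values": {"first_name": "Bob", "last_name": "Smith"}})
apply_response(df, {"id": 1, "values": {"first_name": "Alice", "something": "extra"}})
print(df)
```

This keeps the logic simple, but it assigns cell by cell, so it is likely to be slow for many IDs; collecting everything first and building the frame once is usually the better fit for larger data.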
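I agree there are many ways to do this. One approach that handles dynamic columns well is to merge each row with its API data into a dictionary, collect those dictionaries in a list, and build the DataFrame once at the end. I'd expect this to be faster than growing the dataframe one column at a time.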
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2, 3, 4]})

# Simulated API response
def get_api_data(id):
    data = {
        0: {"first_name": "Bob", "last_name": "Smith"},
        1: {"first_name": "Alice", "something": "extra"},
        2: {"last_name": "Jones", "something_else": "value"},
        3: {"first_name": "Charlie", "age": 30},
        4: {}  # No data available for this ID
    }
    return data.get(id, {})

def update_dataframe(df):
    new_columns = {}  # dict used as an ordered set: keeps first-seen order
    data_list = []
    for _, row in df.iterrows():
        api_data = get_api_data(row['id'])
        new_columns.update(dict.fromkeys(api_data))
        # Start from the existing row, then add or overwrite with the API values
        row_data = row.to_dict()
        row_data.update(api_data)
        data_list.append(row_data)
    # Original columns first, then the new attributes in the order they appeared
    columns = list(df.columns) + [c for c in new_columns if c not in df.columns]
    return pd.DataFrame(data_list, columns=columns)

updated_df = update_dataframe(df)
print(updated_df)
Output:
   id first_name last_name something something_else   age
0   0        Bob     Smith       NaN            NaN   NaN
1   1      Alice       NaN     extra            NaN   NaN
2   2        NaN     Jones       NaN          value   NaN
3   3    Charlie       NaN       NaN            NaN  30.0
4   4        NaN       NaN       NaN            NaN   NaN
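A variation on the same idea, in case it is useful: flatten every response into one record, build a second DataFrame from those records, and left-merge it onto the original on "id". This is only a sketch. fetch_attributes is a hypothetical stand-in for the real REST call, and its data is made up to match the payload shape in the question.

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2, 3, 4]})

# Hypothetical stand-in for the REST call; returns the payload shape from the question.
def fetch_attributes(record_id):
    fake_api = {
        0: {"first_name": "Bob", "last_name": "Smith"},
        1: {"first_name": "Alice", "something": "extra"},
        3: {"first_name": "Charlie", "age": 30},
    }
    return {"id": record_id, "values": fake_api.get(record_id, {})}

# Flatten each response into one record: {"id": ..., **values}
records = []
for record_id in df["id"]:
    response = fetch_attributes(int(record_id))
    records.append({"id": response["id"], **response["values"]})

attributes = pd.DataFrame.from_records(records)

# A left merge keeps every original row; attributes a row never received stay NaN.
merged = df.merge(attributes, on="id", how="left")
print(merged)
```

The merge matters mostly when df already has other columns you want to keep untouched. Note that if an attribute name collides with an existing column, merge will suffix both (_x and _y) rather than overwrite, so in that case the dictionary-update approach above is probably the simpler choice.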