I have a dataframe that looks like this:
╔══════╦══════╦══════════════════════════════════╦═══════════════════════╗
║ ID   ║ Name ║ Pets                             ║ CareCenter            ║
╠══════╬══════╬══════════════════════════════════╬═══════════════════════╣
║ 1    ║ John ║ [{'Name':'Alvin','Type':'Dog'},  ║ {'Name':'PET SHOP 1'} ║
║      ║      ║  {'Name':'Rex','Type':'Cat'}]    ║                       ║
╠══════╬══════╬══════════════════════════════════╬═══════════════════════╣
║ 10   ║ Mary ║ {'Name':'Rose','Type':'Cat'}     ║ {'Name':'PET SHOP 2'} ║
╠══════╬══════╬══════════════════════════════════╬═══════════════════════╣
║ 1234 ║ Joe  ║ {'Name':'Max','Type':'Bird'}     ║ {'Name':'PET SHOP 3'} ║
╚══════╩══════╩══════════════════════════════════╩═══════════════════════╝
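(If it helps to reproduce this, the frame can be built roughly as below; I'm assuming the cells hold real Python dicts/lists rather than JSON strings, and that the ID column is actually named Id, as in my code further down.)

import pandas as pd

data_df = pd.DataFrame({
    'Id': [1, 10, 1234],
    'Name': ['John', 'Mary', 'Joe'],
    'Pets': [
        [{'Name': 'Alvin', 'Type': 'Dog'}, {'Name': 'Rex', 'Type': 'Cat'}],
        {'Name': 'Rose', 'Type': 'Cat'},
        {'Name': 'Max', 'Type': 'Bird'},
    ],
    'CareCenter': [{'Name': 'PET SHOP 1'}, {'Name': 'PET SHOP 2'}, {'Name': 'PET SHOP 3'}],
})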
I iterate over each column to check its type. If it's not a nested object (e.g. a plain string, int, etc.) I leave it untouched. If it is an object, then:
3.1) I take the first row and get the type of the value in each column.
3.2) If the column is an object holding a single dictionary, I get this df:
╔══════╦═══════════════════════╗
║ ID   ║ CareCenter            ║
╠══════╬═══════════════════════╣
║ 1    ║ {'Name':'PET SHOP 1'} ║
╠══════╬═══════════════════════╣
║ 10   ║ {'Name':'PET SHOP 2'} ║
╠══════╬═══════════════════════╣
║ 1234 ║ {'Name':'PET SHOP 3'} ║
╚══════╩═══════════════════════╝
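(That sub-frame is nothing more than a two-column copy with the empty rows dropped, roughly:)

new = data_df[['Id', 'CareCenter']].copy()
new = new.dropna(subset=['CareCenter'])   # ignore rows where the dict is missing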
I then apply json_normalize on the single-item dict column:
╔══════╦═══════════════════════╦═══════════════════════╗
║ ID   ║ CareCenter            ║ CareCenter_Name       ║
╠══════╬═══════════════════════╬═══════════════════════╣
║ 1    ║ {'Name':'PET SHOP 1'} ║ PET SHOP 1            ║
╠══════╬═══════════════════════╬═══════════════════════╣
║ 10   ║ {'Name':'PET SHOP 2'} ║ PET SHOP 2            ║
╠══════╬═══════════════════════╬═══════════════════════╣
║ 1234 ║ {'Name':'PET SHOP 3'} ║ PET SHOP 3            ║
╚══════╩═══════════════════════╩═══════════════════════╝
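(A minimal sketch of that step, assuming the cells are real dicts; json_normalize gives a plain 'Name' column here, and the CareCenter_ prefix gets added later, at the join-back step:)

from pandas import json_normalize

flat = json_normalize(new['CareCenter'])   # one column per key -> 'Name'
flat.index = new.index                     # keep alignment in case dropna removed rows
new = new.join(flat)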
I then join it back to the original df on the ID column, naming the new column with the original column name concatenated with the key name, and dropping the original column:
╔══════╦══════╦══════════════════════════════════╦═════════════════╗
║ ID   ║ Name ║ Pets                             ║ CareCenter_Name ║
╠══════╬══════╬══════════════════════════════════╬═════════════════╣
║ 1    ║ John ║ [{'Name':'Alvin','Type':'Dog'},  ║ PET SHOP 1      ║
║      ║      ║  {'Name':'Rex','Type':'Cat'}]    ║                 ║
╠══════╬══════╬══════════════════════════════════╬═════════════════╣
║ 10   ║ Mary ║ {'Name':'Rose','Type':'Cat'}     ║ PET SHOP 2      ║
╠══════╬══════╬══════════════════════════════════╬═════════════════╣
║ 1234 ║ Joe  ║ {'Name':'Max','Type':'Bird'}     ║ PET SHOP 3      ║
╚══════╩══════╩══════════════════════════════════╩═════════════════╝
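(Roughly what that join-back looks like for the CareCenter example, as a sketch:)

new = new.drop('CareCenter', axis=1).add_prefix('CareCenter_')   # Id -> CareCenter_Id, Name -> CareCenter_Name
data_df = pd.merge(data_df, new, how='left', left_on='Id', right_on='CareCenter_Id')
data_df = data_df.drop(columns=['CareCenter', 'CareCenter_Id'])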
3.3) If the column is a list of dictionaries, I create a new df like this, taking just the ID and the column in question:
╔══════╦══════════════════════════════════╗
║ ID   ║ Pets                             ║
╠══════╬══════════════════════════════════╣
║ 1    ║ [{'Name':'Alvin','Type':'Dog'},  ║
║      ║  {'Name':'Rex','Type':'Cat'}]    ║
╠══════╬══════════════════════════════════╣
║ 10   ║ {'Name':'Rose','Type':'Cat'}     ║
╠══════╬══════════════════════════════════╣
║ 1234 ║ {'Name':'Max','Type':'Bird'}     ║
╚══════╩══════════════════════════════════╝
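(The branch between 3.2 and 3.3 is decided by peeking at one non-null value; a simplified version of the check my code below uses:)

sample = new['Pets'].dropna().iloc[0]    # one non-null cell from the nested column
islist = isinstance(sample, list)        # list of dicts -> case 3.3, single dict -> case 3.2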
However, I'm stuck on how to concatenate the values of each nested key into new columns. I'm looking for something like the table below; after that I can take it from there and join it back to the original df. The order of the concatenated strings is not important, as long as they end up in the correct column:
╔══════╦═══════════╦═══════════╗
║ ID   ║ Pets_Name ║ Pets_Type ║
╠══════╬═══════════╬═══════════╣
║ 1    ║ Alvin,Rex ║ Dog,Cat   ║
╠══════╬═══════════╬═══════════╣
║ 10   ║ Rose      ║ Cat       ║
╠══════╬═══════════╬═══════════╣
║ 1234 ║ Max       ║ Bird      ║
╚══════╩═══════════╩═══════════╝
Here is my code so far:
import pandas as pd
from pandas import json_normalize

for column in data_df:
    if data_df.dtypes[column] == "object":
        # work on a two-column copy: the ID plus the nested column
        new = data_df[['Id', column]].copy()
        new = new.dropna(subset=[column])

        # sample one row and check whether this column holds a list
        a = (new.sample(1).applymap(type) == list).all()
        islist = False
        for i, v in a.items():
            if v and i == column:
                islist = True

        if islist:
            # case 3.3: list of dicts
            for c in json_normalize(new[column].sample(1).iloc[0]):
                # ***** STUCK *****
                new = new.join(','.join({v for x in new[column] for y in x for k, v in y.items() if k == c}))
                # ***** STUCK *****
        else:
            # case 3.2: single dict per row
            new = new.join(json_normalize(new[column]))

        # rename with the original column as prefix and merge back on the ID
        new = new.drop(column, axis=1)
        new = new.add_prefix(column + '_')
        data_df = pd.merge(data_df, new, how='left', left_on='Id', right_on=column + '_Id')
        data_df = data_df.drop(column, axis=1)
        data_df = data_df.drop(column + '_Id', axis=1)
EDIT: This would be the final table I'm looking for:
╔══════╦══════╦═══════════╦═══════════╦═════════════════╗
║ ID   ║ Name ║ Pets_Name ║ Pets_Type ║ CareCenter_Name ║
╠══════╬══════╬═══════════╬═══════════╬═════════════════╣
║ 1    ║ John ║ Alvin,Rex ║ Dog,Cat   ║ PET SHOP 1      ║
╠══════╬══════╬═══════════╬═══════════╬═════════════════╣
║ 10   ║ Mary ║ Rose      ║ Cat       ║ PET SHOP 2      ║
╠══════╬══════╬═══════════╬═══════════╬═════════════════╣
║ 1234 ║ Joe  ║ Max       ║ Bird      ║ PET SHOP 3      ║
╚══════╩══════╩═══════════╩═══════════╩═════════════════╝
Thanks in advance for any pointers you could give me.
FINAL EDIT:
This is my working code, with help from @BEN_YO:
import pandas as pd
from pandas import json_normalize

for column in data_df:
    # use the actual Python type of a sampled value rather than the pandas dtype,
    # so plain string columns are skipped
    mn = data_df.sample(1).applymap(type)
    mytype = mn[column].values[0]
    if mytype is dict or mytype is list:
        new = data_df[['Id', column]].copy()
        new = new.dropna(subset=[column])

        # sample one row and check whether this column holds a list
        a = (new.sample(1).applymap(type) == list).all()
        islist = False
        for i, v in a.items():
            if v and i == column:
                islist = True

        if islist:
            # case 3.3: explode the list of dicts, expand each dict into columns,
            # then concatenate the values of key c per original row
            for c in json_normalize(new[column].sample(1).iloc[0]):
                new = new.join(new[column].explode().apply(pd.Series).groupby(level=0)[[c]].agg(','.join))
        else:
            # case 3.2: single dict per row
            new = new.join(json_normalize(new[column]))

        # rename with the original column as prefix and merge back on the ID
        new = new.drop(column, axis=1)
        new = new.add_prefix(column + '_')
        data_df = pd.merge(data_df, new, how='left', left_on='Id', right_on=column + '_Id')
        data_df = data_df.drop(column, axis=1)
        data_df = data_df.drop(column + '_Id', axis=1)
Try with explode
out = df.join(df['Pets'].explode().apply(pd.Series).groupby(level=0)[['Name']].agg(','.join))
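In case it is useful, here is a sketch of how the same explode idea can cover every key at once, with df being the original frame; the list-wrapping of lone dicts and the NaN-safe join are my own additions, not part of the one-liner above:

import pandas as pd

# wrap bare dicts in a one-element list so explode treats every row the same way
pets = df['Pets'].dropna().apply(lambda v: v if isinstance(v, list) else [v])

flat = pets.explode().apply(pd.Series)            # one row per pet, one column per key
out = (flat.groupby(level=0)
           .agg(lambda s: ','.join(s.dropna().astype(str)))
           .add_prefix('Pets_'))                  # Pets_Name, Pets_Type

result = df.join(out)                             # align back on the original index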