I have a dataframe like the one below (it is available either with the foods as arrays or unnested):
team | player | favorite_food
A | A_player1 | [pizza, sushi]
A | A_player2 | [salad, sushi]
B | B_player1 | [pizza, pasta, salad, taco]
B | B_player2 | [taco, salad, sushi]
B | B_player3 | [taco]
I want to get the number and percentage of foods that the players on each team have in common. Something like below:
team | #_food_common | percent_food_common
A | 1 | 0.33
B | 1 | 0.2
What is a good way to do this in Python, preferably with pandas?
You can use set operations and groupby.agg:
# Convert each list to a set, then intersect/union the sets within each team
(df['favorite_food'].apply(set)
   .groupby(df['team'])
   .agg(**{'#_food_common': lambda x: len(set.intersection(*x)),
           'percent_food_common': lambda x: len(set.intersection(*x))/len(set.union(*x)),
          })
   .reset_index()
)
Output:
  team  #_food_common  percent_food_common
0    A              1             0.333333
1    B              1             0.200000
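To see where these numbers come from, here is the same calculation done by hand for team A. It is only an illustrative sketch: the variable names are made up for this example, and the percentage is taken to mean "common foods divided by all distinct foods on the team", which is what the expected output in the question implies.

```python
# Favorite foods of the two team A players, as sets
a_player1 = {'pizza', 'sushi'}
a_player2 = {'salad', 'sushi'}

# Foods every player has in common
common = set.intersection(a_player1, a_player2)

# All distinct foods listed by anyone on the team
all_foods = set.union(a_player1, a_player2)

n_common = len(common)
percent_common = n_common / len(all_foods)

print(n_common, round(percent_common, 2))
```

Team A shares only sushi out of three distinct foods, hence 1 and 0.33.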
Used input:
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                   'player': ['A_player1', 'A_player2', 'B_player1', 'B_player2', 'B_player3'],
                   'favorite_food': [['pizza', 'sushi'],
                                     ['salad', 'sushi'],
                                     ['pizza', 'pasta', 'salad', 'taco'],
                                     ['taco', 'salad', 'sushi'],
                                     ['taco']]})
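If the data is in the unnested format mentioned in the question (one row per player and food), a possible alternative is to count, for each team and food, how many distinct players list it and compare that with the team size. This is a sketch under assumptions: the long table is simulated here with `explode`, and the names `long_df`, `counts`, `n_players_with_food` and `is_common` are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['A', 'A', 'B', 'B', 'B'],
    'player': ['A_player1', 'A_player2', 'B_player1', 'B_player2', 'B_player3'],
    'favorite_food': [['pizza', 'sushi'],
                      ['salad', 'sushi'],
                      ['pizza', 'pasta', 'salad', 'taco'],
                      ['taco', 'salad', 'sushi'],
                      ['taco']],
})

# One row per (team, player, food); this stands in for the unnested input
long_df = df.explode('favorite_food')

# Number of distinct players on each team
n_players = long_df.groupby('team')['player'].nunique()

# For each (team, food), how many distinct players list that food
counts = (long_df.groupby(['team', 'favorite_food'])['player']
          .nunique()
          .reset_index(name='n_players_with_food'))

# 1 if every player on the team lists the food, else 0
counts['is_common'] = (
    counts['n_players_with_food'] == counts['team'].map(n_players)
).astype(int)

# Sum gives the number of common foods; mean gives common / distinct foods
grouped = counts.groupby('team')['is_common']
out = pd.DataFrame({
    '#_food_common': grouped.sum(),
    'percent_food_common': grouped.mean(),
}).reset_index()

print(out)
```

Because `counts` has one row per distinct food on a team, the mean of `is_common` is the same intersection-over-union ratio as in the set-based version.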