I have a data frame that contains the names of 3 friends and 10 restaurants that are ranked 1-10 (where Rank 1 indicates most likely to be interested, while rank 10 means least likely to be interested) as InterestRank for each friend. The Data frame contains attributes of restaurants too like Cost, Cuisine and Alcohol served or not. The Data frame looks like following:
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No
I want to recommend the top 4 restaurants to each friend according to their InterestRank as well as a condition that no more than 2 restaurants with the same cuisine type will be recommended to each of them. How to achieve this in a Pythonic way?
Edit: Expected output data frame
I want the final output to be something like this:
FriendName | Restaurant | RecommendationRank |
---|---|---|
Amy | R2 | 1 |
Amy | R1 | 2 |
Amy | R10 | 3 |
Amy | R7 | 4 |
Ben | R7 | 1 |
Ben | R2 | 2 |
Ben | R5 | 3 |
Ben | R8 | 4 |
Cathy | R5 | 1 |
Cathy | R8 | 2 |
Cathy | R6 | 3 |
Cathy | R9 | 4 |
We can use sort_values
and groupby
to achieve these type of window functions in a pandas.DataFrame
.
from io import StringIO
import pandas as pd
input_data = """
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No
""".strip()
# Read data from CSV-formatted string input
df = pd.read_csv(StringIO(input_data))
# Use sorting and grouping, along with `head`,
# to achieve the desired window functions
result = (
df
# Sort `(friend, cuisine)` group by interest rank and take the top 2
.sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
.groupby(['FriendName', 'Cuisine'])
.head(2)
# Sort `friend` group by interest rank and take the top 4
.sort_values(by=['FriendName', 'InterestRank'], ascending=True)
.groupby(['FriendName'])
.head(4)
# Reset index, which was just "scrambled" from the sorting and slicing
.reset_index(drop=True)
)
print(result)
The result:
FriendName Restaurant InterestRank Cuisine Cost Alcohol
0 Amy R2 1 French $$ No
1 Amy R1 2 French $$$ Yes
2 Amy R10 5 Mexican $$$ Yes
3 Amy R7 6 Japanese $$ Yes
4 Ben R7 1 Japanese $$ Yes
5 Ben R5 2 Japanese $$ No
6 Ben R2 3 French $$ No
7 Ben R8 4 Mexican $$ No
8 Cathy R5 1 Japanese $$ No
9 Cathy R8 2 Mexican $$ No
10 Cathy R6 3 Japanese $ No
11 Cathy R9 4 Mexican $$ No
What if we want to add 2 conditions instead? So like no more than 2 restaurants with the same cuisine type and also no more than 2 "No"s in Alcohol will be recommended to each of them.
# Read data from CSV-formatted string input
df = pd.read_csv(StringIO(input_data))
# Take top 2 "no alcohol" restaurants per friend
no_df = (
df[df.Alcohol == 'No']
.sort_values(by=['FriendName', 'InterestRank'], ascending=True)
.groupby(['FriendName'])
.head(2)
)
# Take top 4 alcoholic restaurants per friend
# (we don't mind if ultimately all 4 are alcohol restaurants
# in the final result, as there is no restriction on these)
yes_df = (
df[df.Alcohol == 'Yes']
.sort_values(by=['FriendName', 'InterestRank'], ascending=True)
.groupby(['FriendName'])
.head(4)
)
# Concatenate and then proceed as before
result = (
pd.concat([no_df, yes_df], axis=0)
# Sort `(friend, cuisine)` group by interest rank and take the top 2
.sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
.groupby(['FriendName', 'Cuisine'])
.head(2)
# Sort `friend` group by interest rank and take the top 4
.sort_values(by=['FriendName', 'InterestRank'], ascending=True)
.groupby(['FriendName'])
.head(4)
# Reset index, which was just "scrambled" from the sorting and slicing
.reset_index(drop=True)
)
print(result)
The result:
FriendName Restaurant InterestRank Cuisine Cost Alcohol
0 Amy R2 1 French $$ No
1 Amy R1 2 French $$$ Yes
2 Amy R10 5 Mexican $$$ Yes
3 Amy R6 7 Japanese $ No
4 Ben R7 1 Japanese $$ Yes
5 Ben R5 2 Japanese $$ No
6 Ben R2 3 French $$ No
7 Ben R4 5 French $$$ Yes
8 Cathy R5 1 Japanese $$ No
9 Cathy R8 2 Mexican $$ No
10 Cathy R1 5 French $$$ Yes
11 Cathy R3 6 French $$ Yes