python-3.x pandas data-science recommendation-engine

Rule-based recommendation system

I have a data frame that contains the names of 3 friends and 10 restaurants that are ranked 1-10 (where Rank 1 indicates most likely to be interested, while rank 10 means least likely to be interested) as InterestRank for each friend. The Data frame contains attributes of restaurants too like Cost, Cuisine and Alcohol served or not. The Data frame looks like following:

FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No

I want to recommend the top 4 restaurants to each friend according to their InterestRank as well as a condition that no more than 2 restaurants with the same cuisine type will be recommended to each of them. How to achieve this in a Pythonic way?

Edit: Expected output data frame

I want the final output to be something like this:

FriendName	Restaurant	RecommendationRank
Amy	R2	1
Amy	R1	2
Amy	R10	3
Amy	R7	4
Ben	R7	1
Ben	R2	2
Ben	R5	3
Ben	R8	4
Cathy	R5	1
Cathy	R8	2
Cathy	R6	3
Cathy	R9	4

Solution

We can use sort_values and groupby to achieve these type of window functions in a pandas.DataFrame.

from io import StringIO
import pandas as pd

input_data = """
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No
""".strip()

# Read data from CSV-formatted string input
df = pd.read_csv(StringIO(input_data))

# Use sorting and grouping, along with `head`,
# to achieve the desired window functions
result = (
    df
    # Sort `(friend, cuisine)` group by interest rank and take the top 2
    .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    # Sort `friend` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
    # Reset index, which was just "scrambled" from the sorting and slicing
    .reset_index(drop=True)
)

print(result)

The result:

   FriendName Restaurant  InterestRank   Cuisine Cost Alcohol
0         Amy         R2             1    French   $$      No
1         Amy         R1             2    French  $$$     Yes
2         Amy        R10             5   Mexican  $$$     Yes
3         Amy         R7             6  Japanese   $$     Yes
4         Ben         R7             1  Japanese   $$     Yes
5         Ben         R5             2  Japanese   $$      No
6         Ben         R2             3    French   $$      No
7         Ben         R8             4   Mexican   $$      No
8       Cathy         R5             1  Japanese   $$      No
9       Cathy         R8             2   Mexican   $$      No
10      Cathy         R6             3  Japanese    $      No
11      Cathy         R9             4   Mexican   $$      No

Edit: solution to additional request in comments

What if we want to add 2 conditions instead? So like no more than 2 restaurants with the same cuisine type and also no more than 2 "No"s in Alcohol will be recommended to each of them.

# Read data from CSV-formatted string input
df = pd.read_csv(StringIO(input_data))

# Take top 2 "no alcohol" restaurants per friend
no_df = (
    df[df.Alcohol == 'No']
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(2)
)
# Take top 4 alcoholic restaurants per friend
# (we don't mind if ultimately all 4 are alcohol restaurants 
# in the final result, as there is no restriction on these)
yes_df = (
    df[df.Alcohol == 'Yes']
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
)
# Concatenate and then proceed as before
result = (
    pd.concat([no_df, yes_df], axis=0)
    # Sort `(friend, cuisine)` group by interest rank and take the top 2
    .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    # Sort `friend` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
    # Reset index, which was just "scrambled" from the sorting and slicing
    .reset_index(drop=True)
)

print(result)

The result:

   FriendName Restaurant  InterestRank   Cuisine Cost Alcohol
0         Amy         R2             1    French   $$      No
1         Amy         R1             2    French  $$$     Yes
2         Amy        R10             5   Mexican  $$$     Yes
3         Amy         R6             7  Japanese    $      No
4         Ben         R7             1  Japanese   $$     Yes
5         Ben         R5             2  Japanese   $$      No
6         Ben         R2             3    French   $$      No
7         Ben         R4             5    French  $$$     Yes
8       Cathy         R5             1  Japanese   $$      No
9       Cathy         R8             2   Mexican   $$      No
10      Cathy         R1             5    French  $$$     Yes
11      Cathy         R3             6    French   $$     Yes