python-3.xpandasdata-sciencerecommendation-engine

Rule-based recommendation system


I have a data frame that contains the names of 3 friends and 10 restaurants that are ranked 1-10 (where Rank 1 indicates most likely to be interested, while rank 10 means least likely to be interested) as InterestRank for each friend. The Data frame contains attributes of restaurants too like Cost, Cuisine and Alcohol served or not. The Data frame looks like following:

FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,$$,No
Ben,R2,3,French,$$,No
Cathy,R2,8,French,$$,No
Amy,R1,2,French,$$$,Yes
Ben,R1,9,French,$$$,Yes
Cathy,R1,5,French,$$$,Yes
Amy,R4,3,French,$$$,Yes
Ben,R4,5,French,$$$,Yes
Cathy,R4,10,French,$$$,Yes
Amy,R3,4,French,$$,Yes
Ben,R3,10,French,$$,Yes
Cathy,R3,6,French,$$,Yes
Amy,R10,5,Mexican,$$$,Yes
Ben,R10,6,Mexican,$$$,Yes
Cathy,R10,7,Mexican,$$$,Yes
Amy,R7,6,Japanese,$$,Yes
Ben,R7,1,Japanese,$$,Yes
Cathy,R7,9,Japanese,$$,Yes
Amy,R6,7,Japanese,$,No
Ben,R6,8,Japanese,$,No
Cathy,R6,3,Japanese,$,No
Amy,R8,8,Mexican,$$,No
Ben,R8,4,Mexican,$$,No
Cathy,R8,2,Mexican,$$,No
Amy,R5,9,Japanese,$$,No
Ben,R5,2,Japanese,$$,No
Cathy,R5,1,Japanese,$$,No
Amy,R9,10,Mexican,$$,No
Ben,R9,7,Mexican,$$,No
Cathy,R9,4,Mexican,$$,No

I want to recommend the top 4 restaurants to each friend according to their InterestRank as well as a condition that no more than 2 restaurants with the same cuisine type will be recommended to each of them. How to achieve this in a Pythonic way?

Edit: Expected output data frame

I want the final output to be something like this:

FriendName Restaurant RecommendationRank
Amy R2 1
Amy R1 2
Amy R10 3
Amy R7 4
Ben R7 1
Ben R2 2
Ben R5 3
Ben R8 4
Cathy R5 1
Cathy R8 2
Cathy R6 3
Cathy R9 4

Solution

  • Solution

    We can use sort_values and groupby to achieve these type of window functions in a pandas.DataFrame.

    from io import StringIO
    import pandas as pd
    
    input_data = """
    FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
    Amy,R2,1,French,$$,No
    Ben,R2,3,French,$$,No
    Cathy,R2,8,French,$$,No
    Amy,R1,2,French,$$$,Yes
    Ben,R1,9,French,$$$,Yes
    Cathy,R1,5,French,$$$,Yes
    Amy,R4,3,French,$$$,Yes
    Ben,R4,5,French,$$$,Yes
    Cathy,R4,10,French,$$$,Yes
    Amy,R3,4,French,$$,Yes
    Ben,R3,10,French,$$,Yes
    Cathy,R3,6,French,$$,Yes
    Amy,R10,5,Mexican,$$$,Yes
    Ben,R10,6,Mexican,$$$,Yes
    Cathy,R10,7,Mexican,$$$,Yes
    Amy,R7,6,Japanese,$$,Yes
    Ben,R7,1,Japanese,$$,Yes
    Cathy,R7,9,Japanese,$$,Yes
    Amy,R6,7,Japanese,$,No
    Ben,R6,8,Japanese,$,No
    Cathy,R6,3,Japanese,$,No
    Amy,R8,8,Mexican,$$,No
    Ben,R8,4,Mexican,$$,No
    Cathy,R8,2,Mexican,$$,No
    Amy,R5,9,Japanese,$$,No
    Ben,R5,2,Japanese,$$,No
    Cathy,R5,1,Japanese,$$,No
    Amy,R9,10,Mexican,$$,No
    Ben,R9,7,Mexican,$$,No
    Cathy,R9,4,Mexican,$$,No
    """.strip()
    
    # Read data from CSV-formatted string input
    df = pd.read_csv(StringIO(input_data))
    
    # Use sorting and grouping, along with `head`,
    # to achieve the desired window functions
    result = (
        df
        # Sort `(friend, cuisine)` group by interest rank and take the top 2
        .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
        .groupby(['FriendName', 'Cuisine'])
        .head(2)
        # Sort `friend` group by interest rank and take the top 4
        .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
        .groupby(['FriendName'])
        .head(4)
        # Reset index, which was just "scrambled" from the sorting and slicing
        .reset_index(drop=True)
    )
    
    print(result)
    

    The result:

       FriendName Restaurant  InterestRank   Cuisine Cost Alcohol
    0         Amy         R2             1    French   $$      No
    1         Amy         R1             2    French  $$$     Yes
    2         Amy        R10             5   Mexican  $$$     Yes
    3         Amy         R7             6  Japanese   $$     Yes
    4         Ben         R7             1  Japanese   $$     Yes
    5         Ben         R5             2  Japanese   $$      No
    6         Ben         R2             3    French   $$      No
    7         Ben         R8             4   Mexican   $$      No
    8       Cathy         R5             1  Japanese   $$      No
    9       Cathy         R8             2   Mexican   $$      No
    10      Cathy         R6             3  Japanese    $      No
    11      Cathy         R9             4   Mexican   $$      No
    

    Edit: solution to additional request in comments

    What if we want to add 2 conditions instead? So like no more than 2 restaurants with the same cuisine type and also no more than 2 "No"s in Alcohol will be recommended to each of them.

    # Read data from CSV-formatted string input
    df = pd.read_csv(StringIO(input_data))
    
    # Take top 2 "no alcohol" restaurants per friend
    no_df = (
        df[df.Alcohol == 'No']
        .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
        .groupby(['FriendName'])
        .head(2)
    )
    # Take top 4 alcoholic restaurants per friend
    # (we don't mind if ultimately all 4 are alcohol restaurants 
    # in the final result, as there is no restriction on these)
    yes_df = (
        df[df.Alcohol == 'Yes']
        .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
        .groupby(['FriendName'])
        .head(4)
    )
    # Concatenate and then proceed as before
    result = (
        pd.concat([no_df, yes_df], axis=0)
        # Sort `(friend, cuisine)` group by interest rank and take the top 2
        .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
        .groupby(['FriendName', 'Cuisine'])
        .head(2)
        # Sort `friend` group by interest rank and take the top 4
        .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
        .groupby(['FriendName'])
        .head(4)
        # Reset index, which was just "scrambled" from the sorting and slicing
        .reset_index(drop=True)
    )
    
    print(result)
    

    The result:

       FriendName Restaurant  InterestRank   Cuisine Cost Alcohol
    0         Amy         R2             1    French   $$      No
    1         Amy         R1             2    French  $$$     Yes
    2         Amy        R10             5   Mexican  $$$     Yes
    3         Amy         R6             7  Japanese    $      No
    4         Ben         R7             1  Japanese   $$     Yes
    5         Ben         R5             2  Japanese   $$      No
    6         Ben         R2             3    French   $$      No
    7         Ben         R4             5    French  $$$     Yes
    8       Cathy         R5             1  Japanese   $$      No
    9       Cathy         R8             2   Mexican   $$      No
    10      Cathy         R1             5    French  $$$     Yes
    11      Cathy         R3             6    French   $$     Yes