
How to generate recommendations for a User using Gremlin?


I am using Gremlin on an Amazon Neptune database to generate recommendations for a user to try new food items. The problem I am facing is that the recommendations need to be in the same cuisine that the user likes. We are given three different types of nodes: "User", "the cuisine he likes", and "the category of the cuisine" that it belongs to.

In the picture above, the candidate recommendations for "User 2" would be "Node 1" and "Node 2". However, "Node 1" belongs to a different category, so we cannot recommend it to "User 2". We can only recommend "Node 2", since that is the only node that belongs to the same category as the ones the user likes. How do I write a Gremlin query to achieve this?

Note: There are multiple nodes for a user, and these nodes belong to multiple categories.


Solution

  • Here's a sample dataset that we can use:

    g.addV('user').property('name','ben').as('b')
      .addV('user').property('name','sally').as('s')
      .addV('food').property('foodname','chicken marsala').as('fvm')
      .addV('food').property('foodname','shrimp diavolo').as('fsd')
      .addV('food').property('foodname','kung pao chicken').as('fkpc')
      .addV('food').property('foodname','mongolian beef').as('fmb')
      .addV('cuisine').property('type','italian').as('ci')
      .addV('cuisine').property('type','chinese').as('cc')
      .addE('hasCuisine').from('fvm').to('ci')
      .addE('hasCuisine').from('fsd').to('ci')
      .addE('hasCuisine').from('fkpc').to('cc')
      .addE('hasCuisine').from('fmb').to('cc')
      .addE('eats').from('b').to('fvm')
      .addE('eats').from('b').to('fsd')
      .addE('eats').from('b').to('fkpc')
      .addE('eats').from('b').to('fmb')
      .addE('eats').from('s').to('fmb')
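
    Before building the recommendation query, a quick sanity check can confirm that the data loaded as expected. This is only a sketch against the sample dataset above; if it was loaded exactly as written, the query should return only 'mongolian beef', the single food item Sally eats.

    g.V().has('user','name','sally').
      out('eats').
      values('foodname')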
    

    Let's start with the user Sally...

    g.V().has('name','sally').
    

    Then we want to find all food item nodes that Sally likes.

    (Note: It is best to add edge labels to your edges here to help with navigation.)

    Let's call the edge from a user to a food item "eats". Let's also assume that the direction of the edge (edges must have a direction) goes from a user to a food item. So let's traverse to all foods that Sally likes. We'll save these to a temporary list called 'liked' that we'll use later in the query to filter out the foods that Sally already likes.

    out('eats').aggregate('liked').
    

    From this point in the graph, we need to diverge and fetch two downstream pieces of data. First, we want to go fetch the cuisines related to food items that Sally likes. We want to "hold our place" in the graph while we go fetch these items, so we use the sideEffect() step which allows us to go do something but come back to where we currently are in the graph to continue our traversal.

        sideEffect(
            out('hasCuisine').
            dedup().
            aggregate('cuisineschosen')).
    

    Inside the sideEffect(), we traverse from food items to cuisines, deduplicate the related cuisines, and save them in a temporary list called 'cuisineschosen'.

    Once we fetch the cuisines, we come back to where we were previously, at the food items. We now want to find the users related to Sally through common food items. We also want to make sure we're not traversing back to Sally, so we'll use simplePath() here. simplePath() filters out any traverser whose path revisits a vertex it has already passed through, which here drops the paths that lead back to Sally.

    in('eats').
        simplePath().
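
    As an aside, a common alternative to simplePath() for this step is to label the starting vertex and exclude it explicitly with where(neq(...)). The sketch below shows only the "related users" part of the traversal; the step label 'me' is an illustrative name, not part of the original query. On the sample dataset it should return only 'ben'.

    g.V().has('name','sally').as('me').
      out('eats').
      in('eats').
        where(neq('me')).
      dedup().
      values('name')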
    

    From here we want to find all food items that our related users like and only return the ones with a cuisine that Sally already likes. We also remove the foods that Sally already likes.

    out('eats').
        where(without('liked')).
        where(
            out('hasCuisine').
            where(
                within('cuisineschosen'))).
      values('foodname')
    

    NOTE: You may also want to add a dedup() here after out('eats') to only return a distinct list of food items.

    Putting it all together...

    g.V().has('name','sally').
      out('eats').aggregate('liked').
        sideEffect(
            out('hasCuisine').
            dedup().
            aggregate('cuisineschosen')).
        in('eats').
        simplePath().
      out('eats').
        where(without('liked')).
        where(
            out('hasCuisine').
            where(
                within('cuisineschosen'))).
      values('foodname')
    

    Results:

    ['kung pao chicken']
    

    At scale, you may need to use the sample() or coin() steps in Gremlin when finding related users, since that part of the traversal can fan out very quickly. Query performance depends on how many objects each query needs to traverse.
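
    As a rough sketch of that idea, the query below caps the number of related users with sample() and adds the dedup() mentioned earlier. The cap of 50 is an arbitrary placeholder, not a tuned or recommended value. On the small sample dataset above, where Ben is the only related user, it should still return only 'kung pao chicken'.

    // Assumption: 50 is an illustrative cap on related users; adjust for your graph.
    g.V().has('name','sally').
      out('eats').aggregate('liked').
        sideEffect(
            out('hasCuisine').
            dedup().
            aggregate('cuisineschosen')).
        in('eats').
        simplePath().
        sample(50).
      out('eats').
        dedup().
        where(without('liked')).
        where(
            out('hasCuisine').
            where(
                within('cuisineschosen'))).
      values('foodname')

    Because sample() picks related users at random, results on a larger graph can differ from run to run. coin() could be used in the same position to keep each related user with a given probability instead of a fixed count.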