algorithm theory data-mining recommendation-engine collaborative-filtering

Recommendation algorithm (and implementation) for finding similar items and users


I have a database of about 700k users along with the items they have watched/listened to/read/bought/etc. I would like to build a recommendation engine that recommends new items based on what users with similar taste have enjoyed, and that also finds people the user might want to be friends with on a social network I'm building (similar to last.fm).

My requirements are as follows:

Please don't give an answer like "use pysuggest or mahout", since those implement a plethora of algorithms and I'm looking for one that's most suitable for my data/use. I've been interested in Neo4j and how it all could be expressed as a graph of connections between users and items.


Solution

  • Actually, that is one of the sweet spots of a graph database like Neo4j.

    So if your data model looks like this:

    user -[:LIKE|:BOUGHT]-> item
    

    You can easily get recommendations for a user with a Cypher statement like this:

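    // Look up the starting user through the "users" index.
    start user = node:users(id="doctorkohaku")
    // Walk from the user to a liked item, back to other users who liked it, then on to their other liked items.
    match user -[r:LIKE]->item<-[r2:LIKE]-other-[r3:LIKE]->rec_item
    // Only follow ratings above 2 stars on every hop.
    where r.stars > 2 and r2.stars > 2 and r3.stars > 2
    return rec_item.name, count(*) as cnt, avg(r3.stars) as rating
    order by rating desc, cnt desc limit 10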
    

    This can also be done using the Neo4j Core-API or the Traversal-API.

    Neo4j has a Python API that is also able to run Cypher queries.

    Disclaimer: I work for Neo4j

    There are also some interesting articles by Marko Rodriguez about collaborative filtering.