
How to create my own recommendation engine?


I have been interested in recommendation engines lately and want to improve in this area. I am currently reading "Programming Collective Intelligence" from O'Reilly, which I think is the best book on the subject. But I have no idea how to implement an engine; by "no idea" I mean I don't know where to start. I have a project like Last.fm in mind.

  1. Where should I start building a recommendation engine: should it be implemented on the database side or the backend side?
  2. What level of database knowledge is needed?
  3. Are there any open source engines, or other resources, that I could use for help?
  4. What are the first steps I should take?

Solution

  • I built one for a video portal myself. The main idea was to collect data about everything:

    Next I created functions that return lists of (id, weight) tuples for each of the above points. Some consider only a limited number of videos (e.g. the last 50); some adjust the weight by, e.g., rating or tag count (the more often a tag is used, the less expressive it is). There are functions that return the following lists:

    All of these are combined into a single list by summing the weights per video id, then sorting by weight. This works pretty well for around 1000 videos at the moment, but you need background processing or aggressive caching to make it fast.
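
    The combining step described above could be sketched roughly like this. It is only an illustration, not the code from my project: the helper name combine_recommendations and the sample lists are made up, and it assumes every function returns an array of [video_id, weight] pairs.

```ruby
# Hypothetical helper (not from the original project): merges several lists
# of [video_id, weight] pairs by summing the weights per video id, then
# returns the pairs sorted by total weight, highest first.
def combine_recommendations(*lists)
  totals = Hash.new(0.0)
  lists.each do |list|
    list.each { |id, weight| totals[id] += weight }
  end
  totals.sort_by { |_id, weight| -weight }
end

# Made-up sample data standing in for two of the per-signal lists.
by_tags    = [[1, 2.0], [2, 0.5]]
by_ratings = [[2, 1.0], [3, 0.75]]

combined = combine_recommendations(by_tags, by_ratings)
# Video 2 appears in both lists, so its weights are added together.
```

    Because each signal only contributes (id, weight) pairs, adding a new signal means adding one more list to the call; the merge itself does not change.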

    I'm hoping to reduce this to a generic recommendation engine or similarity calculator soon and release it as a Rails/ActiveRecord plugin. Currently it is still a tightly integrated part of my project.

    To give a small hint, in Ruby code it looks like this:

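    def related_by_tags
      # Build one [video_id, weight] pair for every video that shares a tag.
      tag_names.find(:all, :include => :videos).inject([]) { |result, t|
        # A tag used on many videos is less expressive, so damp its weight
        # by log2 of the number of videos carrying it.
        weight = TAG_WEIGHT / (0.1 + Math.log(t.video_ids.length) / Math.log(2))
        result + t.video_ids.map { |v| [v, weight] }
      }
    end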
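
    To see how the tag weighting behaves, here is the same formula pulled out on its own. This is a standalone sketch: the value of TAG_WEIGHT is an assumption (the real constant is not shown here), the helper name tag_weight is made up, and it assumes a tag is attached to at least one video.

```ruby
# Assumed value for illustration only; the real constant is not shown.
TAG_WEIGHT = 1.0

# Hypothetical helper: weight contributed by a tag that is attached to
# video_count videos (video_count >= 1). The denominator grows with
# log2(video_count), so widely used tags count for less.
def tag_weight(video_count)
  TAG_WEIGHT / (0.1 + Math.log(video_count) / Math.log(2))
end

[1, 2, 8, 64].each do |n|
  puts "tag on #{n} videos -> weight #{tag_weight(n).round(3)}"
end
```

    With these numbers a tag used on a single video contributes about ten times TAG_WEIGHT, while a tag spread over 64 videos contributes only about a sixth of it, so rare tags dominate the ranking.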
    

    I would be interested in how other people approach such algorithms.