This question is probably very repeated in the blogging and Q&A websites but I couldn't find any concrete answer yet.
I am trying to build a recommendation system for customers using only their purchase history.
If am right we call this as content-based recommendation as opposed to collaborative filtering since it doesn't involve co-occurrence of items or user preferences to an item.
My problem is multi-fold:
Note: I looked at Mahout (since am familiar with Java and Mahout apparently utilizes Hadoop for distributed processing) for doing this in scale and advantage of having a well tested ML algorithms.
Your help is appreciated. Any examples would be really great. Thanks.
The so called item-item recommenders are natural candidates for precomputing the similarities, because the attributes of the items rarely change. I would suggest you precompute the item similarity between each item, and perhaps store the top K for each item, and if you have enough resources you could load the similarity matix into main memory for real time recommendation.
Check out my answer to this question for a way to do this in Mahout: Does Mahout provide a way to determine similarity between content (for content-based recommendations)?
The example is how to compute the textual similarity between the items, and than load the precomputed values into main memory.
For performance comparison about different data structures to hold the values check out this question: Mahout precomputed Item-item similarity - slow recommendation