I am working on a personalised news recommendation engine based on users' click behaviour. My features will be predefined news categories (such as politics, sport, etc.).
Whenever a user clicks on an article, I build or update the user's profile based on that article, then recommend another article from the article pool.
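To make the setup concrete, here is a minimal sketch of the click-driven loop described above. It assumes a profile is just a count of clicked categories and that an article is scored by the share of the user's clicks falling in its categories; the names `update_profile`, `recommend`, `pool` and `seen` are hypothetical and only for illustration, not my actual implementation.

```python
from collections import Counter


def update_profile(profile, article_categories):
    """Add the categories of a clicked article to the user's profile counts."""
    profile.update(article_categories)
    return profile


def recommend(profile, pool, seen):
    """Return the unseen article whose categories best match the profile.

    pool maps article id -> list of category names (assumed structure).
    seen is the set of article ids the user has already clicked.
    """
    total = sum(profile.values()) or 1  # avoid division by zero for new users

    def score(article_id):
        # Share of the user's clicks that fall in this article's categories.
        return sum(profile[c] / total for c in pool[article_id])

    candidates = [a for a in pool if a not in seen]
    return max(candidates, key=score) if candidates else None


pool = {"a1": ["sport"], "a2": ["politics"], "a3": ["sport"]}
profile = update_profile(Counter(), pool["a1"])  # user clicked a1
next_article = recommend(profile, pool, seen={"a1"})
```

With this toy pool, a user who has clicked only a sport article is shown the remaining sport article rather than the politics one.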
To evaluate this system, I need a dataset of binary user-item interactions (whether or not the user clicked on a recommended article), but I couldn't find an appropriate dataset for this specific context. What I'm trying to do instead is binarize the MovieLens dataset and then calculate precision and recall.
What I actually do with the MovieLens dataset is as follows: if a user's rating for an item is larger than that user's average rating, I assign it a binary rating of 1, and 0 otherwise.
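The binarization rule above, together with precision and recall for one user, could be sketched roughly as follows. This assumes ratings are available as `(user, item, rating)` tuples; the function names `binarize` and `precision_recall` and the toy data are made up for illustration and are not taken from MovieLens.

```python
from collections import defaultdict


def binarize(ratings):
    """Map each (user, item) to 1 if the rating exceeds that user's mean, else 0."""
    by_user = defaultdict(list)
    for user, _item, rating in ratings:
        by_user[user].append(rating)
    means = {u: sum(rs) / len(rs) for u, rs in by_user.items()}
    return {(u, i): int(r > means[u]) for u, i, r in ratings}


def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data, invented for this example.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 3), ("u1", "m3", 1),
    ("u2", "m1", 4), ("u2", "m2", 4),
]
labels = binarize(ratings)

# Items labelled 1 for u1 count as "clicked" in the evaluation.
relevant_u1 = [i for (u, i), y in labels.items() if u == "u1" and y == 1]
precision, recall = precision_recall(["m1", "m2"], relevant_u1)
```

One thing this makes visible: because the comparison is strictly "larger than the average", a user who gives every item the same rating (like `u2` here) ends up with no positive items at all.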
Is this approach the right way to evaluate this kind of system?
BTW, there is already an open-source recommender that does this; it allows mixing multiple events/actions/indicators and can also use content similarity here. It is based on PredictionIO's framework, which is Spark-based.