python, machine-learning, recommendation-engine, matrix-factorization

Evaluating the LightFM Recommendation Model


I've been playing around with lightfm for quite some time and have found it really useful for generating recommendations. However, I have two main questions.

  1. When the rank of the recommendations matters, should I rely more on precision@k or on the other provided evaluation metrics, such as the AUC score? In which cases should I focus on improving precision@k rather than the other metrics? Or are they highly correlated, so that if I manage to improve my precision@k score, the other metrics will follow?

  2. How would you interpret a precision@5 score of 0.089 for a model trained with the WARP loss function? AFAIK, precision at 5 tells me what proportion of the top 5 results are positive/relevant. That means I would get a precision@5 of 0 if none of my predictions make it into the top 5, or 0.2 if only one prediction in the top 5 is correct. But I cannot interpret what 0.0xx means for precision@n.

Thanks


Solution

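  • Precision@K and AUC measure different things, and give you different perspectives on the quality of your model. Precision@K is the proportion of positive items among the K highest-ranked items, so it only reflects the quality of the top of the list. AUC is the probability that a randomly chosen positive item is ranked above a randomly chosen negative item, so it reflects the quality of the ranking as a whole. In general, the two should be correlated, but understanding how they differ may help you choose the one that is more important for your application: if users only ever see the first K results, precision@K is the more relevant of the two.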

    Note also that while the maximum value of the AUC metric is 1.0, the maximum achievable precision@K is dependent on your data. For example, if you measure precision@5 but there is only one positive item, the maximum score you can achieve is 0.2.

    In LightFM, the AUC and precision@K routines return arrays of metric scores: one for every user in your test data. Most likely, you average these to get a mean AUC or mean precision@K score. If some of your users score 0 on the precision@5 metric, the average precision@5 can fall anywhere between 0 and 0.2. A mean precision@5 of 0.089, for example, corresponds to about 0.089 × 5 ≈ 0.45 relevant items in the top 5 per user on average.

    Hope this helps!
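
The points above can be sketched with a small toy example. This is only an illustration in plain Python, not LightFM's own code: the helper functions `precision_at_k` and `auc`, the shared `ranking`, and the three users are all hypothetical, and precision@k is assumed to be the number of positives in the top k divided by k, as described in the question.

```python
# Toy illustration only: these helpers are hypothetical and are not LightFM's code.

def precision_at_k(ranked_items, positives, k):
    """Fraction of the top-k ranked items that are known positives."""
    hits = sum(1 for item in ranked_items[:k] if item in positives)
    return hits / k


def auc(ranked_items, positives):
    """Probability that a random positive is ranked above a random negative."""
    n_pos = sum(1 for item in ranked_items if item in positives)
    n_neg = len(ranked_items) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    correct_pairs = 0
    negatives_below = 0
    # Walk from the bottom of the ranking upwards, counting for each positive
    # how many negatives are ranked below it.
    for item in reversed(ranked_items):
        if item in positives:
            correct_pairs += negatives_below
        else:
            negatives_below += 1
    return correct_pairs / (n_pos * n_neg)


# Assumption: the model ranks ten items in the same order (best first) for every user.
ranking = list(range(10))

# Each user has exactly one held-out positive, so precision@5 can be at most 0.2.
test_positives = {
    "user_a": {0},  # positive ranked 1st
    "user_b": {5},  # positive ranked 6th, just outside the top 5
    "user_c": {9},  # positive ranked last
}

# One score per user, analogous to the per-user arrays mentioned above.
per_user_precision = {
    user: precision_at_k(ranking, pos, k=5) for user, pos in test_positives.items()
}
per_user_auc = {user: auc(ranking, pos) for user, pos in test_positives.items()}

# Averaging over users gives the single number that is usually reported.
mean_precision = sum(per_user_precision.values()) / len(per_user_precision)
mean_auc = sum(per_user_auc.values()) / len(per_user_auc)

print(per_user_precision)
print(per_user_auc)
print(mean_precision, mean_auc)
```

Only `user_a` has a non-zero precision@5, so the mean lands strictly between 0 and 0.2, which is the same effect that produces a score like 0.089. `user_b` and `user_c` both score 0 on precision@5, yet their AUC values differ, because AUC still rewards ranking the positive item sixth rather than last.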