Tags: apache-spark, machine-learning, apache-spark-mllib, collaborative-filtering

Apache Spark ALS Recommendation Rating values higher than range


I've run the little ALS recommender system program found on the Apache Spark website, which utilises MLlib. When using a dataset with ratings of 1-5 (I've used the MovieLens dataset), it gives recommendations with predicted ratings of over 5!

The highest I've found in my small tests is 7.4. Obviously, I am either misunderstanding what the code is meant to do, or something has gone awry. I have researched latent factor recommender systems and was under the impression that the Spark MLlib ALS implementation was based on this approach.

Why would it return ratings higher than what is possible? It makes no sense.

Have I misunderstood the algorithm, or is the program flawed?


Solution

  • You're looking at the right paper, but I think you're expecting the algorithm to do something it isn't intended to do. It produces a low-rank approximation of your input matrix as the product of two factor matrices, and nothing about multiplying matrices clamps the output values to the original rating range.

    You can clamp or round the values yourself. You may not want to, though, because a prediction above 5 carries extra information about how strongly the model predicts a high rating. Note also that it isn't really valid for the algorithm to assume that the maximum possible rating is the maximum value observed in the input. A minimal sketch of post-hoc clamping is shown below.
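
    To make this concrete, here is a minimal sketch using the RDD-based pyspark.mllib ALS API. The toy ratings and the rank, iterations, and lambda_ values are illustrative assumptions, not taken from the question; in practice you would load the MovieLens data instead.

        from pyspark import SparkContext
        from pyspark.mllib.recommendation import ALS, Rating

        sc = SparkContext(appName="als-clamp-example")

        # Toy ratings on a 1-5 scale (assumed data; substitute the MovieLens RDD).
        ratings = sc.parallelize([
            Rating(0, 0, 5.0), Rating(0, 1, 1.0),
            Rating(1, 0, 4.0), Rating(1, 2, 5.0),
            Rating(2, 1, 2.0), Rating(2, 2, 4.0),
        ])

        # rank/iterations/lambda_ are illustrative, not tuned.
        model = ALS.train(ratings, rank=10, iterations=10, lambda_=0.01)

        # predictAll returns the raw dot product of the user and item factor
        # vectors; nothing constrains it to the 1-5 range of the training data,
        # which is why predictions like 7.4 can appear.
        user_products = ratings.map(lambda r: (r.user, r.product))
        raw_predictions = model.predictAll(user_products)

        # If in-range values are required, clamp after prediction.
        clamped = raw_predictions.map(
            lambda r: Rating(r.user, r.product, min(max(r.rating, 1.0), 5.0)))

        print(clamped.collect())

    Clamping this way forces the reported scores back into the 1-5 scale while leaving the relative ranking of recommendations intact.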