I have a table of movies and a table of reviews
In my app, I want to show top 10 movies of any genre.
I clearly cannot sort movies just by rating since there are movies with only 1 5 star review, thus only irrelevant movies will be recommended to users.
Currently I receive from DB top 100 movies in this genre sorted by reviews, sort this list by rating on server and only then display top 10.
That kinda works but this solution is impractical in case of e.g. review bombing and moreover, the purpose of top 10 list is to recommend the most relevant movies.
My idea was to add relevance column into the movies table but I've got no clue how to count it:
Moreover, the rating in ratings is not a real number. User can give only 5, 4.5, 4 etc. score review. But what about the situation where users can rate movies like 5, 4.9, 4.8 ... 0.1?
So, how to perform this operation in better way?
[Upd] I think instead of division of smth. we should multiply averagerating
and reviews
from movies
in order to count the relevance (averagerating
and reviews
are alredy automatically updated on each insert/delete/update). Also we should try to normalize the product.
In this situation movies with 100 reviews of 5 and averagerating
of 5 won't beat up movies with averagerating
3.8 but with 57k reviews and also the problem of review bombing will be solved.
Can anyone prove my guess?
I agree with @NickW that this is more of a statistics question than a programming question, but I'll try to answer it, anyway.
If you want to account for both average rating and number of ratings, a straightforward method is to multiply the two. This gives you the sum of all ratings, but, as @qwezxc789 notes, this does not account for the number of zero ratings. Another strategy could be a linear combination of avgrating
and reviews
. Collinearity shouldn't be an issue because neither variable depends on the other. You could even play around with the linear coefficients to change the relative contribution of each variable. This solution easily generalizes to n independent variables.
Let wi be the weight of predictor i, 1 ≤ i ≤ n, w1 + ... + wn = 1 (or any other constant, but why not use 1?).
You can add this value as a new relevance
column in the movies
table using the following SQL. I use two equally weighted predictors: ratings
and reviews
.
ALTER TABLE movies ADD [relevance] AS
(SELECT 0.5 * avgrating + 0.5 * reviews
FROM movies)