I am not very advanced when it comes to writing SQL queries, and I am trying to figure out a query which will display the rating on a product page from a user who has the most comparable taste in products to the user visiting the page. I have provided a scenario below which I hope makes it easy to understand what I am trying to achieve.
Example MySQL database
row_id | user_id | rating | product_id |
---|---|---|---|
1 | 1 | 41 | 1 |
2 | 2 | 39 | 1 |
3 | 3 | 26 | 1 |
4 | 4 | 56 | 1 |
5 | 1 | 67 | 2 |
6 | 2 | 66 | 2 |
7 | 3 | 72 | 2 |
8 | 4 | 82 | 2 |
9 | 5 | 64 | 2 |
10 | 2 | 55 | 3 |
11 | 3 | 47 | 3 |
12 | 4 | 70 | 3 |
Scenario
The person visiting the page is user_id #1 and they are viewing product_id #3.
The goal
I am trying to figure out the cleanest and most efficient way to display the rating given for product_id #3 from whichever user who has the most identical ratings to user_id #1 based on all of the other products which both users have submitted a rating.
I would like this to also ignore any users who have NOT submitted at least 1 product rating between 1-50 AND at least 1 product rating between 51-100 (so basically I want to exclude users who just always give high OR always give low ratings for everything).
If there are no users who have submitted ratings based on the criteria above, then it would need to simply return “n/a” instead.
The process of elimination
The row with user_id #5 would be excluded because this user has not rated product_id #3.
The rows for user_id #4 would also be excluded because this user has not submitted any ratings between 1-50.
This leaves user #2 and user #3 remaining as they have both met the criteria of having submitted a rating to a product between 1-50 AND also a product between 51-100.
Product #1 was rated 41 by user #1, rated 39 by user #2 (a difference of 2) and rated 26 by user #3 (a difference of 15).
Product #2 was rated 67 by user #1, rated 66 by user #2 (a difference of 1) and rated 72 by user #3 (a difference of 5).
Putting this into an average, the user #2 has an average difference of 1.5 compared to the ratings user #1 gave, while user #3 has an average difference of 10 compared to the ratings user #1 gave for the same products.
The result
Based on the example database table, user #2 would have the most identical ratings to user #1 and the expected output for product #3 should be: 55
Any help with figuring out how to make this work would be much appreciated.
I created a fiddle that does the necessary operations: http://sqlfiddle.com/#!9/e1caa2/1
Below is the full sql query. It is structured as follows:
product_id
to get the different ratings between usersSQL Query:
SELECT rating
FROM table1 t1
INNER JOIN
(SELECT user_id,
avg(difference) AS avg_difference
FROM
(SELECT t2.user_id,
abs(t1.rating-t2.rating) AS difference
FROM table1 t1
RIGHT JOIN table1 t2 ON t1.product_id = t2.product_id
WHERE t1.user_id = '1'
AND t2.user_id in
(SELECT user_with_appropriate_rating.user_id
FROM
(SELECT user_id
FROM
(SELECT user_id,
max(rating) AS maxrating,
min(rating) AS minrating
FROM table1
GROUP BY user_id) user_with_rating
WHERE user_with_rating.maxrating >50
AND user_with_rating.minrating<50) user_with_appropriate_rating
INNER JOIN
(SELECT DISTINCT user_id
FROM table1
WHERE product_id = '3') user_has_reviewed_product ON user_with_appropriate_rating.user_id = user_has_reviewed_product.user_id)) product_difference
GROUP BY user_id) t2 ON t1.user_id = t2.user_id
WHERE t1.product_id='3'
ORDER BY avg_difference ASC
LIMIT 1
For the sake of completeness, the schema used in the sql fiddle:
CREATE TABLE IF NOT EXISTS `table1` (
`row_id` int(1) NOT NULL,
`user_id` int(1) NOT NULL,
`rating` int(1) NOT NULL,
`product_id` int(1) NOT NULL,
PRIMARY KEY (`row_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `table1` (`row_id`, `user_id`, `rating`, `product_id`) VALUES
('1', '1', '41', '1'),
('2', '2', '39', '1'),
('3', '3', '26', '1'),
('4', '4', '56', '1'),
('5', '1', '67', '2'),
('6', '2', '66', '2'),
('7', '3', '72', '2'),
('8', '4', '82', '2'),
('9', '5', '64', '2'),
('10', '2', '55', '3'),
('11', '3', '47', '3'),
('12', '4', '70', '3');