sqlvectorrecommendation-enginecollaborative-filtering

SQL counting likes-dislikes for recommendation system, collaborative filtering User-Based


The idea is that the user leaves likes-dislikes for different items, I need to get a list of users who have left the same ratings (likes and dislikes) as the selected user (USER_ID = 1), to determine their similarity.

RATING Column:
1 = like,
0 = dislike

Full table:

+---------+---------+--------+--------------------------------------------------+
| USER_ID | ITEM_ID | RATING |                      -EXAMPLE-                   |
+---------+---------+--------+--------------------------------------------------+
|       1 |       1 |      1 |-+
|       1 |       2 |      1 | |
|       1 |       3 |      1 | +-[1,1,1,0,0] user_1 vector of ratings
|       1 |       4 |      0 | |  |     | | 
|       1 |       5 |      0 |-+  |     | |     
|       3 |       1 |      1 |----+     + + total_match with user_1 = 3 [1,0,0]
|       3 |       2 |      0 |          | |        
|       3 |       3 |      0 |          | |       
|       3 |       4 |      0 |----------+ |
|       3 |       5 |      0 |------------+
|       4 |       1 |      1 |
|       4 |       2 |      1 |
|       4 |       3 |      1 |
|       4 |       4 |      0 |
|       4 |       5 |      0 |
+---------+---------+--------+

Match computing:

user_3 likes_match with user_1 = 1
user_3 dislikes_match with user_1 = 2
total_match = likes_match + dislikes_match = 3

How to make a SQL query to get the following result:

+---------+-------------+----------------+-------------+
| user_id | likes_match | dislikes_match | total_match |
+---------+-------------+----------------+-------------+
|       3 |           1 |              2 |           3 |
|       4 |           3 |              2 |           5 |
+---------+-------------+----------------+-------------+

Any Ideas?


Solution

  • (This uses sqlite, but it shouldn't need much if anything to work on other databases):

    Given the following table:

    CREATE TABLE ratings(user_id INTEGER, item_id INTEGER, rating INTEGER
                       , PRIMARY KEY(user_id, item_id)) WITHOUT ROWID;
    INSERT INTO ratings VALUES(1,1,1);
    INSERT INTO ratings VALUES(1,2,1);
    INSERT INTO ratings VALUES(1,3,1);
    INSERT INTO ratings VALUES(1,4,0);
    INSERT INTO ratings VALUES(1,5,0);
    INSERT INTO ratings VALUES(3,1,1);
    INSERT INTO ratings VALUES(3,2,0);
    INSERT INTO ratings VALUES(3,3,0);
    INSERT INTO ratings VALUES(3,4,0);
    INSERT INTO ratings VALUES(3,5,0);
    INSERT INTO ratings VALUES(4,1,1);
    INSERT INTO ratings VALUES(4,2,1);
    INSERT INTO ratings VALUES(4,3,1);
    INSERT INTO ratings VALUES(4,4,0);
    INSERT INTO ratings VALUES(4,5,0);
    

    This query:

    SELECT r1.user_id AS user_id
         , sum(r1.rating) AS likes_match
         , sum(CASE r1.rating WHEN 0 THEN 1 ELSE 0 END) AS dislikes_match
         , count(*) AS total_match
    FROM ratings AS r1
    JOIN ratings AS r2 ON r2.user_id = 1
                      AND r1.item_id = r2.item_id
                      AND r1.rating = r2.rating
    WHERE r1.user_id <> 1
    GROUP BY r1.user_id
    ORDER BY r1.user_id;
    

    Prouduces:

    user_id     likes_match  dislikes_match  total_match
    ----------  -----------  --------------  -----------
    3           1            2               3          
    4           3            2               5