pythonalgorithmmachine-learningrecommendation-enginecollaborative-filtering

How to use collaborative filtering to predict users by their behavior?


It seems like a very strange thing to do, but how to predict a certain user by his/her recommendations?

EXAMPLE:

Perhaps, we were making an experiment on how users movie ratings change in time.

And we made a survey in the first year (2019) and in the second (2020) with same users and movies.

But our database has crashed, so we have only users reccomendations in the second year, but we don't know whose recommendations are given.

We have: Users-Recommendations matrix in first year, like this one:

Users-Reccomendation matrix 1

And Users-Recommendations matrix in the second year, but without 'target' column:

enter image description here


So, by the first year data, we want to recover/predict the 'target' column.

Usually, collaborative filtering is used to predict movie ratings, but this is reversed problem.

Can collaborative filtering still be used?

If yes, what kind of interactions should be made to algorithm in order to solve the problem?

Or is there another approach or ML-algorithm to do this?

Thank you!

P.S.: if you'll attach Python example on similar task, it would be great!

And sorry for not attaching real data and task to this post, I literally can't do this by many reasons :)


Solution

  • There are probably several ways to approach this problem, but I would treat this as an assignment problem. You basically want to assign 2020 movie ratings to users. To do this you have to:

    1. Define the cost of assigning movie ratings to a user. This could be done by defining a distance function between 2020 movie ratings and 2019 movie ratings. An example could be L2-Norm (euclidean distance). So assigning 2020 movie ratings 0 ([1, 2, 4, 4, 6]) to user id_0 (2019 ratings = [1, 1, 3, 5, 6]) would cost sqrt(3)

    2. Build a matrix using your defined distance function reflecting the distance (assignment cost) between each user and each row of 2020 movie ratings

    3. Find the best assignment (with lowest cost) of 2020 movie ratings to users by using hungarian algorithm