I am building a hybrid recommender system with LightFM. My data has 39326 unique users and 2569 unique game titles (items).
My train interaction sparse matrix is: <39326x2569 sparse matrix of type '<class 'numpy.float64'>' with 758931 stored elements in Compressed Sparse Row format>
My test interaction sparse matrix is: <39323x2569 sparse matrix of type '<class 'numpy.float64'>' with 194622 stored elements in Compressed Sparse Row format>
I train the model:
from lightfm import LightFM
model1 = LightFM(learning_rate=0.01, loss='warp')
model1.fit(train_interactions, epochs=20)
which returns the object: <lightfm.lightfm.LightFM at 0x1bf8c8dc4c8>
But when I try to check the accuracy with:
train_precision = precision_at_k(model1, train_interactions, k=10).mean()
test_precision = precision_at_k(model1, test_interactions, k=10).mean()
I get the error message: Incorrect number of features in user_features
Why? Clearly the shapes are compatible? What am I missing?
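Your test sparse matrix has dimensions 39323x2569, while your train sparse matrix has dimensions 39326x2569. Your test set is missing 3 users. Since you pass no user_features, LightFM derives the number of users from the number of rows of the matrix you evaluate on, and 39323 does not match the 39326 users the model was trained with. The two matrices must have exactly the same shape, even if some users have no test interactions.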
I suggest you use the built-in LightFM train/test split function (random_train_test_split in lightfm.cross_validation) to avoid this kind of error: https://making.lyst.com/lightfm/docs/cross_validation.html
If you want to split your data in your own way, you can instead map your user_id and item_id values to consecutive integers starting from 0, and then use this:
from lightfm.data import Dataset

# Create your train and test sets in the format
# [[user_id1, item_id1, score1], ..., [user_idn, item_idn, scoren]].
# The score can be just 1 for an implicit interaction.
# user_id and item_id are integers.
data = Dataset()
# Fit on ALL users and items so that both matrices get the same shape.
data.fit(unique_user_ids,  # all user IDs, 0 to n_users - 1
         unique_item_ids)  # all item IDs, 0 to n_items - 1
train_interactions, train_weights = data.build_interactions([tuple(i) for i in train])
test_interactions, test_weights = data.build_interactions([tuple(i) for i in test])
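
If you would rather build the matrices by hand, the same idea (one shared ID mapping, one fixed shape) can be sketched with plain SciPy. This is only an illustration: the tiny train_raw and test_raw lists, the ID strings, and the to_matrix helper are made up for the example and are not part of LightFM.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical raw interactions as (user_id, item_id, score) triples.
train_raw = [("u1", "g1", 1.0), ("u2", "g2", 1.0),
             ("u3", "g1", 1.0), ("u4", "g3", 1.0)]
test_raw = [("u1", "g2", 1.0), ("u3", "g3", 1.0)]  # u2 and u4 have no test rows

# Build the ID -> index mappings from the union of both splits,
# so a user missing from one split still gets a row in its matrix.
all_rows = train_raw + test_raw
user_index = {u: i for i, u in enumerate(sorted({r[0] for r in all_rows}))}
item_index = {g: i for i, g in enumerate(sorted({r[1] for r in all_rows}))}
shape = (len(user_index), len(item_index))


def to_matrix(rows):
    """Turn (user_id, item_id, score) triples into a CSR matrix of fixed shape."""
    users = np.array([user_index[u] for u, _, _ in rows])
    items = np.array([item_index[g] for _, g, _ in rows])
    scores = np.array([s for _, _, s in rows], dtype=np.float64)
    # Passing shape explicitly keeps empty rows instead of dropping them.
    return coo_matrix((scores, (users, items)), shape=shape).tocsr()


train_interactions = to_matrix(train_raw)
test_interactions = to_matrix(test_raw)

# Cheap sanity check to run before calling precision_at_k.
assert train_interactions.shape == test_interactions.shape
```

The important part is the explicit shape argument: without it, the matrix is sized from the largest index that actually occurs in the split, so users or items at the end of the index range can silently disappear from the test matrix.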