test (a table with columns user_id, item_id, rating; 6.2M rows)
from pyspark.ml.recommendation import ALS

als = ALS(userCol="user_id",
          itemCol="item_id",
          ratingCol="rating",
          coldStartStrategy="drop",
          implicitPrefs=True)
model = als.fit(train)
predictions = model.transform(test)
predictions (a table with columns user_id, item_id, rating, prediction, but with only 1.7M rows)
Why did model.transform(test) drop the rest of the rows? It should have been able to calculate a prediction score for every user_id, item_id combination, right?
Is it because I used coldStartStrategy="drop"?
user_id, item_id combinations in test, no row should be dropped, yes?

Yes, it is because of the coldStartStrategy="drop" option. With that setting, transform drops every row whose user or item had no interactions in the training data, since the model has no learned factors for it and the prediction would be NaN.
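
To make the rule above concrete, here is a minimal plain-Python sketch of the filtering that coldStartStrategy="drop" results in. It does not call Spark; the helper name rows_kept_by_drop and the toy rows are made up for illustration, and it only mimics the outcome, not how ALS implements it internally.

```python
# Plain-Python sketch of which test rows survive coldStartStrategy="drop".
# rows_kept_by_drop and the toy data below are hypothetical, for illustration.

def rows_kept_by_drop(train_rows, test_rows):
    """Return test rows whose user AND item both appear in train_rows.

    Rows are (user_id, item_id, rating) tuples. A pair can only be scored
    when factors were learned for both the user and the item.
    """
    known_users = {user for user, _, _ in train_rows}
    known_items = {item for _, item, _ in train_rows}
    return [
        row for row in test_rows
        if row[0] in known_users and row[1] in known_items
    ]


train_rows = [(1, "a", 5.0), (1, "b", 3.0), (2, "a", 4.0)]
test_rows = [
    (1, "a", 4.0),  # user and item both seen in training: kept
    (2, "b", 2.0),  # both seen, though never together: kept
    (3, "a", 5.0),  # user 3 never seen in training: dropped
    (1, "c", 1.0),  # item "c" never seen in training: dropped
]

kept = rows_kept_by_drop(train_rows, test_rows)
print(len(kept), "of", len(test_rows), "test rows would get a prediction")
```

Note that the pair (2, "b") is kept even though it never occurs in training: what matters is that the user and the item each appear somewhere in train, not that the exact combination does. To keep all rows in the real pipeline, coldStartStrategy="nan" (the default) leaves the unseen rows in place with NaN predictions.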