pyspark, apache-spark-sql, apache-spark-mllib, als

Spark ALS model.transform(test) drops rows from test. What could be the reason?


test (a table with columns: user_id, item_id, rating, with 6.2M rows)

from pyspark.ml.recommendation import ALS

als = ALS(userCol="user_id",
          itemCol="item_id",
          ratingCol="rating",
          coldStartStrategy="drop",
          implicitPrefs=True)
# Fit on the training set, then score the test set
model = als.fit(train)
predictions = model.transform(test)

predictions (a table with columns: user_id, item_id, rating, prediction, but with only 1.7M rows)

Why did model.transform(test) drop the rest of the rows? It should have been able to calculate a prediction score for every user_id, item_id combination, right?

Is it because I have used coldStartStrategy="drop"?


Solution

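  • Yes, it is because of coldStartStrategy="drop". ALS can only score a (user_id, item_id) pair when both the user and the item appeared in the training data, since otherwise the model has no latent factors for them and the prediction is NaN. With coldStartStrategy="drop", transform removes every row whose prediction is NaN, so the missing rows are the ones whose user or item had no interactions in train. With the default, coldStartStrategy="nan", all rows are kept and those predictions come back as NaN.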