In GraphLab, I am working with a small subset of movies from a larger list.
movieIds_5K_np = LL_features_SCD_min.to_numpy()[:,0]
ratings_33K_np = ratings_33K.to_numpy()
`movieIds_5K_np` is an array containing my movie IDs. `ratings_33K_np` is an array with four columns whose second column contains the movie IDs for all movies. I need to select only the rows in `ratings_33K_np` whose ID exists in `movieIds_5K_np`.
I tried this approach, but it doesn't seem to work:
ratings_5K_np = ratings_33K_np[ratings_33K_np[:,2]==movieIds_5K_np]
How can I do this in GraphLab or with some Python library? I should add that `ratings_33K` and `movieIds_5K` were originally imported as SFrames.
Thanks
Given that you have two SFrames, you can do a join, like so:
ratings_5K = LL_features_SCD_min[['id_column_name']].join(ratings_33K, on='id_column_name', how='left')
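As far as I understand from your code, `LL_features_SCD_min` is the SFrame corresponding to your small subset (the 5K data). So you take just the ID column from it and left join it with the entire dataset, which gives you a new SFrame containing only the IDs you wanted. Substitute your own ID column name for `'id_column_name'` and there you go. Note that with `how='left'`, an ID that has no matching ratings still appears once, with missing values in the rating columns; use `how='inner'` if you want those dropped.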
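If you would rather stay with the NumPy arrays from the question, a membership test is one option. The `==` comparison in the question compares the two arrays element by element rather than asking "is this ID in the list?", which is why it does not select the rows you expect. Below is a minimal sketch using `np.isin`; the toy values and the column layout (movie ID in column index 1, i.e. the second column) are assumptions for illustration, so adjust the index to match your data.

```python
import numpy as np

# Toy stand-ins for the arrays in the question (values are made up).
# Assumed column layout: userId, movieId, rating, timestamp.
ratings_np = np.array([
    [1, 10, 4.0, 111],
    [1, 20, 3.5, 112],
    [2, 30, 5.0, 113],
    [3, 10, 2.0, 114],
])
movie_ids_np = np.array([10, 30])

# Boolean mask: True where the movie ID in column 1 appears in movie_ids_np.
mask = np.isin(ratings_np[:, 1], movie_ids_np)

# Keep only the rows whose movie ID is in the subset.
ratings_subset = ratings_np[mask]
print(ratings_subset)
```

On older NumPy versions that lack `np.isin`, `np.in1d` takes the same two arguments for this one-dimensional case.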
For more information on how `join` works in GraphLab, consider checking the documentation on `SFrame`.
Good luck!