python graphlab sframe

Find particular rows in Graphlab or Python


In Graphlab, I am working with a small subset of movies from a larger list.

  movieIds_5K_np = LL_features_SCD_min.to_numpy()[:,0]
  ratings_33K_np = ratings_33K.to_numpy()

movieIds_5K_np is an array containing my movie IDs. ratings_33K_np is an array with four columns, whose second column contains the movie IDs for all movies.

I need to select only the rows in ratings_33K_np whose movie ID exists in movieIds_5K_np.

I tried this approach, but it doesn't seem to work:

 ratings_5K_np = ratings_33K_np[ratings_33K_np[:,2]==movieIds_5K_np] 

How can I do this in Graphlab or with a Python library? I should say that ratings_33K and movieIds_5K were originally imported as SFrames.

Thanks


Solution

  • Given that you have two SFrames, you can do a join, like so:

    ratings_5K = LL_features_SCD_min[['id_column_name']].join(ratings_33K, on='id_column_name', how='left')
    

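    As far as I understand from your code, LL_features_SCD_min is the SFrame corresponding to your 5K subset. You take only the ID column from it and left join it with the full ratings SFrame, which gives you a new SFrame restricted to the IDs you wanted. Substitute your actual ID column name for 'id_column_name'; the column must have the same name in both SFrames for this form of on= to work. Note that a left join also keeps IDs that have no ratings at all (with missing values in the rating columns); use how='inner' if you want those dropped.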

    For more information on how joins work in Graphlab, check the SFrame documentation.
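If you would rather stay with the NumPy arrays from the question, a membership test is probably what the attempted line was aiming for. The `==` operator compares element by element and needs compatible shapes, so it cannot check whether each ID appears anywhere in the smaller array; `np.isin` does that. The sketch below uses small made-up arrays in place of the real data, and it assumes the movie ID sits in the second column (index 1), as the question's description says, rather than index 2 as in the attempted line.

```python
import numpy as np

# Hypothetical stand-ins for movieIds_5K_np and ratings_33K_np.
movie_ids_5k = np.array([10, 30])
ratings_33k = np.array([
    [1, 10, 4.0, 1000],
    [1, 20, 3.5, 1001],
    [2, 30, 5.0, 1002],
    [3, 40, 2.0, 1003],
])

# Assumption: the movie ID is in the second column (index 1).
MOVIE_ID_COL = 1

# Boolean mask, True where the row's movie ID is in the subset.
mask = np.isin(ratings_33k[:, MOVIE_ID_COL], movie_ids_5k)

# Keep only the matching rows.
ratings_5k = ratings_33k[mask]
```

Adjust `MOVIE_ID_COL` if the IDs are actually in a different column of your array.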

    Good luck!