r sparklyr matrix-factorization

R extract latent factors from ALS implementation in Sparklyr


Using the ALS example from the sparklyr documentation:

library(sparklyr)
sc <- spark_connect(master = "local")

movies <- data.frame(
  user   = c(1, 2, 0, 1, 2, 0),
  item   = c(1, 1, 1, 2, 2, 0),
  rating = c(3, 1, 2, 4, 5, 4)
)
movies_tbl <- sdf_copy_to(sc, movies)

model <- ml_als(movies_tbl, rating ~ user + item)

How can you then extract the resulting latent user and item factors from the model?


Solution

  • Got there in the end with tidy(model).

    Here's an updated example with 3 users and 4 items:

    library(sparklyr)
    sc <- spark_connect(master = "local")
    
    # 3 users, 4 films:
    movies <- data.frame(
      user   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
      item   = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 4),
      rating = c(3, 1, 2, 5, 1, 5, 1, 1, 5, 4)
    )
    movies_tbl <- sdf_copy_to(sc, movies, overwrite = TRUE)
    
    model <- ml_als(movies_tbl, rating ~ user + item)
    

    You can extract the user and item latent factors with:

    library(dplyr)  # provides %>% and collect()

    model_tidy <- tidy(model) %>% collect()
    model_tidy
    
    # A tibble: 4 x 3
         id user_factors item_factors
      <int> <list>       <list>      
    1     1 <list [10]>  <list [10]> 
    2     3 <list [10]>  <list [10]> 
    3     2 <list [10]>  <list [10]> 
    4     4 <lgl [1]>    <list [10]> 
    
    

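    Where an id exists in only one of the two lists, the missing entry shows up as <lgl [1]> (a logical of length 1, presumably NA) instead of a list of factors. Here id 4 is an item but not a user, so it has no user factors.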
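
    The list columns are awkward to work with directly, so here is a minimal sketch of turning them into plain numeric matrices. It runs on a small hand-made stand-in for model_tidy (rank 2, made-up numbers) so that it works without Spark. The helper name factors_to_matrix is hypothetical, and the sketch assumes the <lgl [1]> placeholder is a single NA.

```r
# Hypothetical stand-in for the collected tidy(model) output.
# Rank 2 instead of 10, and the factor values are invented for illustration.
model_tidy <- data.frame(id = c(1L, 3L, 2L, 4L))
model_tidy$user_factors <- list(
  list(0.1, 0.2), list(0.5, 0.6), list(0.3, 0.4), NA
)
model_tidy$item_factors <- list(
  list(1.1, 1.2), list(1.5, 1.6), list(1.3, 1.4), list(1.7, 1.8)
)

# Convert one list column into a numeric matrix with one row per id.
# Ids whose entry is the single-NA placeholder are dropped (assumption:
# the placeholder is NA), and rows are sorted by id.
factors_to_matrix <- function(ids, factor_col) {
  is_missing <- function(x) length(x) == 1 && is.na(x[[1]])
  present <- !vapply(factor_col, is_missing, logical(1))
  m <- do.call(rbind, lapply(factor_col[present], unlist))
  rownames(m) <- ids[present]
  m[order(ids[present]), , drop = FALSE]
}

U <- factors_to_matrix(model_tidy$id, model_tidy$user_factors)  # users x rank
V <- factors_to_matrix(model_tidy$id, model_tidy$item_factors)  # items x rank

# In a matrix-factorization model the predicted rating is the dot product
# of a user's factors and an item's factors, so this is users x items.
predicted <- U %*% t(V)
```

    With the real model_tidy from above, the same two calls should give a 3 x 10 user matrix and a 4 x 10 item matrix, since the model uses 10 factors per id.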