Tags: r, lazy-evaluation

Lazy Evaluation of a List Element in R


Is there a way to lazily load elements of a list?

I have a list of large data.frames that each take a long time to generate and load. I typically do not use all of the data.frames during a session, so I would like each one to be generated and loaded lazily, the first time I use it. I know I can use delayedAssign to create variables that load lazily, but this cannot be applied to list elements.

Below is a reproducible example of what does not work:

Some functions that take a while to generate data.frames:

slow_fun_1 <- function(){
  cat('running slow function 1 now \n')
  Sys.sleep(1)
  df<-data.frame(x=1:5, y=6:10)
  return(df)
}

slow_fun_2 <- function(){
  cat('running slow function 2 now \n')
  Sys.sleep(1)
  df<-data.frame(x=11:15, y=16:20)
  return(df)
}

APPROACH 1

my_list <- list()
my_list$df_1 <-slow_fun_1()
my_list$df_2 <-slow_fun_2()
# This is too slow. I might not want to use both df_1 & df_2.

APPROACH 2

my_list_2 <- list()
delayedAssign('my_list_2$df_1', slow_fun_1())
delayedAssign('my_list_2$df_2', slow_fun_2())
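# Does not work. delayedAssign() treats its first argument as a variable name,
# so this creates variables literally named `my_list_2$df_1` and `my_list_2$df_2`
# instead of list elements.
my_list_2 # Output is still an empty list.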

Solution

  • Here is one possible solution. It is not lazy evaluation in the strict sense, but it computes each data.frame only when you first need it and then caches the result, so the slow calculation runs only once. You can use the memoise package to achieve this. For example:

    slow_fun_1 <- function(){
      cat('running slow function 1 now \n')
      Sys.sleep(1)
      df<-data.frame(x=1:5, y=6:10)
      return(df)
    }
    
    slow_fun_2 <- function(){
      cat('running slow function 2 now \n')
      Sys.sleep(1)
      df<-data.frame(x=11:15, y=16:20)
      return(df)
    }
    
    library(memoise)
    
    my_list <- list()
    my_list$df_1 <-memoise(slow_fun_1)
    my_list$df_2 <-memoise(slow_fun_2)
    

    Note that my_list$df_1 and so on are now functions that return the data.frames, so your usage should look like this:

    > my_list$df_1()
    running slow function 1 now 
      x  y
    1 1  6
    2 2  7
    3 3  8
    4 4  9
    5 5 10
    > my_list$df_1()
      x  y
    1 1  6
    2 2  7
    3 3  8
    4 4  9
    5 5 10
    > 
    

    Note that the cached function performs the actual calculation only on the first call; later calls return the cached result.
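
    As a rough illustration of what this caching amounts to, here is a minimal base-R sketch. cache_once and slow_demo are hypothetical names written for this example and are not part of memoise. The real memoise() is more general (it keys its cache on the function's arguments, for one), while this sketch only handles zero-argument functions like the ones above.

    ```r
    # Hypothetical helper (not from memoise): wrap a zero-argument function
    # so that its body runs at most once.
    cache_once <- function(f) {
      value <- NULL
      done <- FALSE
      function() {
        if (!done) {
          value <<- f()   # run the slow function on the first call only
          done <<- TRUE
        }
        value
      }
    }

    n_calls <- 0
    slow_demo <- function() {
      n_calls <<- n_calls + 1   # count how often the body really runs
      data.frame(x = 1:5, y = 6:10)
    }

    cached_demo <- cache_once(slow_demo)
    first <- cached_demo()    # runs slow_demo()
    second <- cached_demo()   # served from the cache
    ```

    After both calls, n_calls is 1 and the two results are identical, which is the same behaviour as the memoised my_list$df_1() above.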

    Update: If you want to keep the original usage without the function call, one way is to define a modified data structure based on a list, for example:

    library(memoise)
    
    lazy_list <- function(...){
      structure(list(...), class = c("lazy_list", "list"))
    }
    
    as.list.lazy_list <- function(x, ...){
      # Drop the class so that `[[` does not dispatch back to `[[.lazy_list`
      unclass(x)
    }
    
    generator <- function(f){
      structure(memoise(f), class = c("generator", "function"))
    }
    
    `$.lazy_list` <- function(lst, name){
      r <- as.list(lst)[[name]]
      if (inherits(r, "generator")) {
        return(r())
      }
      return(r)
    }
    
    `[[.lazy_list` <- function(lst, name){
      r <- as.list(lst)[[name]]
      if (inherits(r, "generator")) {
        return(r())
      }
      return(r)
    }
    
    lazy1 <- lazy_list(df_1 = generator(slow_fun_1),
                       df_2 = generator(slow_fun_2),
                       df_3 = data.frame(x=11:15, y=16:20))
    
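    lazy1$df_1  # first access: runs slow_fun_1() and caches the result
    lazy1$df_1  # second access: returns the cached data.frame
    lazy1$df_2  # first access: runs slow_fun_2() and caches the result
    lazy1$df_2  # second access: returns the cached data.frame
    lazy1$df_3  # plain data.frame, returned as-is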
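
    Another option stays closer to the delayedAssign idea from the question: environments, unlike lists, can hold promises, and delayedAssign() accepts an assign.env argument. The sketch below is an alternative under stated assumptions, not part of the approach above. lazy_env and make_df are names made up for the example, and the result is an environment rather than a true list, so list-only operations such as positional indexing will not work on it.

    ```r
    lazy_env <- new.env()

    n_runs <- 0
    make_df <- function() {
      n_runs <<- n_runs + 1   # track how many times the generator runs
      data.frame(x = 1:5, y = 6:10)
    }

    # The promise is stored in lazy_env; make_df() is not called yet.
    delayedAssign("df_1", make_df(), assign.env = lazy_env)
    runs_before <- n_runs     # nothing has been evaluated at this point

    a <- lazy_env$df_1        # first access forces the promise
    b <- lazy_env[["df_1"]]   # later accesses reuse the stored value
    ```

    Access with $ and [[ by name looks the same as for a list, and the generator runs only once. Keep in mind that environments have reference semantics, so copies of lazy_env share the same contents.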