r, list, function, recursion, nested-lists

Using a Recursive Function to Create a List Nested n Times


I have the following data frame.

Data_Frame <- data.frame(
  Factor_1 = rep(LETTERS[1:4], each = 12, length.out = 48),
  Factor_2 = rep(letters[1:3], each = 4, length.out = 48),
  Factor_3 = rep(1:2, each = 2, length.out = 48),
  Response = rnorm(48, 25, 1)
)

I want to create a nested list by splitting the data frame by each of the factors in the study in succession. I'll start with a vector of the names of the columns I want to split by, listed in the order in which the resulting list should be nested.

Factors_to_Split_by <- c("Factor_1", "Factor_2", "Factor_3")

The resulting list should look like the following Output object.

Output <- lapply(lapply(split(Data_Frame, Data_Frame[, which(colnames(Data_Frame) == Factors_to_Split_by[1])]), function (x) {
  split(x, x[, which(colnames(x) == Factors_to_Split_by[2])])
}), function (x) {
  lapply(x, function (y) {
    split(y, y[, which(colnames(y) == Factors_to_Split_by[3])])
  })
})

How can I write a recursive function that takes Factors_to_Split_by as its input and returns the desired Output list? I may have more than 3 factors to split the data by, so I'd like something modular, efficient, and programmatic.

Thanks!


Solution

  • Here is one possible approach using Reduce and a custom function:

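    split_df <- function(x, split) {
      if (is.data.frame(x)) {
        # Base case: x is a data frame, so split it by the named column
        split(x, x[split])
      } else {
        # Otherwise x is a list from an earlier split, so recurse into each element
        lapply(x, split_df, split = split)
      }
    }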
    
    Output2 <- Reduce(split_df, Factors_to_Split_by, init = Data_Frame)
    
    identical(Output, Output2)
    #> [1] TRUE
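
    It may help to see what Reduce is doing here. With init = Data_Frame it folds the factor names in from left to right, so the call amounts to one nested split_df() call per factor. The sketch below spells that out and also shows an explicitly recursive variant, since the question asked for one. split_recursive is a hypothetical name used only for illustration and is not part of the answer above, and set.seed() is added only so the example data is reproducible.

```r
# Example data as in the question (set.seed is an addition for reproducibility)
set.seed(1)
Data_Frame <- data.frame(
  Factor_1 = rep(LETTERS[1:4], each = 12, length.out = 48),
  Factor_2 = rep(letters[1:3], each = 4, length.out = 48),
  Factor_3 = rep(1:2, each = 2, length.out = 48),
  Response = rnorm(48, 25, 1)
)
Factors_to_Split_by <- c("Factor_1", "Factor_2", "Factor_3")

# Helper from the answer, repeated so this block runs on its own
split_df <- function(x, split) {
  if (is.data.frame(x)) {
    split(x, x[split])
  } else {
    lapply(x, split_df, split = split)
  }
}

# Reduce() applies split_df once per factor, feeding each result into the next call
Via_Reduce <- Reduce(split_df, Factors_to_Split_by, init = Data_Frame)

# The same computation written out by hand for three factors
Unrolled <- split_df(split_df(split_df(Data_Frame, "Factor_1"), "Factor_2"), "Factor_3")

# Hypothetical explicitly recursive variant: split by the first factor,
# then recurse into every piece with the remaining factors
split_recursive <- function(df, factors) {
  if (length(factors) == 0) {
    return(df)  # no factors left: this data frame is a leaf
  }
  pieces <- split(df, df[[factors[1]]])
  lapply(pieces, split_recursive, factors = factors[-1])
}

Via_Recursion <- split_recursive(Data_Frame, Factors_to_Split_by)

# Leaves are reached by chaining the factor levels in nesting order
Via_Recursion$A$a$`1`
```

    Both versions work for any number of factors. The Reduce form goes back to the top of the nested list for every factor, while the recursive form descends one level per call. For a handful of factors the difference is unlikely to matter.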