
Split data.frame based on levels of a factor into new data.frames


I'm trying to create separate data.frame objects based on levels of a factor. So if I have:

df <- data.frame(
  x=rnorm(25),
  y=rnorm(25),
  g=rep(factor(LETTERS[1:5]), 5)
)

How can I split df into separate data.frames, one for each level of g, containing the corresponding x and y values? I can get most of the way there using split(df, df$g), but I'd like each level of the factor to have its own data.frame.

What's the best way to do this?


Solution

  • I think that split does exactly what you want.

    Notice that the result, X, is already a list of data frames, one per level of g, as str shows:

    X <- split(df, df$g)
    str(X)
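    Because split returns an ordinary named list, each group can be pulled out by its level name, or all groups can be processed at once. A minimal sketch that rebuilds df from the question (set.seed is added here only to make the example reproducible):

```r
# Rebuild the question's data; set.seed is only for reproducibility
set.seed(1)
df <- data.frame(
  x = rnorm(25),
  y = rnorm(25),
  g = rep(factor(LETTERS[1:5]), 5)
)

# split() returns a named list with one data frame per level of g
X <- split(df, df$g)
names(X)          # "A" "B" "C" "D" "E"

# Pull out a single group by its level name
X$A
X[["B"]]

# Or work on every group at once, e.g. row count and mean of x per group
sapply(X, nrow)   # 5 rows in each group
sapply(X, function(d) mean(d$x))
```

    Indexing the list this way avoids creating a separate variable for every level.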
    

    If you want individual objects named after the levels of g, you could assign each element of X to an object of that name, though this seems like extra work when you can just index the data frames in the list that split creates.

    #lapply is used here only to drop the third column, g, which is no longer needed.
    #Iterating over X directly (rather than seq_along) keeps the level names on Y.
    Y <- lapply(X, function(d) d[, c("x", "y")])
    
    #Assign the dataframes in the list Y to individual objects
    A <- Y[[1]]
    B <- Y[[2]]
    C <- Y[[3]]
    D <- Y[[4]]
    E <- Y[[5]]
    
    #Or use lapply with assign to assign each piece to an object all at once
    lapply(seq_along(Y), function(i) {
        assign(c("A", "B", "C", "D", "E")[i], Y[[i]], envir = .GlobalEnv)
    })
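    Another option is to drop g before splitting. split only needs the grouping vector, not the column itself, so splitting just the x and y columns gives a list that is already named by level, with no lapply step. A minimal sketch, assuming df from the question:

```r
# Recreate the question's data
set.seed(1)
df <- data.frame(
  x = rnorm(25),
  y = rnorm(25),
  g = rep(factor(LETTERS[1:5]), 5)
)

# Split only the x and y columns, using g as the grouping vector;
# the list elements are named A to E and contain just x and y
Y <- split(df[c("x", "y")], df$g)
str(Y)
```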
    

    Edit: Better still, instead of using lapply to assign into the global environment, use list2env:

    names(Y) <- c("A", "B", "C", "D", "E")
    list2env(Y, envir = .GlobalEnv)
    A
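    If you would rather not add five new objects to the global workspace, list2env can fill a fresh environment instead. This is a variation on the answer, not part of it; `groups` below is just an illustrative name. A minimal sketch, assuming df from the question:

```r
set.seed(1)
df <- data.frame(
  x = rnorm(25),
  y = rnorm(25),
  g = rep(factor(LETTERS[1:5]), 5)
)

# One data frame per level of g, named A to E
Y <- split(df, df$g)

# Create objects A to E inside a dedicated environment rather than
# .GlobalEnv; list2env returns the environment it filled
groups <- list2env(Y, envir = new.env())
ls(groups)                 # "A" "B" "C" "D" "E"
groups$A                   # same data frame as Y$A
get("B", envir = groups)   # objects can also be looked up by name
```

    Keeping the pieces in their own environment makes them easy to inspect with ls() or remove all at once with rm(groups).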