rdataframerowsdelimitersplitstackshape

R separate comma separated cells into rows and Cartesian product


I have mydf data frame below. I want to split any cell that contains comma separated data and put it into rows. I am looking for a data frame similar to y below. How could i do it efficiently in few steps? Currently i am using cSplit function on one column at a time.

I tried cSplit(mydf, c("name","new"), ",", direction = "long"), but that didn`t work

library(splitstackshape)
mydf=data.frame(name = c("AB,BW","x,y,z"), AB = c('A','B'), new=c("1,2,3","4,5,6,7"))
mydf

x=cSplit(mydf, c("name"), ",", direction = "long")
x
y=cSplit(x, c("new"), ",", direction = "long")
y

Solution

  • There are times when a for loop is totally fine to work with in R. This is one of those times. Try:

    library(splitstackshape)
    cols <- c("name", "new")
    for (i in cols) {
      mydf <- cSplit(mydf, i, ",", "long")
    }
    
    mydf
    ##     name AB new
    ##  1:   AB  A   1
    ##  2:   AB  A   2
    ##  3:   AB  A   3
    ##  4:   BW  A   1
    ##  5:   BW  A   2
    ##  6:   BW  A   3
    ##  7:    x  B   4
    ##  8:    x  B   5
    ##  9:    x  B   6
    ## 10:    x  B   7
    ## 11:    y  B   4
    ## 12:    y  B   5
    ## 13:    y  B   6
    ## 14:    y  B   7
    ## 15:    z  B   4
    ## 16:    z  B   5
    ## 17:    z  B   6
    ## 18:    z  B   7
    

    Here's a small test using slightly bigger data:

    # concat.test = sample data from "splitstackshape"
    test <- do.call(rbind, replicate(5000, concat.test, FALSE))
    
    fun1 <- function() {
      cols <- c("Likes", "Siblings")
      for (i in cols) {
        test <- cSplit(test, i, ",", "long")
      }
      test
    }
    
    fun2 <- function() {
      test %>%
        separate_rows("Likes") %>%
        separate_rows("Siblings")
    }
    
    system.time(fun1())
    #   user  system elapsed 
    #  3.205   0.056   3.261 
    system.time(fun2())
    #   user  system elapsed 
    # 11.598   0.066  11.662