rdplyrsplit-apply-combine

tidyr or dplyr equivalent of JMP split table


JMP has a "split table" platform:
http://www.jmp.com/support/help/Split_Columns.shtml

Here is the image for it:

enter image description here

The "split by" becomes part of the column headers.
The "split columns" are the columns spread out.
The "group" are retained columns.

I have looked at a few links/pages and can't seem to get this right in R. Right now I have to kluge it into a macro in JMP.

Links that didn't help me include:

I need to split a table of ~20k rows and ~30 columns, along one of the columns (integers between 0 and 13), to being ~1400 rows with ~25 split into 350.

An inelegant, but repeatable, example is splitting this cars table
enter image description here

according to this:
enter image description here

Yields this:
enter image description here

How do I do this and retain the ~5 non-split columns using an R library like tidyr or dplyr?


Solution

  • Using reshape, it's not too terrible to do one split column at a time. You could then merge the model and engine.disp together. For your real example, you could just change the lists in aggregate and formula in cast.

    x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
    names(x) <- tolower(names(x))
    
    agg <- aggregate(list(model = x$model),list(origin = x$origin,cylinders = x$cylinders,year = x$year),FUN = paste,collapse = ',')
    
    require(reshape)
    output <- cast(data = agg,formula = origin + cylinders ~ year,value = 'model')
    

    Edit: I haven't checked all possible cases, but this function should work similar to the split tables, or at least give you a good start.

    x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
    names(x) <- tolower(names(x))
    
    jmpsplitcol <- function(data,splitby,splitcols,group){
    
      require(reshape)
      require(tidyr)
    
      aggsplitlist <- data[ ,names(data) %in% c(splitby,group)]
      aggsplitlist <- lapply(aggsplitlist,`[`)
    
      agg <- aggregate(list(data[ ,names(data) %in% splitcols]),aggsplitlist,FUN = paste,collapse = ',')
    
      newgat <- gather_(data = agg,key = 'splitcolname','myval',splitcols)
    
      castformula <- as.formula(paste(paste(group,collapse = ' + '),'~','splitcolname','+',splitby))
      output <- cast(data = newgat,formula = castformula,value = 'myval')
      output
    }
    res <- jmpsplitcol(x,c('year'),c('engine.disp','model'),c('origin','cylinders'))
    head(res2)