rreshapeqcc

Convert R dataframe from long to wide format, but with unequal group sizes, for use with qcc


I would like to convert a dataframe from long format to a wide format, but with unequal group sizes.

The eventual use will be in 'qcc', which requires a data frame or a matrix with each row consisting of one group, using NA's in groups which have fewer samples.

The following code will create an example dataset, as well as show manual conversion to the desired format.

# This is an example of the initial data that I have
# * 10 sample measurements, over 3 groups with 3, 2, and 5 elements respectively
x <- rnorm(10)
x_df <- data.frame( time = c( rep('2001 Q1',3), rep('2001 Q2',2), rep('2001 Q3',5) ), measure = x )
x_df

# This is a manual conversion into the desired format
x_pad <- c( x[1:3], NA, NA, x[4:5], NA, NA, NA, x[6:10] )
x_matrix <- matrix( x_pad, nrow = 3, ncol = 5, byrow = TRUE, dimnames = list(c('2001 Q1','2001 Q2','2001 Q3')) )
x_matrix # desired format

# An example of how it will be used
library(qcc)
plot(qcc(x_matrix, type = 'xbar', plot = FALSE))

So, I'd like to convert this:

      time     measure
1  2001 Q1  0.14680685
2  2001 Q1  0.53593193
3  2001 Q1  0.56097974
4  2001 Q2 -1.48102689
5  2001 Q2  0.18150972
6  2001 Q3  1.72018147
7  2001 Q3 -0.08480855
8  2001 Q3 -2.23208877
9  2001 Q3 -1.15269107
10 2001 Q3  0.57975023

... to this ...

              [,1]        [,2]       [,3]      [,4]      [,5]
2001 Q1  0.1468068  0.53593193  0.5609797        NA        NA
2001 Q2 -1.4810269  0.18150972         NA        NA        NA
2001 Q3  1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502

There is probably an easy way (perhaps some usage of reshape or reshape2 casting that I'm not familiar with?), but a bunch of searching hasn't helped me so far.

Thanks for any help!

==========

From one of the solutions below, the following will generate the final qcc xbar plot, including group labels:

library(splitstackshape)
out_df <- dcast( getanID( x_df, 'time' ), time~.id, value.var='measure' )
qcc( out_df[,-1], type = 'xbar', labels = out_df[,1] )

Solution

  • You can create a sequence column ('.id') using getanID from splitstackshape and use dcast from data.table to convert the long format to wide format. The output of splitstackshape is a data.table. When we load splitstackshape, data.table will also be loaded. So, if you already have the devel version of data.table, then the dcast from data.table can be used as well.

    library(splitstackshape)
    dcast(getanID(df1, 'time'), time~.id, value.var='measure')
    #     time          1           2          3         4         5
    #1: 2001 Q1  0.1468068  0.53593193  0.5609797        NA        NA
    #2: 2001 Q2 -1.4810269  0.18150972         NA        NA        NA
    #3: 2001 Q3  1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502
    

    Update

    As @snoram mentioned in the comments, function rowid from data.table makes it easier to use just data.table alone

    library(data.table)
    dcast(setDT(df1), time ~ rowid(time), value.var = "measure")