
Converting different rows of a data frame into one single row in R


I have a dataset that looks like this:

CATA 1 10101
CATA 2 11101
CATA 3 10011
CATB 1 10100
CATB 2 11100
CATB 3 10011

etc.

and I want to combine these different rows into a single, long row like this:

CATA 101011110110011
CATB 101001110010011

I've tried doing this with melt() and then dcast(), but it doesn't seem to work. Does anyone have some simple pieces of code to do this?


Solution

  • Look at the paste() function, and specifically its collapse argument. It's not clear what should happen when the first column takes different values, so I won't venture a guess. Update your question if you get stuck.

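    # Example data: a single category with three pieces
    dat <- data.frame(V1 = "CATA", V2 = 1:3, V3 = c(10101, 11101, 10011))
    # collapse = "" joins all elements into one string with no separator
    paste(dat$V3, collapse = "")
    [1] "101011110110011"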
    

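    Note that if the values can start with a zero, the column needs to be stored as character from the start (for example, read in with colClasses = "character"). Once the values are numeric, the leading zeros are already gone, and converting to character afterwards will not bring them back.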
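
    As a small illustration of the leading-zero point, here is a sketch with made-up values ("00101" and "01101" are not from the question; they are assumed only to show the effect):

    ```r
    # Stored as numbers, the leading zeros are lost before paste() ever runs
    as_number <- c(00101, 01101)
    paste(as_number, collapse = "")   # "1011101"

    # Stored as character strings, every digit survives
    as_text <- c("00101", "01101")
    paste(as_text, collapse = "")     # "0010101101"

    # When reading data in, ask for character columns up front
    dat <- read.table(text = "CATA 1 00101\nCATA 2 01101",
                      colClasses = "character")
    paste(dat$V3, collapse = "")      # "0010101101"
    ```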

    EDIT: to address multiple values for the first column

    Use plyr's ddply() function, which takes a data.frame as input along with one or more grouping variables. Within each group, we then apply the same paste() trick as before via summarize().

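        library(plyr)
        # Random example data, so the result will differ from run to run
        dat <- data.frame(V1 = sample(c("CATA", "CATB"), 10, TRUE),
                          V2 = 1:10,
                          V3 = sample(0:100, 10, TRUE))

        # For each value of V1, paste that group's V3 values into one string
        ddply(dat, "V1", summarize, newCol = paste(V3, collapse = ""))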
    
            V1         newCol
        1 CATA          16110
        2 CATB 19308974715042
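
    Applied to the data from the question, the same grouped paste() can also be sketched in base R with aggregate(). This is an alternative to the plyr approach above, not a replacement for it. It assumes the default column names V1, V2, V3 that read.table() assigns, and that the second column gives the order in which the pieces should be joined.

    ```r
    # The question's rows, with the code column kept as character
    dat <- read.table(text = "
    CATA 1 10101
    CATA 2 11101
    CATA 3 10011
    CATB 1 10100
    CATB 2 11100
    CATB 3 10011
    ", colClasses = c("character", "integer", "character"))

    # Sort so the pieces are joined in V2 order within each category
    dat <- dat[order(dat$V1, dat$V2), ]

    # One row per category: paste the V3 pieces together with no separator
    res <- aggregate(V3 ~ V1, data = dat, FUN = paste, collapse = "")
    res
    ```

    The extra collapse = "" argument is passed through aggregate() to paste(), so each group is reduced to a single string.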