
Reshaping from long to wide format with multiple variables in R


I have a table in long format like this:

gene  tissue tpm
  A   liver   5
  A   brain   2
  B   ovary   10
  B   brain   1
  C   brain   15
  C   liver   6

I'd like to convert it into a wider format:

gene tissue1 tissue2 tpm1 tpm2
  A  liver   brain    5    2
  B  ovary   brain    10   1
  C  brain   liver    15   6

I have tried with dcast and spread but I get this result:

gene  liver brain ovary
 A      5     2     NA
 B      NA    1     10
 C      6     15    NA

Which is NOT what I want.

Thank you!


Solution

  • I am not aware of a single R function that does this in one step, but you can use a for loop to rearrange your data frame.

    The code below assumes that every gene has exactly two rows:

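    data <- data.frame(gene=c("A","A","B","B","C","C"),
                    tissue=c("liver", "brain", "ovary", "brain", "brain", "liver"),
                    tpm=c(5,2,10,1,15,6),
                    stringsAsFactors=FALSE)  # keep tissue as character, not factor
    
    gene.unique <- unique(data$gene)
    n <- length(gene.unique)
    
    # Preallocate the result vectors; assigning into objects that do not
    # exist yet would stop with "object not found"
    tissue1 <- character(n)
    tissue2 <- character(n)
    tpm1 <- numeric(n)
    tpm2 <- numeric(n)
    
    for (i in seq_along(gene.unique)) {
      # Row positions of the current gene, in their original order
      genes.idx <- which(data$gene == gene.unique[i])
      tissue1[i] <- data$tissue[genes.idx[1]]
      tissue2[i] <- data$tissue[genes.idx[2]]
      tpm1[i] <- data$tpm[genes.idx[1]]
      tpm2[i] <- data$tpm[genes.idx[2]]
    }
    
    data.final <- data.frame(gene=gene.unique, tissue1, tissue2, tpm1, tpm2)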
    
      gene tissue1 tissue2 tpm1 tpm2
    1    A   liver   brain    5    2
    2    B   ovary   brain   10    1
    3    C   brain   liver   15    6
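
    As a possible alternative to the loop, here is a minimal base R sketch that first numbers the rows within each gene and then passes that counter to reshape(). The helper column idx and the object name wide are made up for this illustration, and the sketch assumes the rows of each gene are already in the order you want for tissue1, tissue2, and so on.

    ```r
    data <- data.frame(gene = c("A", "A", "B", "B", "C", "C"),
                       tissue = c("liver", "brain", "ovary", "brain", "brain", "liver"),
                       tpm = c(5, 2, 10, 1, 15, 6),
                       stringsAsFactors = FALSE)

    # Hypothetical helper column: position of each row within its gene (1, 2, ...)
    data$idx <- ave(seq_along(data$gene), data$gene, FUN = seq_along)

    # Spread tissue and tpm across columns, one set per value of idx;
    # sep = "" gives names like tissue1 instead of tissue.1
    wide <- reshape(data, idvar = "gene", timevar = "idx",
                    direction = "wide", sep = "")

    # reshape() interleaves the columns (tissue1, tpm1, tissue2, tpm2),
    # so put them in the requested order
    wide <- wide[, c("gene", "tissue1", "tissue2", "tpm1", "tpm2")]
    rownames(wide) <- NULL
    print(wide)
    ```

    Unlike the loop, this should not be limited to two rows per gene: a gene with fewer rows than the others gets NA in the extra columns. The final column selection would need to be extended if some gene has more than two rows.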
    

    I hope it helps you.