r, group-by, dplyr, transform, splitstackshape

How to list row values in a column based on grouping value in R?


Hi,

I have an input file with one column of gene IDs and one column of GO terms, with multiple rows per gene (anywhere from 1 to more than 20). The format I need to generate has a single row for each unique gene ID, with its GO terms in a second column, separated by semicolons.

My data:

GeneID    GO
am1001    190909
am1001    600510
am1002    500050
am1002    432323
am1002    100209

The desired output:

GeneID    GO_list
am1001    190909; 600510
am1002    500050; 432323; 100209

I have tried approaches similar to those in "How to create new columns in a data.frame based on row values in R?" but was not successful.

Thanks in advance for your advice! :)


Solution

  • I would suggest the following base R approach:

    #Data
    df <- structure(list(GeneID = c("am1001", "am1001", "am1002", "am1002", 
    "am1002"), GO = c(190909L, 600510L, 500050L, 432323L, 100209L
    )), class = "data.frame", row.names = c(NA, -5L))
    

    The code:

    # Aggregation: collapse the GO terms of each GeneID into one '; '-separated string
    aggregate(GO ~ GeneID, data = df, FUN = function(x) paste0(x, collapse = '; '))
    

    The output:

      GeneID                     GO
    1 am1001         190909; 600510
    2 am1002 500050; 432323; 100209
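
Since the question is also tagged dplyr, here is a minimal sketch of the same aggregation with `group_by()` and `summarise()`. It assumes dplyr (version 1.0.0 or later, for the `.groups` argument) is installed. The name `go_by_gene` is only illustrative. Unlike the `aggregate()` call above, it names the result column `GO_list`, as in the desired output.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  GeneID = c("am1001", "am1001", "am1002", "am1002", "am1002"),
  GO = c(190909L, 600510L, 500050L, 432323L, 100209L)
)

# One row per GeneID; GO terms are pasted together in their original row order
go_by_gene <- df %>%
  group_by(GeneID) %>%
  summarise(GO_list = paste(GO, collapse = "; "), .groups = "drop")

print(go_by_gene)
```

The result is a tibble with one row per gene. If a plain data frame is needed, wrapping it in `as.data.frame()` should work.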