Tags: r, dplyr, transformation, splitstackshape

How to create new columns in a data.frame based on row values in R?


Hi,

I have a data.frame with family trios (offspring, dam, sire), and I would like to add columns listing the full sibs of every "id" (= offspring), i.e. the other ids that share both the same dam and the same sire.

My data:

df
         id    dam    sire
1:    83295  67606   79199
2:    83297  67606   79199
3:    89826  67606   79199

What I would like to retrieve:

df2
         id    dam    sire     fs1     fs2
1:    83295  67606   79199   83297   89826  
2:    83297  67606   79199   83295   89826  
3:    89826  67606   79199   83295   83297  

What I’ve tried:

(similar to: How to transform a dataframes row into columns in R?)

library(dplyr)
library(splitstackshape)

df2 <- df %>%
  group_by(dam,sire) %>%
  summarise(id = toString(id)) %>%
  cSplit("id") %>%
  setNames(paste0("fs_", 1:ncol(.)))

colnames(df2) <- c("dam", "sire", "id", "fs1", "fs2")

This only gives me one row per parent pair, instead of one row per "id":

df2
     dam    sire       id      fs1     fs2
1: 67606   79199    83295    83297    89826  

Some ids will have no full sibs, while others will have as many as 15.

Thanks in advance for your advice! :)


Solution

  • We can group_by dam and sire, get all ids except the current id using setdiff, and then use cSplit to separate the comma-separated values into different columns.

    library(splitstackshape)
    library(dplyr)
    
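    df %>%
      group_by(dam, sire) %>%
      # For each id, collapse the other ids of the same dam/sire group
      # into one comma-separated string, e.g. "83297, 89826"
      mutate(fs = purrr::map_chr(id, ~toString(setdiff(id, .x)))) %>%
      # Split that string into fs_1, fs_2, ... (one column per full sib)
      cSplit("fs")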
    
    #      id   dam  sire  fs_1  fs_2
    #1: 83295 67606 79199 83297 89826
    #2: 83297 67606 79199 83295 89826
    #3: 89826 67606 79199 83295 83297
    

    data

    df <- structure(list(id = c(83295L, 83297L, 89826L), dam = c(67606L, 
    67606L, 67606L), sire = c(79199L, 79199L, 79199L)), class = "data.frame",
    row.names = c("1:", "2:", "3:"))
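For comparison, here is a base R sketch of the same setdiff idea (not part of the original answer) that avoids the dplyr/purrr/splitstackshape dependencies. It assumes `id`, `dam`, and `sire` are plain columns of a data.frame; `add_full_sibs` is a hypothetical helper name, and id 90001 with its parents is made-up data for a family with a single offspring. The number of `fs` columns follows the largest family, and rows with fewer sibs are padded with `NA`, which covers anything from zero to 15 full sibs.

```r
# add_full_sibs() is a hypothetical helper, not part of dplyr or splitstackshape.
# For every row it finds the other ids with the same dam AND sire, then spreads
# them into columns fs1, fs2, ..., padding with NA where a row has fewer sibs.
add_full_sibs <- function(df) {
  # One key per parent pair
  family <- paste(df$dam, df$sire, sep = "_")
  # Full sibs of row i: ids in the same family, minus the row's own id
  sibs <- lapply(seq_len(nrow(df)), function(i) {
    setdiff(df$id[family == family[i]], df$id[i])
  })
  # The largest sibship decides how many fs columns are needed (0 if none)
  n_max <- max(lengths(sibs), 0L)
  for (k in seq_len(n_max)) {
    # s[k] is NA when a row has fewer than k sibs
    df[[paste0("fs", k)]] <- sapply(sibs, function(s) s[k])
  }
  df
}

# Example data: the three ids from the question plus one made-up id (90001)
# whose parents have no other offspring.
df <- data.frame(
  id   = c(83295L, 83297L, 89826L, 90001L),
  dam  = c(67606L, 67606L, 67606L, 70000L),
  sire = c(79199L, 79199L, 79199L, 80000L)
)

add_full_sibs(df)
# Rows 1-3 get the same fs1/fs2 values as the desired output in the question;
# row 4 gets NA in both fs1 and fs2.
```

Unlike the `cSplit()` version, this returns a plain data.frame with columns named `fs1`, `fs2`, ... as in the desired output, and the sibs are listed in the order their ids appear in the data.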