I'll preface this by saying I'm still pretty new to R and programming in general but I'm taking baby steps and this assignment is one of my more challenging ones.
I have two excel spreadsheets and I'm trying to combine one of them (the working sheet) into the master sheet. The matching columns only include the ID and dates. There is an extra complication which is that the master sheet may not include all the recently added IDs on the working sheet but I still want the new IDs and related information added as new rows in the master sheet.
How could I best do this in R?
Hopefully that explanation made some sense and thank you for helping!
So far, I have only tried using purrr:map_dfr following the advice of others but it seems to only work (at least when I tried it) for sheets with matching columns.
Here are example tables!
Working sheet with #78 acting as a participant that was not added/is missing from the master sheet
ID | Date | Wave | Notes | Data_1 | Data_2 | Data_3 |
---|---|---|---|---|---|---|
58 | 1/18/2014 | Y | 47.5 | 384.3 | 394.3 | |
144 | 1/18/2014 | Y2 | 65.3 | 38.3 | 90.4 | |
58 | 4/18/2014 | Y2 | 499.2 | 293.2 | 22.3 | |
78 | 1/19/2018 | Y5 | 24.4 | 22.1 | 12.1 |
Master sheet
ID | Date | Wave | Data_4 | Data_5 | Data_6 | Data_7 | Diagnosis | Age | Status | Gender |
---|---|---|---|---|---|---|---|---|---|---|
58 | 1/18/2014 | Y1 | 39.4 | 3959.3 | 474.3 | 283.2 | Negative | 45 | ... | M |
144 | 1/18/2014 | Y2 | 484.3 | 4849.2 | 383.3 | 393.3 | Positive | 80 | ... | F |
58 | 4/18/2015 | Y2 | 501.6 | 394.2 | 2394.2 | 290.3 | Positive | 46 | ... | M |
Ideally, from the working sheet, I would want to insert Data_1, Data_2, and Data_3 in specific places in the master sheet such as between wave and Data_4.
library(tidyverse)
otherSheet <- data.frame(ID = c(58, 144, 58, 78),
Date = c('1/18/2014', '1/18/2014',
'4/18/2014', '1/19/2018'),
Wave = c('Y', 'Y2', 'Y2', 'Y5'),
Notes = rep('',4 ),
Data_1 = c(47.5, 65.3, 499.2, 24.4),
Data_2 = c(384.2, 38.2, 293.2, 22.1),
Data_3 = c(394.3, 90.4, 22.3, 12.1))
masterSheet <- data.frame(ID = c(58, 144, 58),
Date = c('1/18/2014', '1/18/2014',
'4/18/2014'),
Wave = c('Y', 'Y2', 'Y2'),
Data_4 = c(39.4, 484.3, 501.6),
Data_5 = c(3959.3, 4849.2, 394.2),
Data_6 = c(474.3, 383.3, 2394.2),
Diagnosis = c('Negative', 'Postive', 'Positive'),
Age = c(45, 80, 46),
Status = rep('...', 3),
Gender = c('M', 'F', 'M'))
combinedSheets <- masterSheet %>%
full_join(otherSheet)