I have the following sample dataframe:
df <- data.frame(
record_id = c(1, 1, 1, 2, 2, 3, 3, 3),
instance = c(NA, NA, 2, NA, 1, 2, NA, NA),
A = c(10, NA, NA, 20, 25, NA, 30, NA),
B = c(NA, 5, NA, NA, 5, 15, NA, 15),
C = c(NA, NA, 3, NA, 5, 20, NA, 20),
D = c(NA, NA, NA, 25, 25, 30, NA, 30)
)
record_id instance A B C D
1 1 NA 10 NA NA NA
2 1 NA NA 5 NA NA
3 1 2 NA NA 3 NA
4 2 NA 20 NA NA 25
5 2 1 25 5 5 25
6 3 2 NA 15 20 30
7 3 NA 30 NA NA NA
8 3 NA NA 15 20 30
If the instance is NA, I want the rows of the same record_id to be collapsed down into one row. In my dataframe, there will not be two or more values in the same column for the same record_id and NA instance group.
In other words, I would like to get:
record_id instance A B C D
1 1 NA 10 5 NA NA
2 1 2 NA NA 3 NA
3 2 NA 20 NA NA 25
4 2 1 25 5 5 25
5 3 2 NA 15 20 30
6 3 NA 30 15 20 30
How can I do this?
library(dplyr)
df |>
reframe(
.by = c(record_id, instance),
across(everything(), ~ if (is.na(instance[1])) na.omit(.x)[1] else .x)
)
# record_id instance A B C D
# 1 1 NA 10 5 NA NA
# 2 1 2 NA NA 3 NA
# 3 2 NA 20 NA NA 25
# 4 2 1 25 5 5 25
# 5 3 2 NA 15 20 30
# 6 3 NA 30 15 20 30
ThomasIsCoding already provided an excellent base R version of this. A data.table variant is:
library(data.table)
setDT(df)
df[, lapply(.SD, \(x) if (is.na(instance[1])) na.omit(x)[1] else x), .(record_id, instance)]
# record_id instance A B C D
# <num> <num> <num> <num> <num> <num>
# 1: 1 NA 10 5 NA NA
# 2: 1 2 NA NA 3 NA
# 3: 2 NA 20 NA NA 25
# 4: 2 1 25 5 5 25
# 5: 3 2 NA 15 20 30
# 6: 3 NA 30 15 20 30