I have a data set as shown in Input
table below. I want to combine rows (4,5,6)
, rows (8,9)
and rows (11,12)
of Input
table such that they share the same ID
as shown in row 4,8 and 11
in the Output
table below.
I tried merge()
, but that didn't work as expected. The key here is the ID
column which has unique values.
Any suggestions on how I can achieve this efficiently?
Input
Row Name Val1 Val2 Unit ID
1 -0.5 5.5 V UI-001
2 a -0.5 2.5 V UI-002
3 b -0.5 5.5 V UI-003
4 c -0.5 5.5 V UI-004
5 d
6 e
7 -45 125 Ohms UI-005
8 f 2 kV UI-006
9 g
10 h 500 V UI-007
11 i 15 kV UI-008
12 j
13 k UI-009
dput() of Input
structure(list(Name = c(NA, "a", "b", "c", "d", "e", NA, "f",
"g", "h", "i", "j", "k"), Val1 = c(-0.5, -0.5, -0.5, -0.5, NA,
NA, -45, 2, NA, 500, 15, NA, NA), Val2 = c(5.5, 2.5, 5.5, 5.5,
NA, NA, 125, NA, NA, NA, NA, NA, NA), Unit = c("V", "V", "V",
"V", NA, NA, "Ohms", "kV", NA, "V", "kV", NA, NA), ID = c("UI-001",
"UI-002", "UI-003", "UI-004", NA, NA, "UI-005", "UI-006", NA,
"UI-007", "UI-008", NA, "UI-009")), row.names = c(NA, -13L), class =
c("tbl_df", "tbl", "data.frame"))
Output
Row Name Val1 Val2 Unit ID
1 -0.5 5.5 V UI-001
2 a -0.5 2.5 V UI-002
3 b -0.5 5.5 V UI-003
4 cde -0.5 5.5 V UI-004
5 -45 125 Ohms UI-005
6 fg 2 kV UI-006
7 h 500 V UI-007
8 ij 15 kV UI-008
9 k UI-009
dput() of Output
structure(list(Name = c(NA, "a", "b", "cde", NA, "fg", "h", "ij",
"k"), Val1 = c(-0.5, -0.5, -0.5, -0.5, -45, 2, 500, 15, NA),
Val2 = c(5.5, 2.5, 5.5, 5.5, 125, NA, NA, NA, NA), Unit = c("V",
"V", "V", "V", "Ohms", "kV", "V", "kV", NA), ID = c("UI-001",
"UI-002", "UI-003", "UI-004", "UI-005", "UI-006", "UI-007",
"UI-008", "UI-009")), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame"))
We may use
out <- df[!is.na(df$ID), ]
out$Name[!is.na(out$Name)] <- tapply(df$Name, cumsum(!is.na(df$ID)), paste, collapse = "")[!is.na(out$Name)]
out
# Name Val1 Val2 Unit ID
# 1 <NA> -0.5 5.5 V UI-001
# 2 a -0.5 2.5 V UI-002
# 3 b -0.5 5.5 V UI-003
# 4 cde -0.5 5.5 V UI-004
# 7 <NA> -45.0 125.0 Ohms UI-005
# 8 fg 2.0 NA kV UI-006
# 10 h 500.0 NA V UI-007
# 11 ij 15.0 NA kV UI-008
# 13 k NA NA <NA> UI-009
The first line gets rid of all the rows where ID
is NA
. Then
tapply(df$Name, cumsum(!is.na(df$ID)), paste, collapse = "")
# 1 2 3 4 5 6 7 8 9
# "NA" "a" "b" "cde" "NA" "fg" "h" "ij" "k"
constructs the correct values for Name
and !is.na(out$Name)
gives us which rows of out
should be modified (which is needed since "NA"
isn't the same as NA
).