rconcatenation

Concatenate Rows If Value of Another Column In Next Row Is Empty


I have a data set as shown in Input table below. I want to combine rows (4,5,6), rows (8,9) and rows (11,12) of Input table such that they share the same ID as shown in row 4,8 and 11 in the Output table below.

I tried merge(), but that didn't work as expected. The key here is the ID column which has unique values.

Any suggestions on how I can achieve this efficiently?

Input

Row Name Val1 Val2 Unit ID
1        -0.5 5.5   V   UI-001
2    a   -0.5 2.5   V   UI-002
3    b   -0.5 5.5   V   UI-003
4    c   -0.5 5.5   V   UI-004
5    d              
6    e              
7        -45 125  Ohms  UI-005
8    f     2        kV  UI-006
9    g              
10   h   500        V   UI-007
11   i    15        kV  UI-008
12   j              
13   k                  UI-009

dput() of Input

structure(list(Name = c(NA, "a", "b", "c", "d", "e", NA, "f", 
"g", "h", "i", "j", "k"), Val1 = c(-0.5, -0.5, -0.5, -0.5, NA, 
NA, -45, 2, NA, 500, 15, NA, NA), Val2 = c(5.5, 2.5, 5.5, 5.5, 
NA, NA, 125, NA, NA, NA, NA, NA, NA), Unit = c("V", "V", "V", 
"V", NA, NA, "Ohms", "kV", NA, "V", "kV", NA, NA), ID = c("UI-001", 
"UI-002", "UI-003", "UI-004", NA, NA, "UI-005", "UI-006", NA, 
"UI-007", "UI-008", NA, "UI-009")), row.names = c(NA, -13L), class = 
c("tbl_df", "tbl", "data.frame"))

Output

Row Name Val1 Val2 Unit ID
1        -0.5 5.5   V   UI-001
2    a   -0.5 2.5   V   UI-002
3    b   -0.5 5.5   V   UI-003
4    cde -0.5 5.5   V   UI-004      
5        -45  125 Ohms  UI-005
6    fg    2        kV  UI-006  
7    h   500        V   UI-007
8    ij   15        kV  UI-008
9    k                  UI-009

dput() of Output

structure(list(Name = c(NA, "a", "b", "cde", NA, "fg", "h", "ij", 
"k"), Val1 = c(-0.5, -0.5, -0.5, -0.5, -45, 2, 500, 15, NA), 
Val2 = c(5.5, 2.5, 5.5, 5.5, 125, NA, NA, NA, NA), Unit = c("V", 
"V", "V", "V", "Ohms", "kV", "V", "kV", NA), ID = c("UI-001", 
"UI-002", "UI-003", "UI-004", "UI-005", "UI-006", "UI-007", 
"UI-008", "UI-009")), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • We may use

    out <- df[!is.na(df$ID), ]
    out$Name[!is.na(out$Name)] <- tapply(df$Name, cumsum(!is.na(df$ID)), paste, collapse = "")[!is.na(out$Name)]
    out
    #    Name  Val1  Val2 Unit     ID
    # 1  <NA>  -0.5   5.5    V UI-001
    # 2     a  -0.5   2.5    V UI-002
    # 3     b  -0.5   5.5    V UI-003
    # 4   cde  -0.5   5.5    V UI-004
    # 7  <NA> -45.0 125.0 Ohms UI-005
    # 8    fg   2.0    NA   kV UI-006
    # 10    h 500.0    NA    V UI-007
    # 11   ij  15.0    NA   kV UI-008
    # 13    k    NA    NA <NA> UI-009
    

    The first line gets rid of all the rows where ID is NA. Then

    tapply(df$Name, cumsum(!is.na(df$ID)), paste, collapse = "")
    #     1     2     3     4     5     6     7     8     9 
    #  "NA"   "a"   "b" "cde"  "NA"  "fg"   "h"  "ij"   "k" 
    

    constructs the correct values for Name and !is.na(out$Name) gives us which rows of out should be modified (which is needed since "NA" isn't the same as NA).