
Repeating a value within each ID when there are multiple value options in R


I have a dataset in R with multiple height observations per ID. Some IDs have several different height measurements and some have only one. For most rows within each ID, the height value is missing (coded as NA). I want to create a new variable that takes the first available (non-missing) height per ID and repeats it across all rows of that ID (different IDs have different numbers of rows). I have tried fill, mutate, and with, but I am struggling to make it work.

Currently my data looks like this:

data = data.frame(id = c(1,1,1,2,2,3,3,3,3), 
                 height = c(150, NA, NA, NA, 148, NA, 152, 151, NA))

#   id height
# 1  1    150
# 2  1     NA
# 3  1     NA
# 4  2     NA
# 5  2    148
# 6  3     NA
# 7  3    152
# 8  3    151
# 9  3     NA

Ideally, I would like to be able to add a variable (height_filled) so it will look like this:

data = data.frame(id = c(1,1,1,2,2,3,3,3,3),
                  height = c(150, NA, NA, NA, 148, NA, 152, 151, NA),
                  height_filled = c(150, 150, 150, 148, 148, 152, 152, 152, 152))

#   id height height_filled
# 1  1    150           150
# 2  1     NA           150
# 3  1     NA           150
# 4  2     NA           148
# 5  2    148           148
# 6  3     NA           152
# 7  3    152           152
# 8  3    151           152
# 9  3     NA           152

Any help would be very much appreciated!


Solution

  • A data.table option using first + na.omit

    library(data.table)
    setDT(data)[, height_filled := first(na.omit(height)), by = id]
    

    gives

       id height height_filled
    1:  1    150           150
    2:  1     NA           150
    3:  1     NA           150
    4:  2     NA           148
    5:  2    148           148
    6:  3     NA           152
    7:  3    152           152
    8:  3    151           152
    9:  3     NA           152
    

  • A base R option using ave

    transform(
      data,
      height_filled = ave(height, id, FUN = function(x) head(na.omit(x), 1))
    )
    

    gives

      id height height_filled
    1  1    150           150
    2  1     NA           150
    3  1     NA           150
    4  2     NA           148
    5  2    148           148
    6  3     NA           152
    7  3    152           152
    8  3    151           152
    9  3     NA           152
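
One case the example data does not cover is an ID whose heights are all NA. For such a group na.omit() returns an empty vector, so the following is a minimal base R sketch that indexes with [1] instead, which yields NA for an empty vector. The data frame name demo and the extra id 4 are hypothetical, added only to illustrate that case.

```r
# Hypothetical data: same as the question, plus id 4 with no height at all
demo <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  height = c(150, NA, NA, NA, 148, NA, 152, 151, NA, NA)
)

# First non-missing value of a vector; NA if every value is missing
first_non_na <- function(x) na.omit(x)[1]

# ave() applies the function within each id and recycles the single
# result across all rows of that id
demo$height_filled <- ave(demo$height, demo$id, FUN = first_non_na)

demo$height_filled
# 150 150 150 148 148 152 152 152 152 NA
```

The rows for ids 1 to 3 match the desired output from the question, and the all-NA group simply stays NA.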