I have a dataset in R with multiple height observations per ID. Some IDs have several different height measurements and some have only one. For most rows within each ID, the height value is missing (coded as NA). I want to create a new variable that takes the first available (non-missing) height per ID and repeats it for every row of that ID (the number of rows differs between IDs). I have tried fill, mutate and with, but I am struggling to make it work.
Currently my data looks like this:
data = data.frame(id = c(1,1,1,2,2,3,3,3,3),
height = c(150, NA, NA, NA, 148, NA, 152, 151, NA))
# id height
# 1 1 150
# 2 1 NA
# 3 1 NA
# 4 2 NA
# 5 2 148
# 6 3 NA
# 7 3 152
# 8 3 151
# 9 3 NA
Ideally, I would like to be able to add a variable (height_filled) so it will look like this:
data = data.frame(id = c(1,1,1,2,2,3,3,3,3),
height = c(150, NA, NA, NA, 148, NA, 152, 151, NA),
height_filled = c(150, 150, 150, 148, 148, 152, 152, 152, 152))
# id height height_filled
# 1 1 150 150
# 2 1 NA 150
# 3 1 NA 150
# 4 2 NA 148
# 5 2 148 148
# 6 3 NA 152
# 7 3 152 152
# 8 3 151 152
# 9 3 NA 152
Any help would be very much appreciated!
A data.table option using first + na.omit
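library(data.table)
# Convert to a data.table by reference, then take the first non-NA height within each id
setDT(data)[, height_filled := first(na.omit(height)), by = id]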
gives
id height height_filled
1: 1 150 150
2: 1 NA 150
3: 1 NA 150
4: 2 NA 148
5: 2 148 148
6: 3 NA 152
7: 3 152 152
8: 3 151 152
9: 3 NA 152
A base R option using ave
transform(
  data,
  height_filled = ave(height, id, FUN = function(x) head(na.omit(x), 1))
)
gives
id height height_filled
1 1 150 150
2 1 NA 150
3 1 NA 150
4 2 NA 148
5 2 148 148
6 3 NA 152
7 3 152 152
8 3 151 152
9 3 NA 152
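Both options above rely on every id having at least one non-missing height, which holds for the example data. If an id could have no recorded height at all, na.omit would leave nothing to pick from, and I would expect that to cause problems. A minimal base R sketch for that case follows; the helper first_non_na and the extra id 4 are hypothetical names and data added for illustration, not part of the question.

```r
# Hypothetical helper (not from the answers above): returns the first
# non-missing value, or NA when every value is missing.
first_non_na <- function(x) x[!is.na(x)][1]

# Example data from the question, plus an assumed id 4 with no recorded height.
data2 <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  height = c(150, NA, NA, NA, 148, NA, 152, 151, NA, NA, NA)
)

# ave() applies the helper within each id and recycles the single result
# across all rows of that id.
data2$height_filled <- ave(data2$height, data2$id, FUN = first_non_na)
data2
```

Here the rows for id 4 should simply get NA in height_filled, while ids 1 to 3 are filled as in the desired output. The same helper could presumably be used inside a grouped mutate if you prefer dplyr.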