I want to interpolate zero values in a time series dataframe but only if: 1) there is only one missing value so subsequent and proceeding values are non-zero, 2) the surrounding non-zero values are higher than 2.
See example below. Thanks a lot!
df = data.frame(row.names = c("year 1", "year 2", "year 3", "year 4", "year 5"),
person1 = c(33,0,8,6,3), person2 = c(1,3,0,0,5), person3 = c(0,3,0,5,1))
before:
person1 person2 person3
year 1 33 1 0
year 2 0 3 3
year 3 8 0 0
year 4 6 0 2
year 5 3 5 1
desired result:
person1 person2 person3
year 1 33 1 0
year 2 *20.5* 3 3
year 3 8 0 0
year 4 6 0 2
year 5 3 5 1
I have tried different variations of lead & lag mutations and the na.approx function but to no avail.
Here's a one-liner using the tidyverse:
library(tidyverse)
df <- data.frame(row.names = c("year 1", "year 2", "year 3", "year 4", "year 5"), person1 = c(33,0,8,6,3), person2 = c(1,3,0,0,5), person3 = c(0,3,0,5,1))
df |>
mutate(
across(everything(), \(x) ifelse(x == 0 & lag(x, default = 0) > 2 & lead(x, default = 0) > 2, rowMeans(cbind(lag(x), lead(x))), x))
)
person1 person2 person3
year 1 33.0 1 0
year 2 20.5 3 3
year 3 8.0 0 4
year 4 6.0 0 5
year 5 3.0 5 1