r, missing-data, imputation, imputets

Impute missing values, but not at the beginning or the end of the series?


Consider the following working example:

library(data.table)
library(imputeTS)

DT <- data.table(
  time = c(1:10),
  var1 = c(1:5, NA, NA, 8:10),
  var2 = c(NA, NA, 1:4, NA, 6, 7, 8),
  var3 = c(1:6, rep(NA, 4))
)

    time var1 var2 var3
 1:    1    1   NA    1
 2:    2    2   NA    2
 3:    3    3    1    3
 4:    4    4    2    4
 5:    5    5    3    5
 6:    6   NA    4    6
 7:    7   NA   NA   NA
 8:    8    8    6   NA
 9:    9    9    7   NA
10:   10   10    8   NA

I want to impute the missing values inside each time series using na_interpolation() from the imputeTS package. However, I do not want to impute the missing values at the beginning or the end of a series. These leading and trailing runs of NA vary in length from column to column, and in my application replacing them would not make sense.

When I run the following code to impute the series, however, all the NAs get replaced:

cols_to_impute_example <- c("var1", "var2", "var3")
DT[, (cols_to_impute_example) := lapply(.SD, na_interpolation), .SDcols = cols_to_impute_example]
> DT
    time var1 var2 var3
 1:    1    1    1    1
 2:    2    2    1    2
 3:    3    3    1    3
 4:    4    4    2    4
 5:    5    5    3    5
 6:    6    6    4    6
 7:    7    7    5    6
 8:    8    8    6    6
 9:    9    9    7    6
10:   10   10    8    6

What I want to achieve is:

    time var1 var2 var3
 1:    1    1   NA    1
 2:    2    2   NA    2
 3:    3    3    1    3
 4:    4    4    2    4
 5:    5    5    3    5
 6:    6    6    4    6
 7:    7    7    5   NA
 8:    8    8    6   NA
 9:    9    9    7   NA
10:   10   10    8   NA

Solution

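  • The zoo package offers an interpolation function, na.approx(), that allows more customization. It only interpolates between observed values, and with na.rm = FALSE the leading and trailing NAs are kept in place instead of being dropped:

    library(zoo)
    # x = time gives the positions to interpolate along;
    # na.rm = FALSE keeps the leading/trailing NAs so each column stays the same length
    DT[, (2:4) := lapply(.SD, na.approx, x = time, na.rm = FALSE), .SDcols = 2:4]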
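
If you would rather keep using imputeTS, one possible workaround is to interpolate as before and then set everything outside the first and last observed value back to NA. The sketch below is only an illustration under that assumption: `na_interpolation_inner` is a hypothetical helper name, not a function of imputeTS, and it assumes each column should be treated as one series ordered by row.

```r
library(data.table)
library(imputeTS)

# Hypothetical helper (not part of imputeTS): interpolate with
# na_interpolation(), then restore the leading and trailing NAs.
na_interpolation_inner <- function(x, ...) {
  obs <- which(!is.na(x))
  # With fewer than two observed values there is nothing to interpolate between
  if (length(obs) < 2L) return(x)
  out <- na_interpolation(x, ...)
  # Positions between the first and last observed value (inclusive)
  pos <- seq_along(x)
  inner <- pos >= obs[1L] & pos <= obs[length(obs)]
  out[!inner] <- NA
  out
}

DT <- data.table(
  time = 1:10,
  var1 = c(1:5, NA, NA, 8:10),
  var2 = c(NA, NA, 1:4, NA, 6, 7, 8),
  var3 = c(1:6, rep(NA, 4))
)

cols <- c("var1", "var2", "var3")
DT[, (cols) := lapply(.SD, na_interpolation_inner), .SDcols = cols]
```

Because the extra arguments are passed through `...`, options of na_interpolation() such as `option = "spline"` should still be usable, while the edges of each series stay untouched.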