Tags: r, interpolation, missing-data, imputation, imputets

Time series missing value imputation: How to use maxgap inside na_kalman?


I am looking for a way to avoid missing value imputation for the leading zeroes of a time series. Leading zeroes are usually the longest run of missing values in a time series, and since I am forecasting panel data with global models, I want to control their effect with the maxgap argument.

The maxgap argument sets the maximum number of consecutive NAs that will still be replaced during imputation.

However, when I set maxgap = 1 to avoid replacing any NA run longer than 1, the replacement seems to occur for the longer gaps rather than the shorter ones, the opposite of what I expected. How do I achieve what I need here?

Here is an example for illustration:

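library(imputeTS)
library(magrittr)  # provides the %>% pipe used below
tsAirgap

# Impute only NA gaps of length 1; longer gaps are left as NA
repl_tsAir <- tsAirgap %>% na_kalman(model = "StructTS",
                                     smooth = TRUE,
                                     maxgap = 1)
repl_tsAir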

Solution

  • However I want to avoid the replacement of any NA series longer than 1 and set maxgap equals 1

    For me, running your code with maxgap = 1 does exactly this.

    The input series contains several NA gaps: most are a single NA, and there is only one run of 3 consecutive NAs.


    After applying na_kalman with maxgap = 1, all single NAs are imputed as expected. The longer gap of 3 consecutive NAs is left unchanged.

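If you want to check this without relying on a plot, one option is to compute which NA positions sit in a run longer than maxgap and compare them with the NAs left after imputation. The sketch below uses only base R (rle and inverse.rle) on a made-up toy vector. The helper name na_runs_longer_than is hypothetical and not part of imputeTS, and this is only an illustration of the rule described above, not a claim about how the package implements it.

```r
# Toy series (made-up values): single NAs plus one run of three NAs.
x <- c(112, NA, 132, 129, NA, NA, NA, 135, NA, 148)

# Hypothetical helper (not part of imputeTS): returns TRUE for every
# position that lies inside an NA run longer than maxgap.
na_runs_longer_than <- function(x, maxgap) {
  runs <- rle(is.na(x))                               # run-length encode the NA mask
  runs$values <- runs$values & runs$lengths > maxgap  # keep only NA runs that are too long
  inverse.rle(runs)                                   # expand back to one flag per position
}

keep_na <- na_runs_longer_than(x, maxgap = 1)
which(keep_na)  # positions 5, 6, 7: the run of three NAs
```

On your own data, comparing which(is.na(repl_tsAir)) with which(na_runs_longer_than(tsAirgap, 1)) should show whether only the long gap was left untouched.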