r exponential imputets

Fill in blanks with exponential estimates


I'm trying to fill in NA values with numbers that follow exponential growth. Below is a sample of my data.


library(tidyverse)

expand.grid(X2009H1N1 = "0-17 years",
            type = "Cases",
            month = seq(as.Date("2009-04-12") , to = as.Date("2010-03-12"), by = "month")) %>% 
  bind_cols( data.frame(
    MidLevelRange = c(0,NA,NA,NA,NA,NA,8000000,16000000,18000000,19000000,19000000,19000000),
    lowEst = c(0,NA,NA,NA,NA,NA,5000000,12000000,12000000,13000000,14000000,14000000)
  ))

I have tried %>% arrange(month, X2009H1N1) %>% group_by(X2009H1N1, type ) %>% mutate(aprox_MidLevelRange = zoo::na.approx(MidLevelRange, na.rm = FALSE)), but the interpolated values do not look exponential to me. Thanks.


Solution

  • Have a look at the imputeTS package, which offers many imputation functions for time series. Take a look at this paper for a good overview of the available options.

    In your case, Stineman interpolation (imputeTS::na_interpolation(x, option = "stine")) could be a suitable option.

    Here it is applied to the example you provided:

    x <- expand.grid(
      X2009H1N1 = "0-17 years",
      type = "Cases",
      month = seq(as.Date("2009-04-12"),
        to = as.Date("2010-03-12"),
        by = "month"
      )
    ) %>%
      bind_cols(data.frame(
        MidLevelRange = c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000, 19000000, 19000000, 19000000),
        lowEst = c(0, NA, NA, NA, NA, NA, 5000000, 12000000, 12000000, 13000000, 14000000, 14000000)
      ))
    
    x %>%
      arrange(month, X2009H1N1) %>%
      group_by(X2009H1N1, type) %>%
      mutate(aprox_MidLevelRange = imputeTS::na_interpolation(MidLevelRange, option = "stine"))
    

    This gives you:

    # A tibble: 12 x 6
    # Groups:   X2009H1N1, type [1]
       X2009H1N1  type  month      MidLevelRange   lowEst aprox_MidLevelRange
       <fct>      <fct> <date>             <dbl>    <dbl>               <dbl>
     1 0-17 years Cases 2009-04-12             0        0                  0 
     2 0-17 years Cases 2009-05-12            NA       NA             593718.
     3 0-17 years Cases 2009-06-12            NA       NA            1335612.
     4 0-17 years Cases 2009-07-12            NA       NA            2289061.
     5 0-17 years Cases 2009-08-12            NA       NA            3559604.
     6 0-17 years Cases 2009-09-12            NA       NA            5336975.
     7 0-17 years Cases 2009-10-12       8000000  5000000            8000000 
     8 0-17 years Cases 2009-11-12      16000000 12000000           16000000 
     9 0-17 years Cases 2009-12-12      18000000 12000000           18000000 
    10 0-17 years Cases 2010-01-12      19000000 13000000           19000000 
    11 0-17 years Cases 2010-02-12      19000000 14000000           19000000 
    12 0-17 years Cases 2010-03-12      19000000 14000000           19000000 
    

    Among the interpolation functions, I would guess this is the best fit for your data.

    Plot the different interpolation options yourself to see how they differ. These are the available interpolation options:

    imputeTS::na_interpolation(x, option ="linear")
    imputeTS::na_interpolation(x, option ="spline")
    imputeTS::na_interpolation(x, option ="stine")
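
    To see the differences, the three options listed above can be run on the MidLevelRange values from the example and plotted together. This is only a sketch: the names y, month, opts and compared are made up for illustration, and the plot uses ggplot2, which is assumed to be installed.

    ```r
    library(imputeTS)
    library(ggplot2)

    # MidLevelRange values and months from the example above
    y <- c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000,
           19000000, 19000000, 19000000)
    month <- seq(as.Date("2009-04-12"), to = as.Date("2010-03-12"), by = "month")

    # One imputed series per interpolation option, stacked in long format
    opts <- c("linear", "spline", "stine")
    compared <- do.call(rbind, lapply(opts, function(opt) {
      data.frame(
        month   = month,
        option  = opt,
        value   = as.numeric(imputeTS::na_interpolation(y, option = opt)),
        imputed = is.na(y)  # TRUE where the original value was NA
      )
    }))

    # One line per option; point shape marks which values were filled in
    ggplot(compared, aes(month, value, colour = option)) +
      geom_line() +
      geom_point(aes(shape = imputed)) +
      labs(y = "MidLevelRange")
    ```

    The observed values are identical across the three lines, so any difference between the options shows up only in the months that were NA.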
    

    The linear and spline options in imputeTS correspond to zoo::na.approx() and zoo::na.spline(). The stine option does not exist in zoo.
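
    None of these options produces a literally exponential curve. If constant month-over-month growth is what you want, one possible variation (a different technique from the interpolation options above, using only base R's approx()) is to interpolate linearly on the log scale and transform back. The sketch below rests on a strong assumption: log(0) is -Inf, so the leading 0 is replaced by a hypothetical seed value of 1 before taking logs. The names seed, y_pos, log_filled and y_exp are illustrative.

    ```r
    # MidLevelRange values from the example above
    y <- c(0, NA, NA, NA, NA, NA, 8000000, 16000000, 18000000,
           19000000, 19000000, 19000000)

    # ASSUMPTION: log(0) is -Inf, so the leading 0 is swapped for a
    # hypothetical seed of 1 case before taking logs.
    seed <- 1
    y_pos <- ifelse(!is.na(y) & y == 0, seed, y)

    # Linear interpolation of log(y) is geometric interpolation of y:
    # each filled month equals the previous one times a constant factor.
    idx <- seq_along(y)
    known <- !is.na(y_pos)
    log_filled <- approx(idx[known], log(y_pos[known]), xout = idx)$y
    y_exp <- exp(log_filled)

    # Restore the originally observed values (including the 0)
    y_exp[!is.na(y)] <- y[!is.na(y)]
    round(y_exp)
    ```

    The filled values depend heavily on the chosen seed, so treat it as a tuning choice rather than a fact about the data. The sketch also treats the months as equally spaced, like the index-based options above.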