time-series imputets

Calculate average gap size in time series by extracting data from imputeTS functions


I need to calculate the average gap size of a univariate time series. The imputeTS package generates plots from this information. Is it possible to extract the 'gap size' and the 'number of occurrences' from either statsNA or ggplot_na_gapsize? Or is there another way to find the average size of the gaps in a time series? (You could use the tsNH4 data set from the imputeTS package.)

(This is my first time asking a question here and I'm fairly new to R.)


Solution

  • With the current CRAN version of imputeTS, you can only get the average gap size indirectly, with some extra work.

    But I made a quick update to the development version on GitHub. Now you can also get the average gap size with the statsNA function.

    To use it, you first have to install the new version from GitHub (since it is not on CRAN yet):

    library("devtools")
    install_github("SteffenMoritz/imputeTS")
    

    If you do not have "devtools" installed, install it first, before running the two lines above:

    install.packages("devtools")
    

    Afterwards just use the imputeTS package as usual.

    library("imputeTS")
    
    #Example with the tsNH4 dataset
    statsNA(tsNH4)
    

    This now prints the following:

    > statsNA(tsNH4)
    
    [1] "Length of time series:"
    [1] 4552
    [1] "-------------------------"
    [1] "Number of Missing Values:"
    [1] 883
    [1] "-------------------------"
    [1] "Percentage of Missing Values:"
    [1] "19.4%"
    [1] "-------------------------"
    [1] "Number of Gaps:"
    [1] 155
    [1] "-------------------------"
    [1] "Average Gap Size:"
    [1] 5.696774
    [1] "-------------------------"
    [1] "Stats for Bins"
    [1] "  Bin 1 (1138 values from 1 to 1138) :      233 NAs (20.5%)"
    [1] "  Bin 2 (1138 values from 1139 to 2276) :      433 NAs (38%)"
    [1] "  Bin 3 (1138 values from 2277 to 3414) :      135 NAs (11.9%)"
    [1] "  Bin 4 (1138 values from 3415 to 4552) :      82 NAs (7.21%)"
    [1] "-------------------------"
    [1] "Longest NA gap (series of consecutive NAs)"
    [1] "157 in a row"
    [1] "-------------------------"
    [1] "Most frequent gap size (series of consecutive NA series)"
    [1] "1 NA in a row (occuring 68 times)"
    [1] "-------------------------"
    [1] "Gap size accounting for most NAs"
    [1] "157 NA in a row (occuring 1 times, making up for overall 157 NAs)"
    

    As you can see, 'Number of Gaps' and 'Average Gap Size' are now part of the output.

    You can also access the output as a variable:

    library("imputeTS")
    
    # To get an output object instead of printed text, set print_only to FALSE
    # (FALSE is spelled out because the shorthand F can be overwritten)
    out <- statsNA(tsNH4, print_only = FALSE)
    
    # Average gap size
    out$average_size_na_gaps
    
    # Number of gaps
    out$number_na_gaps
    
    # Number of NAs
    out$number_NAs
    

    These changes will also be included in the next CRAN release (thanks for the suggestion). Just be a little careful: since this is a development version, it is not as thoroughly tested as the CRAN version.
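
    If you would rather stay on the CRAN version, the indirect route can be sketched in base R with rle(), which run-length encodes the NA indicator. This is only a minimal sketch, not how statsNA computes its values internally. The vector x and the variable names are made up for illustration.

    ```r
    # Toy series with three NA gaps of sizes 2, 1 and 3 (made-up values)
    x <- c(1.2, NA, NA, 3.4, 5.1, NA, 2.2, NA, NA, NA, 4.0)

    # Run-length encode the NA indicator: each run is a stretch of
    # consecutive missing (TRUE) or observed (FALSE) values
    runs <- rle(is.na(x))

    # Keep only the lengths of the NA runs, i.e. the gap sizes
    gap_sizes <- runs$lengths[runs$values]

    # Number of gaps and average gap size
    number_of_gaps <- length(gap_sizes)
    average_gap_size <- mean(gap_sizes)

    # Gap size versus number of occurrences
    gap_table <- table(gap_sizes)

    print(number_of_gaps)
    print(average_gap_size)
    print(gap_table)
    ```

    To apply this to the example data, load imputeTS and replace x with tsNH4. The average is simply the number of NAs divided by the number of gaps, which matches the output shown above: 883 / 155 ≈ 5.696774. Note that mean() returns NaN for a series without any NAs, so you may want to check number_of_gaps first.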