
How to create random gaps in a time series with different lengths?


For my master's thesis I have to test different gap-filling methods on an existing dataset. To do that, I need to add artificial gaps of different lengths (1h, 5h, ...) so that I can fill them with the different methods. Is there an easy function to do this?

Here is an example of the data frame:

   df <- structure(list(DateTime = structure(c(1420074000, 1420077600, 
1420081200, 1420084800, 1420088400, 1420092000, 1420095600, 1420099200, 
1420102800, 1420106400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `Dd 1-1` = c(0.0186269166666667, 0.0242605625, 0.00373020138888889, 
    0.000966965277777778, 0.0119253611111111, 0.0495888958333333, 
    0.02014125, 0.0306862638888889, 0.0324395694444444, 0.0191942152777778
    ), `Dd 1-3` = c(0.0242500833333333, 0.0349086388888889, 0, 
    0.00135595138888889, 0.0221090138888889, 0.0600941527777778, 
    0.0462282986111111, 0.0171887638888889, 0.0481975347222222, 
    0.0226582152777778), `Dd 1-5` = c(0.0212732152777778, 0.0284445347222222, 
    0.00276098611111111, 0.0142581875, 0.0276248958333333, 0.0328644027777778, 
    0.0495009166666667, 0.0173377777777778, 0.0384788194444444, 
    0.017663875), luecken = c(0.0186269166666667, 0.0242605625, 
    0.00373020138888889, 0.000966965277777778, 0.0119253611111111, 
    0.0495888958333333, 0.02014125, 0.0306862638888889, 0.0324395694444444, 
    0.0191942152777778)), row.names = c(NA, 10L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • If I understood your problem correctly, one possible solution is this:

    set.seed(4) # make it reproducible
    
    del <- sort(sample(1:nrow(df), 4, replace = FALSE)) # get 4 random indices from the total number of rows and sort them
    
    del2 <- del[diff(del) != 1] # drop indices that have a difference of 1 (meaning "connected")
    
    df[del2, 2:5] <- NA # set columns 2 to 5 to NA for the indices we calculated above
    
       DateTime             `Dd 1-1` `Dd 1-3` `Dd 1-5`   luecken
       <dttm>                  <dbl>    <dbl>    <dbl>     <dbl>
     1 2015-01-01 01:00:00  0.0186    0.0243    0.0213  0.0186  
     2 2015-01-01 02:00:00  0.0243    0.0349    0.0284  0.0243  
     3 2015-01-01 03:00:00 NA        NA        NA      NA       
     4 2015-01-01 04:00:00  0.000967  0.00136   0.0143  0.000967
     5 2015-01-01 05:00:00  0.0119    0.0221    0.0276  0.0119  
     6 2015-01-01 06:00:00  0.0496    0.0601    0.0329  0.0496  
     7 2015-01-01 07:00:00  0.0201    0.0462    0.0495  0.0201  
     8 2015-01-01 08:00:00  0.0307    0.0172    0.0173  0.0307  
     9 2015-01-01 09:00:00 NA        NA        NA      NA       
    10 2015-01-01 10:00:00  0.0192    0.0227    0.0177  0.0192 
    

    Just to be clear: the step that removes the connected indices is not entirely correct. If the random indices happened to be 1 to 4, for example, every difference would be 1 and all of them would be dropped, although 1 and 3 could have been kept. On large data it should still be a sufficient solution, as long as you are not planning to remove many values compared to the whole dataset.
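
    The caveat above can be made concrete. `diff(del)` is one element shorter than `del`, and R recycles a logical index that is shorter than the vector it subsets, so the fate of the last index is decided by the first comparison. A minimal sketch of a stricter filter follows; `drop_adjacent` is a hypothetical helper name, not part of the original answer, and it assumes that "connected" means a difference of exactly 1:

    ```r
    # Hypothetical helper (an assumption, not from the original answer):
    # walk through the sorted indices and keep an index only if it is not
    # directly adjacent to the index that was kept last.
    drop_adjacent <- function(idx) {
      idx <- sort(unique(idx))
      keep <- idx[0]                      # empty vector of the same type
      for (i in idx) {
        if (length(keep) == 0 || i - keep[length(keep)] > 1) {
          keep <- c(keep, i)
        }
      }
      keep
    }

    drop_adjacent(c(1, 2, 3, 4))  # keeps 1 and 3
    drop_adjacent(c(3, 7, 8, 9))  # keeps 3, 7 and 9
    ```

    The result could be used in place of `del2` above. Because fewer indices are discarded, the rows that end up as `NA` may differ from the printed output.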

    Now, on to creating larger gaps (I will use 3h, as your example data has only 10 rows):

    set.seed(4)
    
    del <- sort(sample(1:nrow(df), 3, replace=FALSE))
    
    del2 <- del[diff(del) > 3] # require a difference greater than the maximum gap length wanted
    
    del3 <- c(del2, del2 + 1, del2 + 2) # add +1 and +2 to get the indices that follow each gap start
    
    del4 <- del3[del3 <= nrow(df)] # make sure no index is out of bounds (the maximum index is 10, even if a gap starts at row 10)
    
    df[del4, 2:5] <- NA
    
        DateTime            `Dd 1-1` `Dd 1-3` `Dd 1-5` luecken
       <dttm>                 <dbl>    <dbl>    <dbl>   <dbl>
     1 2015-01-01 01:00:00   0.0186   0.0243   0.0213  0.0186
     2 2015-01-01 02:00:00   0.0243   0.0349   0.0284  0.0243
     3 2015-01-01 03:00:00  NA       NA       NA      NA     
     4 2015-01-01 04:00:00  NA       NA       NA      NA     
     5 2015-01-01 05:00:00  NA       NA       NA      NA     
     6 2015-01-01 06:00:00   0.0496   0.0601   0.0329  0.0496
     7 2015-01-01 07:00:00   0.0201   0.0462   0.0495  0.0201
     8 2015-01-01 08:00:00   0.0307   0.0172   0.0173  0.0307
     9 2015-01-01 09:00:00  NA       NA       NA      NA     
    10 2015-01-01 10:00:00  NA       NA       NA      NA
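
    The steps above could be wrapped into a reusable function for arbitrary gap lengths (1h, 5h, ...). The following is only a sketch under stated assumptions: `add_gaps` and its arguments `cols`, `gap_len`, `n_gaps` and `max_tries` are hypothetical names, the data is assumed to be regularly spaced (one row per hour), and gaps are placed by simple rejection sampling rather than by the `diff()` filter used above. Unlike the example above, it never truncates a gap at the end of the data.

    ```r
    # Hypothetical helper (an assumption, not from the original answer):
    # set `n_gaps` non-touching runs of `gap_len` consecutive rows to NA
    # in the columns `cols` of `data`.
    add_gaps <- function(data, cols, gap_len, n_gaps, max_tries = 1000) {
      n <- nrow(data)
      stopifnot(gap_len >= 1, gap_len <= n)
      is_gap  <- rep(FALSE, n)  # rows that will be set to NA
      blocked <- rep(FALSE, n)  # gap rows plus one neighbour on each side
      placed <- 0
      tries  <- 0
      while (placed < n_gaps && tries < max_tries) {
        tries <- tries + 1
        start <- sample.int(n - gap_len + 1, 1)  # gap always fits in the data
        rows  <- start:(start + gap_len - 1)
        if (!any(blocked[rows])) {
          is_gap[rows] <- TRUE
          # also block the neighbouring rows so two gaps can never merge
          blocked[max(1, start - 1):min(n, start + gap_len)] <- TRUE
          placed <- placed + 1
        }
      }
      if (placed < n_gaps) {
        warning("could only place ", placed, " of ", n_gaps, " gaps")
      }
      data[is_gap, cols] <- NA
      data
    }

    # Usage on made-up toy data (not the data from the question)
    set.seed(1)
    toy   <- data.frame(t = 1:100, x = rnorm(100), y = rnorm(100))
    gappy <- add_gaps(toy, cols = c("x", "y"), gap_len = 3, n_gaps = 5)
    runs  <- rle(is.na(gappy$x))
    runs$lengths[runs$values]  # lengths of the NA runs
    ```

    Calling the function once per gap length on a fresh copy of the original data keeps the true values available for scoring each gap-filling method.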