How to create random gaps of different lengths in a time series?

For my master's thesis I have to test different gap-filling methods on an existing dataset. To do so, I have to add artificial gaps of different lengths (1 h, 5 h, ...) that I can then fill with the different methods. Is there an easy function to do so?

Here is an example of the data frame:

df <- structure(list(DateTime = structure(c(1420074000, 1420077600,
1420081200, 1420084800, 1420088400, 1420092000, 1420095600, 1420099200,
1420102800, 1420106400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    `Dd 1-1` = c(0.0186269166666667, 0.0242605625, 0.00373020138888889,
    0.000966965277777778, 0.0119253611111111, 0.0495888958333333,
    0.02014125, 0.0306862638888889, 0.0324395694444444, 0.0191942152777778
    ), `Dd 1-3` = c(0.0242500833333333, 0.0349086388888889, 0,
    0.00135595138888889, 0.0221090138888889, 0.0600941527777778,
    0.0462282986111111, 0.0171887638888889, 0.0481975347222222,
    0.0226582152777778), `Dd 1-5` = c(0.0212732152777778, 0.0284445347222222,
    0.00276098611111111, 0.0142581875, 0.0276248958333333, 0.0328644027777778,
    0.0495009166666667, 0.0173377777777778, 0.0384788194444444,
    0.017663875), luecken = c(0.0186269166666667, 0.0242605625,
    0.00373020138888889, 0.000966965277777778, 0.0119253611111111,
    0.0495888958333333, 0.02014125, 0.0306862638888889, 0.0324395694444444,
    0.0191942152777778)), row.names = c(NA, 10L), class = c("tbl_df",
"tbl", "data.frame"))

Solution

If I understood your problem correctly, one possible solution is this:

set.seed(4) # make it reproducible

del <- sort(sample(1:nrow(df), 4, replace = FALSE)) # draw 4 random row indices from all rows and sort them

del2 <- del[diff(del) != 1] # drop indices that are directly followed by another sampled index (i.e. "connected")

df[del2, 2:5] <- NA # set columns 2 to 5 to NA for the rows selected above

   DateTime             `Dd 1-1` `Dd 1-3` `Dd 1-5`   luecken
   <dttm>                  <dbl>    <dbl>    <dbl>     <dbl>
 1 2015-01-01 01:00:00  0.0186    0.0243    0.0213  0.0186  
 2 2015-01-01 02:00:00  0.0243    0.0349    0.0284  0.0243  
 3 2015-01-01 03:00:00 NA        NA        NA      NA       
 4 2015-01-01 04:00:00  0.000967  0.00136   0.0143  0.000967
 5 2015-01-01 05:00:00  0.0119    0.0221    0.0276  0.0119  
 6 2015-01-01 06:00:00  0.0496    0.0601    0.0329  0.0496  
 7 2015-01-01 07:00:00  0.0201    0.0462    0.0495  0.0201  
 8 2015-01-01 08:00:00  0.0307    0.0172    0.0173  0.0307  
 9 2015-01-01 09:00:00 NA        NA        NA      NA       
10 2015-01-01 10:00:00  0.0192    0.0227    0.0177  0.0192

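Just to be clear: the step that removes connected gaps is not exact. `diff(del)` is one element shorter than `del`, so R recycles the logical index and the last sampled index is kept or dropped depending on the first comparison; if the random numbers were 1, 2, 3 and 4, for example, all of them would be dropped. On a large dataset this should still be sufficient, as long as you do not plan to remove many values compared to the size of the whole dataset.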
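One way to avoid the recycling problem is to prepend `TRUE` to the comparison, so the logical index has the same length as the sampled indices: the first index is always kept and any index that directly follows another sampled index is dropped. The helper name `drop_adjacent` is hypothetical and this is only a sketch; for a given seed the rows it selects can differ from the output shown above.

```r
# Hypothetical helper: given sampled row indices, keep only indices that
# are not directly preceded by another sampled index, so no two gaps touch.
drop_adjacent <- function(idx) {
  idx <- sort(unique(idx))
  # Guard: for an empty vector, idx[TRUE] would return NA
  if (length(idx) < 2) return(idx)
  # Prepend TRUE so the logical index has the same length as idx
  # (no recycling); the first index is always kept
  idx[c(TRUE, diff(idx) != 1)]
}

drop_adjacent(c(1, 2, 3, 4)) # keeps only 1
drop_adjacent(c(2, 3, 5, 9)) # keeps 2, 5 and 9
```

Applied to the question's data this would be `df[drop_adjacent(sample(1:nrow(df), 4)), 2:5] <- NA` after calling `set.seed()`.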

Now, on how to create larger gaps (I will use 3 h gaps, as your example data has only 10 rows):

set.seed(4)

del <- sort(sample(1:nrow(df), 3, replace = FALSE))

del2 <- del[diff(del) > 3] # require a difference larger than the gap length you want

del3 <- c(del2, del2 + 1, del2 + 2) # add +1 and +2 to get the indices connected to the ones you already have

del4 <- del3[del3 <= nrow(df)] # make sure no index is out of bounds (the max index is 10, even if a gap starts at row 10)

df[del4, 2:5] <- NA

   DateTime            `Dd 1-1` `Dd 1-3` `Dd 1-5` luecken
   <dttm>                 <dbl>    <dbl>    <dbl>   <dbl>
 1 2015-01-01 01:00:00   0.0186   0.0243   0.0213  0.0186
 2 2015-01-01 02:00:00   0.0243   0.0349   0.0284  0.0243
 3 2015-01-01 03:00:00  NA       NA       NA      NA     
 4 2015-01-01 04:00:00  NA       NA       NA      NA     
 5 2015-01-01 05:00:00  NA       NA       NA      NA     
 6 2015-01-01 06:00:00   0.0496   0.0601   0.0329  0.0496
 7 2015-01-01 07:00:00   0.0201   0.0462   0.0495  0.0201
 8 2015-01-01 08:00:00   0.0307   0.0172   0.0173  0.0307
 9 2015-01-01 09:00:00  NA       NA       NA      NA     
10 2015-01-01 10:00:00  NA       NA       NA      NA
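The same idea can be generalized to any gap length. The sketch below uses a hypothetical helper, `make_gaps()` (not from any package), that samples start rows only where a full gap fits (so a gap at the end is not cut short, unlike rows 9 and 10 above), drops starts that lie within `gap_len` rows of the previous sampled start, and expands each remaining start with `outer()` instead of writing `del2 + 1`, `del2 + 2` by hand. It assumes one row per hour with no existing gaps, so a gap length in rows equals a length in hours. As in the answer, fewer than `n_gaps` gaps can survive the filtering step.

```r
# Hypothetical helper (not from any package): return the row indices of
# non-overlapping gaps, each gap_len consecutive rows long, in a series of
# n rows. Assumes a regular time step, e.g. one row per hour.
make_gaps <- function(n, n_gaps, gap_len) {
  stopifnot(n_gaps >= 1, gap_len >= 1, gap_len <= n)
  # Only start a gap where all gap_len rows fit inside the series
  candidates <- seq_len(n - gap_len + 1)
  starts <- sort(sample(candidates, min(n_gaps, length(candidates))))
  # Keep the first start, and any start more than gap_len rows after the
  # previous sampled start, so gaps are separated by at least one observed row
  starts <- starts[c(TRUE, diff(starts) > gap_len)]
  # Expand each start s into s, s + 1, ..., s + gap_len - 1
  sort(as.vector(outer(starts, 0:(gap_len - 1), "+")))
}

# Example: one set of gap rows per gap length (1 h and 5 h on hourly data)
set.seed(4)
gaps_by_length <- lapply(c(1, 5), function(len) {
  make_gaps(n = 100, n_gaps = 4, gap_len = len)
})
```

Applied to the question's data, each set of rows can be blanked in a separate copy, e.g. `df_gaps <- df; df_gaps[make_gaps(nrow(df), 2, 3), 2:5] <- NA`, so that every gap-filling method is compared against the same original values.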