For my master's thesis I have to compare different gap-filling methods on an existing dataset. To do this, I need to insert artificial gaps of different lengths (1 h, 5 h, ...) into the data so that I can fill them with the different methods. Is there an easy function to do so?
Here is an example of the data frame:
structure(list(DateTime = structure(c(1420074000, 1420077600,
1420081200, 1420084800, 1420088400, 1420092000, 1420095600, 1420099200,
1420102800, 1420106400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
`Dd 1-1` = c(0.0186269166666667, 0.0242605625, 0.00373020138888889,
0.000966965277777778, 0.0119253611111111, 0.0495888958333333,
0.02014125, 0.0306862638888889, 0.0324395694444444, 0.0191942152777778
), `Dd 1-3` = c(0.0242500833333333, 0.0349086388888889, 0,
0.00135595138888889, 0.0221090138888889, 0.0600941527777778,
0.0462282986111111, 0.0171887638888889, 0.0481975347222222,
0.0226582152777778), `Dd 1-5` = c(0.0212732152777778, 0.0284445347222222,
0.00276098611111111, 0.0142581875, 0.0276248958333333, 0.0328644027777778,
0.0495009166666667, 0.0173377777777778, 0.0384788194444444,
0.017663875), luecken = c(0.0186269166666667, 0.0242605625,
0.00373020138888889, 0.000966965277777778, 0.0119253611111111,
0.0495888958333333, 0.02014125, 0.0306862638888889, 0.0324395694444444,
0.0191942152777778)), row.names = c(NA, 10L), class = c("tbl_df",
"tbl", "data.frame"))
If I understood your problem correctly, one possible solution is this:
set.seed(4) # make it reproducible
del <- sort(sample(1:nrow(df), 4, replace = FALSE)) # draw 4 random row indices and sort them
del2 <- del[c(diff(del) != 1, TRUE)] # drop an index if the next one directly follows it ("connected"); the last index is always kept
df[del2, 2:5] <- NA # set columns 2 to 5 to NA in the rows selected above
DateTime `Dd 1-1` `Dd 1-3` `Dd 1-5` luecken
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-01-01 01:00:00 0.0186 0.0243 0.0213 0.0186
2 2015-01-01 02:00:00 0.0243 0.0349 0.0284 0.0243
3 2015-01-01 03:00:00 NA NA NA NA
4 2015-01-01 04:00:00 0.000967 0.00136 0.0143 0.000967
5 2015-01-01 05:00:00 0.0119 0.0221 0.0276 0.0119
6 2015-01-01 06:00:00 0.0496 0.0601 0.0329 0.0496
7 2015-01-01 07:00:00 0.0201 0.0462 0.0495 0.0201
8 2015-01-01 08:00:00 0.0307 0.0172 0.0173 0.0307
9 2015-01-01 09:00:00 NA NA NA NA
10 2015-01-01 10:00:00 0.0192 0.0227 0.0177 0.0192
To be clear: the step that removes connected gaps is not entirely correct. If the random indices were, say, 1, 2, 3 and 4, it would discard most of them and leave at most one gap instead of four. On a large dataset, however, it should be sufficient as long as you are not deleting many values compared to the size of the whole dataset.
Now, to create longer gaps (I will use 3 h, since your example data has only 10 rows):
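One way around this limitation (a sketch of my own, not an existing package function) is to pick the gap rows one at a time and remove each chosen row and its direct neighbours from the pool of candidates, so you get exactly the requested number of isolated one-row (one-hour, for hourly data) gaps. The helper name `pick_isolated` is hypothetical:

```r
# Hypothetical helper: pick `n_gaps` rows out of `n_rows` so that no two
# picked rows are adjacent (i.e. every gap is exactly one row long).
pick_isolated <- function(n_rows, n_gaps) {
  candidates <- seq_len(n_rows)
  picked <- integer(0)
  while (length(picked) < n_gaps) {
    # random picks can use up the room before n_gaps is reached
    if (length(candidates) == 0) stop("not enough room for that many isolated gaps")
    # index via sample.int so a single remaining candidate is handled correctly
    i <- candidates[sample.int(length(candidates), 1)]
    picked <- c(picked, i)
    # remove the chosen row and its direct neighbours from the pool
    candidates <- setdiff(candidates, c(i - 1, i, i + 1))
  }
  sort(picked)
}

set.seed(4)
rows <- pick_isolated(10, 4)
print(rows)
```

With your data you would then run `df[rows, 2:5] <- NA`. Each pick removes at most three candidates, so the function always succeeds when `3 * (n_gaps - 1) < n_rows`; closer to the theoretical maximum it may stop with an error even though a valid arrangement exists.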
set.seed(4)
del <- sort(sample(1:nrow(df), 3, replace = FALSE)) # 3 random candidate gap starts
del2 <- del[c(diff(del) > 3, TRUE)] # keep a start only if the next one is more than the gap length (3) away; the last start is always kept
del3 <- c(del2, del2 + 1, del2 + 2) # add the two following rows to each start to get 3-row gaps
del4 <- del3[del3 <= nrow(df)] # stay within bounds (max index is 10, even if a gap starts at row 10)
df[del4, 2:5] <- NA
DateTime `Dd 1-1` `Dd 1-3` `Dd 1-5` luecken
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-01-01 01:00:00 0.0186 0.0243 0.0213 0.0186
2 2015-01-01 02:00:00 0.0243 0.0349 0.0284 0.0243
3 2015-01-01 03:00:00 NA NA NA NA
4 2015-01-01 04:00:00 NA NA NA NA
5 2015-01-01 05:00:00 NA NA NA NA
6 2015-01-01 06:00:00 0.0496 0.0601 0.0329 0.0496
7 2015-01-01 07:00:00 0.0201 0.0462 0.0495 0.0201
8 2015-01-01 08:00:00 0.0307 0.0172 0.0173 0.0307
9 2015-01-01 09:00:00 NA NA NA NA
10 2015-01-01 10:00:00 NA NA NA NA
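For the mix of gap lengths from the question (1 h, 5 h, ...), the same idea can be wrapped into a small function. The sketch below is a hypothetical helper `add_gaps`, not an existing package function, and it assumes regularly spaced hourly rows, so a gap length in rows equals its length in hours. It draws a random start for each gap, retries if the gap would overlap or touch an earlier one, and also returns the original values so you can compare them with the gap-filled ones later:

```r
# Hypothetical helper: blank out non-overlapping gaps of the given lengths
# (in rows) in columns `cols` of `df`, leaving at least `min_sep` observed
# rows between two gaps.
add_gaps <- function(df, gap_lengths, cols, min_sep = 1, max_tries = 1000) {
  n <- nrow(df)
  free <- rep(TRUE, n)          # rows still available (gaps plus buffers are blocked)
  gap_id <- rep(NA_integer_, n) # which gap each blanked row belongs to
  for (g in seq_along(gap_lengths)) {
    len <- gap_lengths[g]
    if (len > n) stop("gap ", g, " is longer than the data")
    placed <- FALSE
    for (attempt in seq_len(max_tries)) {
      start <- sample.int(n - len + 1, 1)
      rows <- start:(start + len - 1)
      if (all(free[rows])) {
        gap_id[rows] <- g
        # block the gap itself plus `min_sep` rows on each side
        free[max(1, start - min_sep):min(n, start + len - 1 + min_sep)] <- FALSE
        placed <- TRUE
        break
      }
    }
    if (!placed) stop("could not place gap ", g, " of length ", len)
  }
  removed <- which(!is.na(gap_id))
  truth <- df[removed, cols, drop = FALSE]  # keep the true values for evaluation
  df[removed, cols] <- NA
  list(data = df, rows = removed, gap_id = gap_id[removed], truth = truth)
}

# Usage on a toy hourly series (not the data from the question)
set.seed(42)
toy <- data.frame(
  DateTime = seq(as.POSIXct("2015-01-01 01:00", tz = "UTC"), by = "hour", length.out = 48),
  value = sin(1:48 / 4)
)
res <- add_gaps(toy, gap_lengths = c(1, 5, 3), cols = "value")
print(res$rows)
```

Because `res$truth` holds the removed values, any gap-filling method can be scored directly on `res$rows`.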