r time-series gaps-in-data

Rolling computation to fill gaps by finding following or previous values in a data.table time series


I have a data.table that looks like this:

library(data.table)

tsdata <- data.table(time   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                     signal = c(0, 1, 1, 0, 0, 1, 0, 0, 0, 1))

I am trying to fill the gaps of zeros between the ones, but only when a gap is short. The maximum gap length should be configurable. In this example, a gap should be filled only if it contains at most 2 zeros.

The result should look like this:

tsdata <- data.table(time   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                     signal = c(0, 1, 1, 1, 1, 1, 0, 0, 0, 1))

My real time series is much larger than this, so an efficient solution would be appreciated.


Solution

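  • Group by rleid(signal), so that each run of identical values forms one group, then set every run of zeros that has at most 2 rows and is not at the beginning or end of the series to 1. All other runs keep their original value.

     # A run is filled only if it consists of zeros, has at most 2 rows,
     # and touches neither the first nor the last time stamp.
     tsdata[, signal2 := ifelse(signal[1] == 0 & 
                                .N <= 2 & 
                                time[1] > min(tsdata$time) & 
                                time[.N] < max(tsdata$time), 1, signal),
            by = rleid(signal)]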
    
    tsdata
    

    giving:

        time signal signal2
     1:    1      0       0
     2:    2      1       1
     3:    3      1       1
     4:    4      0       1
     5:    5      0       1
     6:    6      1       1
     7:    7      0       0
     8:    8      0       0
     9:    9      0       0
    10:   10      1       1
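
Since the question asks for a configurable gap length, the same idea can be sketched as a small helper that works on the plain vector. This is only an illustration under assumptions: `fill_short_gaps` and `max_gap` are hypothetical names not used in the answer above, the rows are assumed to be sorted by time, and base R's `rle()` / `inverse.rle()` stand in for the `rleid()` grouping.

```r
# Hypothetical helper (not part of the answer above): fill runs of zeros
# that are at most `max_gap` long with ones. The first and last run of the
# series are never changed, mirroring the "not at the beginning or end" rule.
fill_short_gaps <- function(signal, max_gap = 2) {
  runs   <- rle(signal)               # run lengths and run values
  n_runs <- length(runs$values)
  idx    <- seq_len(n_runs)

  # A run is interior if it is neither the first nor the last run
  interior   <- idx > 1 & idx < n_runs
  short_zero <- runs$values == 0 & runs$lengths <= max_gap & interior

  runs$values[short_zero] <- 1        # turn short interior zero runs into ones
  inverse.rle(runs)                   # expand back to the original length
}

signal  <- c(0, 1, 1, 0, 0, 1, 0, 0, 0, 1)
filled2 <- fill_short_gaps(signal, max_gap = 2)
filled3 <- fill_short_gaps(signal, max_gap = 3)
```

With a data.table this could be applied as `tsdata[, signal2 := fill_short_gaps(signal, max_gap = 2)]`. Because it avoids a `by` group per run, it may also scale better on long series, though that has not been benchmarked here.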