
Keep only the hours in which the value did not change within the hour (R)


I have a time-series dataset with 'n' columns. I would like to filter out the hours in which the value in a given column changed within the hour. In other words, I want to keep only the hours in which the value stayed unchanged.

Some info about the data:

Expected output:

In the above example, I want to exclude hour 8 from my dataset, as the value in ColA is not constant.

I suspect that group_by() and filter() from dplyr could do the job, but I am not sure which function to use to check whether the value stays unchanged within an hour.

Any help regarding this is much appreciated. Thanks.


Solution

  • This does it:

    library(dplyr)
    data1_filtered <- data1 %>% group_by(Hour_hr) %>% filter(n_distinct(ColA[!is.na(ColA)]) < 2)
    

    Checking results:

    count(data1_filtered, Hour_hr)
    
      Hour_hr     n
        <dbl> <int>
    1       7    46
    2       9     1
    

    This keeps an hour only if ColA holds at most one distinct non-missing value in it (a single repeated value, or nothing but NA), so hours 7 and 9 are kept and hour 8 is dropped.

    Equivalently, you could let n_distinct() drop the NAs itself:

    data1 %>% group_by(Hour_hr) %>% filter(n_distinct(ColA, na.rm = TRUE) < 2)
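Since the original data isn't shown, here is a minimal reproducible sketch using a small hypothetical data1 (the values are invented for illustration; only the column names Hour_hr and ColA come from the question). It also shows why NA needs attention: by default n_distinct() counts NA as a distinct value of its own, so na.rm = TRUE is what lets an hour with one repeated value plus some missing readings count as unchanged.

```r
library(dplyr)

# Hypothetical toy data: hour 7 has one repeated value plus an NA,
# hour 8 changes value within the hour, hour 9 has a single NA reading
data1 <- tibble(
  Hour_hr = c(7, 7, 7, 8, 8, 8, 9),
  ColA    = c(5, 5, NA, 5, 6, NA, NA)
)

# n_distinct() treats NA as its own value unless na.rm = TRUE
n_distinct(c(5, 5, NA))                # returns 2
n_distinct(c(5, 5, NA), na.rm = TRUE)  # returns 1

# Keep only hours where ColA has at most one distinct non-missing value
kept <- data1 %>%
  group_by(Hour_hr) %>%
  filter(n_distinct(ColA, na.rm = TRUE) < 2) %>%
  ungroup()

count(kept, Hour_hr)
```

With this toy data, hour 8 is dropped (two distinct values, 5 and 6), while hours 7 and 9 survive.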
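If you would rather not depend on dplyr, the same per-hour rule can be sketched in base R with ave(), which applies a function within each group and recycles a length-one result across that group's rows. This again uses hypothetical toy data, and the helper name is_constant is made up for illustration.

```r
# Hypothetical toy data (same column names as in the question)
data1 <- data.frame(
  Hour_hr = c(7, 7, 7, 8, 8, 8, 9),
  ColA    = c(5, 5, NA, 5, 6, NA, NA)
)

# Hypothetical helper: TRUE when x has at most one distinct non-missing value
is_constant <- function(x) length(unique(x[!is.na(x)])) <= 1

# ave() writes each group's result back into a copy of ColA, so with a
# numeric ColA the logical result is stored as 1/0; convert it back
keep <- as.logical(ave(data1$ColA, data1$Hour_hr, FUN = is_constant))

kept <- data1[keep, ]
table(kept$Hour_hr)
```

The design choice here is that ave() keeps one result per row, so the filter is a plain logical index with no intermediate per-hour table to join back.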