rtime-seriessubset

Create a subset dataframe when values in one dataframe match a condition of a defined value in another dataframe in R


I'm trying to create a subset dataframe (df3) when value of 1 in col1 (df1) matches value of 1 in signal (df2) at the same timestamp (datetime). The new dataframe should include datetime, col1 and signal. I've seen many other questions on stackoverflow but none of them match what I'm trying to do.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# Generate datetime sequence in 15-minute intervals for both dataframes
start_time_df1 <- ymd_hms("2024-01-01 00:00:00")
datetime_df1 <- seq(from = start_time_df1, by = "15 mins", length.out = 20)

start_time_df2 <- ymd_hms("2024-01-01 00:00:00")
datetime_df2 <- seq(from = start_time_df2, by = "15 mins", length.out = 35)

# Generate random signal values (either 1 or 0) for both dataframes
set.seed(123)  # for reproducibility
signal_df1 <- sample(c(0, 1), size = 20, replace = TRUE)

col1_df1<- sample(c(0, 1), size = 20, replace = TRUE)

set.seed(456)  # different seed for df2 to ensure different random values
signal_df2 <- sample(c(0, 1), size = 35, replace = TRUE)

# Create the dataframes
df1 <- data.frame(datetime = datetime_df1, Col1= col1_df1)
df2 <- data.frame(datetime = datetime_df2, signal = signal_df2)

str(df1)
#> 'data.frame':    20 obs. of  2 variables:
#>  $ datetime: POSIXct, format: "2024-01-01 00:00:00" "2024-01-01 00:15:00" ...
#>  $ Col1    : num  0 1 0 0 0 0 1 1 0 1 ...
str(df2)
#> 'data.frame':    35 obs. of  2 variables:
#>  $ datetime: POSIXct, format: "2024-01-01 00:00:00" "2024-01-01 00:15:00" ...
#>  $ signal  : num  0 0 0 1 0 1 0 0 1 0 ...

# Print the dataframes
print(df1)
#>               datetime Col1
#> 1  2024-01-01 00:00:00    0
#> 2  2024-01-01 00:15:00    1
#> 3  2024-01-01 00:30:00    0
#> 4  2024-01-01 00:45:00    0
#> 5  2024-01-01 01:00:00    0
#> 6  2024-01-01 01:15:00    0
#> 7  2024-01-01 01:30:00    1
#> 8  2024-01-01 01:45:00    1
#> 9  2024-01-01 02:00:00    0
#> 10 2024-01-01 02:15:00    1
#> 11 2024-01-01 02:30:00    0
#> 12 2024-01-01 02:45:00    1
#> 13 2024-01-01 03:00:00    0
#> 14 2024-01-01 03:15:00    1
#> 15 2024-01-01 03:30:00    1
#> 16 2024-01-01 03:45:00    0
#> 17 2024-01-01 04:00:00    0
#> 18 2024-01-01 04:15:00    0
#> 19 2024-01-01 04:30:00    0
#> 20 2024-01-01 04:45:00    1
print(df2)
#>               datetime signal
#> 1  2024-01-01 00:00:00      0
#> 2  2024-01-01 00:15:00      0
#> 3  2024-01-01 00:30:00      0
#> 4  2024-01-01 00:45:00      1
#> 5  2024-01-01 01:00:00      0
#> 6  2024-01-01 01:15:00      1
#> 7  2024-01-01 01:30:00      0
#> 8  2024-01-01 01:45:00      0
#> 9  2024-01-01 02:00:00      1
#> 10 2024-01-01 02:15:00      0
#> 11 2024-01-01 02:30:00      1
#> 12 2024-01-01 02:45:00      0
#> 13 2024-01-01 03:00:00      0
#> 14 2024-01-01 03:15:00      1
#> 15 2024-01-01 03:30:00      0
#> 16 2024-01-01 03:45:00      0
#> 17 2024-01-01 04:00:00      0
#> 18 2024-01-01 04:15:00      1
#> 19 2024-01-01 04:30:00      1
#> 20 2024-01-01 04:45:00      1
#> 21 2024-01-01 05:00:00      0
#> 22 2024-01-01 05:15:00      0
#> 23 2024-01-01 05:30:00      1
#> 24 2024-01-01 05:45:00      1
#> 25 2024-01-01 06:00:00      0
#> 26 2024-01-01 06:15:00      0
#> 27 2024-01-01 06:30:00      0
#> 28 2024-01-01 06:45:00      1
#> 29 2024-01-01 07:00:00      1
#> 30 2024-01-01 07:15:00      0
#> 31 2024-01-01 07:30:00      0
#> 32 2024-01-01 07:45:00      1
#> 33 2024-01-01 08:00:00      1
#> 34 2024-01-01 08:15:00      1
#> 35 2024-01-01 08:30:00      0

Any help greatly appreciated.


Solution

  • Using a joining operation with a filter will get you matching rows with the desired condition

    library(dplyr)
    
    left_join(df1, df2) %>% 
      filter(Col1 == 1 & signal == 1)
    Joining with `by = join_by(datetime)`
                 datetime Col1 signal
    1 2024-01-01 03:15:00    1      1
    2 2024-01-01 04:45:00    1      1