rtimeconditional-statements

Is there a way to create a time value based on two conditions and existing time variable?


I am trying to find how long a participant (ID) waits at a junction if their speed is less that 2 KPH. I then want to create a new variable for this time frame which would be the "Waiting time (s)". I plan to compare the waiting times of participants based on their group and experience.

Below is a simplified dataset of what my variables and data looks like.

ID group experience Junction speed (KPH) time (s)
1 1 1 ROW1-A 5 57.2
1 1 1 ROW1-A 4 58.2
1 1 1 ROW1-A 3 59.2
1 1 1 ROW1-A 2 60.2
1 1 1 ROW1-A 1 61.2
1 1 1 ROW1-A 0 62.2
1 1 1 ROW1-A 0 63.2
1 1 1 ROW1-A 1 64.2
1 1 1 ROW1-A 2 65.2
1 1 1 ROW1-A 3 66.2
1 1 1 ROW1-A 4 67.2
1 1 1 ROW1-A 5 68.2
2 2 2 ROW1-A 3 26.5
2 2 2 ROW1-A 2 27.5
2 2 2 ROW1-A 1 28.5
2 2 2 ROW1-A 0 29.5
2 2 2 ROW1-A 0 30.5
2 2 2 ROW1-A 0 31.5
2 2 2 ROW1-A 0 32.5
2 2 2 ROW1-A 1 33.5
2 2 2 ROW1-A 2 34.5
2 2 2 ROW1-A 3 35.5

In the real dataset, there are 13 different junctions names (e.g. ROW1-A, ROW2-A, GW1-L, GW2-R).

The time variable for each participant is linear but each participant starts at different times. In this example, the time range less than 2 KPH for participant one is between 61.2 and 64.2 with a waiting time of 3 seconds whereas for participant 2 the time range is between 28.5 and 33.5 with a waiting time of 5 seconds. The new variable would need to compare the 3 seconds to the 5 seconds.


Solution

  • This approach uses the dplyr package and will give you a summary table that achieves what you want. I have created a 'dummy' dataset based on your description in order to illustrate it working on more representative data. I have also modified your column names; it is good practice to avoid punctuation in column names and it makes coding easier:

    install.packages("dplyr") # Run this only if you don't have dplyr installed already
    library(dplyr)
    
    # 'Dummy' data
    df <- data.frame(ID = rep(1:10, each = 11),
                     group = rep(1:2, each = 11),
                     experience = rep(1:4, each = 11,  length.out = 110),
                     Junction = rep(c("ROW1-A", "ROW2-A", "ROW3-A","GW1-L", "GW2-R"), each = 22),
                     speed_KPH = rep(c(c(5:0,1:5), 
                                       c(4:0,0,0,1:4), 
                                       c(3:0,0,0,0,0,1:3), 
                                       c(2:0,0,0,0,0,0,0,1:2)), length.out = 110),
                     time_s = seq(1.5, 110.5, by = 1))
    
    head(df)
      ID group experience Junction speed_KPH time_s
    1  1     1          1   ROW1-A         5    1.5
    2  1     1          1   ROW1-A         4    2.5
    3  1     1          1   ROW1-A         3    3.5
    4  1     1          1   ROW1-A         2    4.5
    5  1     1          1   ROW1-A         1    5.5
    6  1     1          1   ROW1-A         0    6.5
    
    # Summary df
    df1 <- df %>%
      filter(speed_KPH < 2) %>% # return required range
      group_by(ID, Junction) %>% # group by person ID
      mutate(waiting_time_s = max(time_s) - min(time_s)) %>% # calculate time difference
      select(-c(speed_KPH, time_s)) %>% # Remove unneeded columns
      distinct() %>% # return one record per ID
      ungroup()
    
    # Result
    data.frame(df1)
       ID group experience Junction waiting_time_s
    1   1     1          1   ROW1-A              2
    2   2     2          2   ROW1-A              4
    3   3     1          3   ROW2-A              6
    4   4     2          4   ROW2-A              8
    5   5     1          1   ROW3-A              2
    6   6     2          2   ROW3-A              4
    7   7     1          3    GW1-L              6
    8   8     2          4    GW1-L              8
    9   9     1          1    GW2-R              2
    10 10     2          2    GW2-R              4