I am trying to find how long a participant (ID) waits at a junction if their speed is less that 2 KPH. I then want to create a new variable for this time frame which would be the "Waiting time (s)". I plan to compare the waiting times of participants based on their group and experience.
Below is a simplified dataset of what my variables and data looks like.
| ID | group | experience | Junction | speed (KPH) | time (s) |
|---|---|---|---|---|---|
| 1 | 1 | 1 | ROW1-A | 5 | 57.2 |
| 1 | 1 | 1 | ROW1-A | 4 | 58.2 |
| 1 | 1 | 1 | ROW1-A | 3 | 59.2 |
| 1 | 1 | 1 | ROW1-A | 2 | 60.2 |
| 1 | 1 | 1 | ROW1-A | 1 | 61.2 |
| 1 | 1 | 1 | ROW1-A | 0 | 62.2 |
| 1 | 1 | 1 | ROW1-A | 0 | 63.2 |
| 1 | 1 | 1 | ROW1-A | 1 | 64.2 |
| 1 | 1 | 1 | ROW1-A | 2 | 65.2 |
| 1 | 1 | 1 | ROW1-A | 3 | 66.2 |
| 1 | 1 | 1 | ROW1-A | 4 | 67.2 |
| 1 | 1 | 1 | ROW1-A | 5 | 68.2 |
| 2 | 2 | 2 | ROW1-A | 3 | 26.5 |
| 2 | 2 | 2 | ROW1-A | 2 | 27.5 |
| 2 | 2 | 2 | ROW1-A | 1 | 28.5 |
| 2 | 2 | 2 | ROW1-A | 0 | 29.5 |
| 2 | 2 | 2 | ROW1-A | 0 | 30.5 |
| 2 | 2 | 2 | ROW1-A | 0 | 31.5 |
| 2 | 2 | 2 | ROW1-A | 0 | 32.5 |
| 2 | 2 | 2 | ROW1-A | 1 | 33.5 |
| 2 | 2 | 2 | ROW1-A | 2 | 34.5 |
| 2 | 2 | 2 | ROW1-A | 3 | 35.5 |
In the real dataset, there are 13 different junctions names (e.g. ROW1-A, ROW2-A, GW1-L, GW2-R).
The time variable for each participant is linear but each participant starts at different times. In this example, the time range less than 2 KPH for participant one is between 61.2 and 64.2 with a waiting time of 3 seconds whereas for participant 2 the time range is between 28.5 and 33.5 with a waiting time of 5 seconds. The new variable would need to compare the 3 seconds to the 5 seconds.
This approach uses the dplyr package and will give you a summary table that achieves what you want. I have created a 'dummy' dataset based on your description in order to illustrate it working on more representative data. I have also modified your column names; it is good practice to avoid punctuation in column names and it makes coding easier:
install.packages("dplyr") # Run this only if you don't have dplyr installed already
library(dplyr)
# 'Dummy' data
df <- data.frame(ID = rep(1:10, each = 11),
group = rep(1:2, each = 11),
experience = rep(1:4, each = 11, length.out = 110),
Junction = rep(c("ROW1-A", "ROW2-A", "ROW3-A","GW1-L", "GW2-R"), each = 22),
speed_KPH = rep(c(c(5:0,1:5),
c(4:0,0,0,1:4),
c(3:0,0,0,0,0,1:3),
c(2:0,0,0,0,0,0,0,1:2)), length.out = 110),
time_s = seq(1.5, 110.5, by = 1))
head(df)
ID group experience Junction speed_KPH time_s
1 1 1 1 ROW1-A 5 1.5
2 1 1 1 ROW1-A 4 2.5
3 1 1 1 ROW1-A 3 3.5
4 1 1 1 ROW1-A 2 4.5
5 1 1 1 ROW1-A 1 5.5
6 1 1 1 ROW1-A 0 6.5
# Summary df
df1 <- df %>%
filter(speed_KPH < 2) %>% # return required range
group_by(ID, Junction) %>% # group by person ID
mutate(waiting_time_s = max(time_s) - min(time_s)) %>% # calculate time difference
select(-c(speed_KPH, time_s)) %>% # Remove unneeded columns
distinct() %>% # return one record per ID
ungroup()
# Result
data.frame(df1)
ID group experience Junction waiting_time_s
1 1 1 1 ROW1-A 2
2 2 2 2 ROW1-A 4
3 3 1 3 ROW2-A 6
4 4 2 4 ROW2-A 8
5 5 1 1 ROW3-A 2
6 6 2 2 ROW3-A 4
7 7 1 3 GW1-L 6
8 8 2 4 GW1-L 8
9 9 1 1 GW2-R 2
10 10 2 2 GW2-R 4