Very new to R and have been googling this question to death to no avail - any insight would be greatly appreciated!
I have data organized in the following way:
Individual | Behavior | start | end | duration |
---|---|---|---|---|
M1 | A | 0 | 1.505 | 1.505 |
M1 | B | 1.505 | 106.256 | 104.751 |
M1 | E | 3.266 | 67.255 | 63.989 |
M1 | E | 71.268 | 77.506 | 6.238 |
Duration, start time, and end time are all in seconds.
I want to sum the duration of each behavior in 1 minute bins - basically say this individual spent this amount of time doing behavior A in the first minute, this amount of time doing behavior B in the first minute, and so on for each behavior and each minute. Eventually, I would like to average time spent doing each behavior in each bin across experimental groups, and make a line graph for each experimental group, with duration on the y axis and 1 min bins on the x axis, with different series for each behavior.
<Edited to add: For the example above, the desired outcome would be:
M1 spent 1.505 seconds doing behavior A in bin 1 (0 seconds - 60 seconds)
M1 spent 0 seconds doing behavior A in bin 2 (>60 seconds - 120 seconds)
M1 spent 58.495 seconds doing behavior B in bin 1 (0 seconds - 60 seconds)
M1 spent 46.256 seconds doing behavior B in bin 2 (>60 seconds - 120 seconds)
M1 spent 56.734 seconds doing behavior E in bin 1 (0 seconds - 60 seconds)
M1 spent 13.493 seconds doing behavior E in bin 2 (>60 seconds - 120 seconds)
I'm not picky about how this data would be displayed, when I had binned previously it would just add a column labeled "bin" and assign the bin number for each row, but I don't think that will be possible now because behaviors can span multiple bins.>
The issue though, is that the way I have binned before (in excel and SPSS) is by defining what bin the start time falls into - but in this case for a lot of behaviors the duration is over 1 minute, so the start time and end time belong in different bins. Is there a way to parse this out in r?
Thanks!
One way in tidyverse below.
First, I reframe
the data giving a row for each relevant bin within each Indiv/Behavior combo. Then I find how much of the time was within that bin. Finally I sum up the totals by Indiv/Behav/Bin in case there are multiples.
library(tidyverse)
df |>
reframe(bin = seq(floor(start/60), floor(end/60), 1),
.by = c(Individual, Behavior, start, end, duration)) |>
mutate(secs_active = pmin(end, (bin+1)*60) - pmax(start, bin*60)) |>
summarize(secs_active = sum(secs_active), .by = c(Individual, Behavior, bin))
Result
Individual Behavior bin secs_active
1 M1 A 0 1.505
2 M1 B 0 58.495
3 M1 B 1 46.256
4 M1 E 0 56.734
5 M1 E 1 13.493