rdurationtimelinebinning

Binning duration data in R with a start and end time for data that spans multiple bins


Very new to R and have been googling this question to death to no avail - any insight would be greatly appreciated!

I have data organized in the following way:

Individual Behavior start end duration
M1 A 0 1.505 1.505
M1 B 1.505 106.256 104.751
M1 E 3.266 67.255 63.989
M1 E 71.268 77.506 6.238

Duration, start time, and end time are all in seconds.

I want to sum the duration of each behavior in 1 minute bins - basically say this individual spent this amount of time doing behavior A in the first minute, this amount of time doing behavior B in the first minute, and so on for each behavior and each minute. Eventually, I would like to average time spent doing each behavior in each bin across experimental groups, and make a line graph for each experimental group, with duration on the y axis and 1 min bins on the x axis, with different series for each behavior.

<Edited to add: For the example above, the desired outcome would be:

M1 spent 1.505 seconds doing behavior A in bin 1 (0 seconds - 60 seconds)

M1 spent 0 seconds doing behavior A in bin 2 (>60 seconds - 120 seconds)

M1 spent 58.495 seconds doing behavior B in bin 1 (0 seconds - 60 seconds)

M1 spent 46.256 seconds doing behavior B in bin 2 (>60 seconds - 120 seconds)

M1 spent 56.734 seconds doing behavior E in bin 1 (0 seconds - 60 seconds)

M1 spent 13.493 seconds doing behavior E in bin 2 (>60 seconds - 120 seconds)

I'm not picky about how this data would be displayed, when I had binned previously it would just add a column labeled "bin" and assign the bin number for each row, but I don't think that will be possible now because behaviors can span multiple bins.>

The issue though, is that the way I have binned before (in excel and SPSS) is by defining what bin the start time falls into - but in this case for a lot of behaviors the duration is over 1 minute, so the start time and end time belong in different bins. Is there a way to parse this out in r?

Thanks!


Solution

  • One way in tidyverse below.

    First, I reframe the data giving a row for each relevant bin within each Indiv/Behavior combo. Then I find how much of the time was within that bin. Finally I sum up the totals by Indiv/Behav/Bin in case there are multiples.

    library(tidyverse)
    df |>
      reframe(bin = seq(floor(start/60), floor(end/60), 1),
              .by = c(Individual, Behavior, start, end, duration)) |>
      mutate(secs_active = pmin(end, (bin+1)*60) - pmax(start, bin*60)) |>
      summarize(secs_active = sum(secs_active), .by = c(Individual, Behavior, bin))
    

    Result

      Individual Behavior bin secs_active
    1         M1        A   0       1.505
    2         M1        B   0      58.495
    3         M1        B   1      46.256
    4         M1        E   0      56.734
    5         M1        E   1      13.493