rtimebinningperiodicity

Extract data values at a higher frequency than time stamps


I have continuous behavior data with the timestamp when the subject changed behaviors and what each behavior was, and I need to extract the instantaneous behavior at each minute, starting at the second that the first behavior began: if the behavior started at 17:34:06, I'd define the next minute as 17:35:06. I also have the durations of each behavior calculated. This is what my data looks like:

df <- data.frame(Behavior = c("GRAZ", "MLTC", "GRAZ", "MLTC", "VIGL"),
                 Behavior_Start = c("2022-05-10 17:34:06","2022-05-10 17:38:04","2022-05-10 17:38:26","2022-05-10 17:41:49","2022-05-10 17:42:27"),
                 Behavior_Duration_Minutes = c(0.000000,3.961683,4.325933,7.722067,8.350017))

print(df)

I've used cut() to bin each row into the minute it falls into, but I can't figure out how to get the behavior values for the minutes in which a new behavior doesn't occur (i.e. minutes 2:4 here), and this bases it off the minute but doesn't account for the second that the first behavior began.

time <- data.frame(as.POSIXct(df$Behavior_Start, tz = "America/Denver"))
colnames(time) <- "time"
df <- cbind(df,time)
df.cut <- data.frame(df, cuts = cut(df$time, breaks= "1 min", labels = FALSE))

print(df.cut)

So the dataframe I'd like to end up with would look like this:

new.df <- data.frame(Minute = c(1:10),
                     Timestamp = c("2022-05-10 17:34:06","2022-05-10 17:35:06","2022-05-10 17:36:06","2022-05-10 17:37:06","2022-05-10 17:38:06","2022-05-10 17:39:06","2022-05-10 17:40:06","2022-05-10 17:41:06","2022-05-10 17:42:06","2022-05-10 17:43:06"),   
                     Behavior = c("GRAZ","GRAZ","GRAZ","MLTC","GRAZ","GRAZ","GRAZ","MLTC","VIGL","VIGL"))

print(new.df)

Solution

  • Your data:

    library(dplyr)
    library(tidyr)
    library(purrr)
    
    your_df <- data.frame(
      Behavior = c("Grazing","Vigilant","Grazing","Other","Grazing"),
      Behavior_Start = c("2022-05-10 17:34:06","2022-05-10 17:38:04","2022-05-10 17:38:26","2022-05-10 17:41:49","2022-05-10 17:42:27"),
      Behavior_Duration_Minutes = c(0.000000,3.961683,4.325933,7.722067,8.350017)
    )
    

    Using lead() on the duration column gives you the start and end of each "period" of the activity, and then you need to fill in with a minute for each of that duration.

    # Make a list column that generates a sequence of minutes "included" in
    #   the `Behavior_Duration_Minutes` column. You'll need to play with this
    #   logic in terms of whether or not you want `floor()` or `round()` etc.
    #   Also update the endpoint, here hardcoded at 10 minutes.
    high_res_df <- 
      your_df %>% 
      mutate(
        minutes_covered = purrr::map2(
          ceiling(Behavior_Duration_Minutes), 
          lead(Behavior_Duration_Minutes, default = 10),
          ~seq(.x, .y)
          )
        )
    high_res_df
    #>   Behavior      Behavior_Start Behavior_Duration_Minutes minutes_covered
    #> 1  Grazing 2022-05-10 17:34:06                  0.000000      0, 1, 2, 3
    #> 2 Vigilant 2022-05-10 17:38:04                  3.961683               4
    #> 3  Grazing 2022-05-10 17:38:26                  4.325933         5, 6, 7
    #> 4    Other 2022-05-10 17:41:49                  7.722067               8
    #> 5  Grazing 2022-05-10 17:42:27                  8.350017           9, 10
    

    Now that you've generated the list of minutes included, you can use unnest() to get closer to your desired output.

    # And here expand out that list-column into a regular sequence
    high_res_long <- 
      tidyr::unnest(
        high_res_df,
        "minutes_covered"
      )
    high_res_long
    #> # A tibble: 11 × 4
    #>    Behavior Behavior_Start      Behavior_Duration_Minutes minutes_covered
    #>    <chr>    <chr>                                   <dbl>           <int>
    #>  1 Grazing  2022-05-10 17:34:06                      0                  0
    #>  2 Grazing  2022-05-10 17:34:06                      0                  1
    #>  3 Grazing  2022-05-10 17:34:06                      0                  2
    #>  4 Grazing  2022-05-10 17:34:06                      0                  3
    #>  5 Vigilant 2022-05-10 17:38:04                      3.96               4
    #>  6 Grazing  2022-05-10 17:38:26                      4.33               5
    #>  7 Grazing  2022-05-10 17:38:26                      4.33               6
    #>  8 Grazing  2022-05-10 17:38:26                      4.33               7
    #>  9 Other    2022-05-10 17:41:49                      7.72               8
    #> 10 Grazing  2022-05-10 17:42:27                      8.35               9
    #> 11 Grazing  2022-05-10 17:42:27                      8.35              10
    

    Created on 2023-01-13 with reprex v2.0.2

    You'll need to play around with this a bit to match exactly what you want.