rtimetidyr

R How to reframe data to connect start and stop times


I have a data set that describes narwhal behaviours. There are eight whales (whaleID), multiple behaviours (behaviour), each whale can interact with none, some or all of the other whales (modID), and each incident has either a start or stop (status) time in seconds (time_s). The way the data was generated in the program used to record the behaviours provided an output where the start and stop times are in separate rows. Here is an example of how the data looks:

library(dplyr)
df <- data.frame(whaleID = c('c','a','b','c','c','a','b','c','c','c','b','c','c','b',
                             'b','b','b','a','b','a','b','b','b','b','c','c'),
                 status = c('start','start','start','stop','start','stop','stop','stop',
                            'start','stop','start','start','stop','stop',
                            'start','stop','start','start','stop','stop',
                            'start','stop','start','stop','start','stop'),
                 time_s = c(40.554,44.801,44.801,60.300,76.053,87.302,87.302,87.302,
                            41.056,42.801,45.304,54.800,56.871,75.054,
                            92.798,94.801,98.844,101.676,105.052,108.160,
                            113.918,118.914,132.170,136.799,248.227,252.568),
                 modID = c('d','b','a','d','d','b','a','d','d','d','a','d','d','a',
                           'a,c','a,c','a','b','a','b','','','','','',''),
                 behaviour = c('contact','contact','contact','contact','contact','contact','contact','contact',
                               'rub','rub','rub','rub','rub','rub',
                               'ventral','ventral','ventral','ventral','ventral','ventral',
                               'erect','erect','erect','erect','erect','erect')) %>% 
  arrange(time_s)

My problem is that I need to find a way to have each behavioural event be a line that includes the start and end time.

Ideal output:

df2 <- data.frame(whaleID = c('c','a','b','c','c','b','c','b','b','a','b','b','c'),
                  start = c(40.554,44.801,44.801,76.053,41.056,45.304,54.800,
                            92.798,98.844,101.676,113.918,132.170,248.227),
                  stop = c(60.300,87.302,87.302,87.302,42.801,75.054,56.871,
                           94.801,105.052,108.160,118.914,136.799,252.568),
                  modID = c('d','b','a','d','d','a','d','a,c','a','b','','',''),
                  behaviour = c('contact','contact','contact','contact','rub','rub','rub',
                                'ventral','ventral','ventral','erect','erect','erect')) %>% 
  arrange(start)

I've tried to nest case_when within reframe, summarize and pivot wider without success. For example:

dfTime <- df %>% 
  group_by(whaleID, behaviour, modID) %>% 
  reframe(start = case_when(status == "START" & min(time_s) ~ time_s),
          stop = case_when(status == "STOP" & min(time_s)~ time_s)) %>%
  ungroup()


Solution

  • Since each has multiple sessions, we can create an indicator that increments each time a new "start" event happens. Try

    df %>% 
      arrange(time_s) %>% 
      group_by(whaleID, behaviour, modID) %>% 
      mutate(run=cumsum(status=="start")) %>% 
      group_by(run, .add=TRUE) %>% 
      summarize(start = time_s[status == "start"],
              stop = time_s[status == "stop"],
              ) %>%
      ungroup() %>%
      select(-run)
    

    Which returns

    # A tibble: 13 × 6
       whaleID behaviour modID   run start  stop
       <chr>   <chr>     <chr> <int> <dbl> <dbl>
     1 c       contact   "d"       1  40.6  60.3
     2 c       rub       "d"       1  41.1  42.8
     3 a       contact   "b"       1  44.8  87.3
     4 b       contact   "a"       1  44.8  87.3
     5 b       rub       "a"       1  45.3  75.1
     6 c       rub       "d"       2  54.8  56.9
     7 c       contact   "d"       2  76.1  87.3
     8 b       ventral   "a,c"     1  92.8  94.8
     9 b       ventral   "a"       1  98.8 105. 
    10 a       ventral   "b"       1 102.  108. 
    11 b       erect     ""        1 114.  119. 
    12 b       erect     ""        2 132.  137. 
    13 c       erect     ""        1 248.  253.