rdatetimeunixtimestampclickstream

Calculating Session Duration in R


I have a dataset with a session id, user id, TimeStamp in UNIX (which I converted using lubridate), and the converted TimeStamp column.

Session User ts_UNIX TimeStamp
123 345 UNIX Timestamp 14-06-2022 17:44:32
123 345 UNIX Timestamp 14-06-2022 17:44:33
123 345 UNIX Timestamp 14-06-2022 17:44:37
124 346 UNIX Timestamp 14-06-2022 15:50:10
124 346 UNIX Timestamp 14-06-2022 15:51:01
124 346 UNIX Timestamp 14-06-2022 16:30:00
125 345 UNIX Timestamp 14-06-2022 23:55:30
125 345 UNIX Timestamp 14-06-2022 23:58:50
125 345 UNIX Timestamp 14-06-2022 23:59:45
125 345 UNIX Timestamp 15-06-2022 00:00:32
125 345 UNIX Timestamp 15-06-2022 00:00:59

I would like to add another column called session_duration (in seconds) which is the difference between the max_time and min_time grouped by Session and User. For instance, for session # 123 and user 345, the session duration is [14-06-2022 17:44:37] - [14-06-2022 17:44:32] which is 5 seconds.

Session User ts_UNIX TimeStamp session_duration (seconds)
123 345 UNIX Timestamp 14-06-2022 17:44:32 5
123 345 UNIX Timestamp 14-06-2022 17:44:33 5
123 345 UNIX Timestamp 14-06-2022 17:44:37 5
124 346 UNIX Timestamp 14-06-2022 15:50:10 2390
124 346 UNIX Timestamp 14-06-2022 15:51:01 2390
124 346 UNIX Timestamp 14-06-2022 16:30:00 2390
125 345 UNIX Timestamp 14-06-2022 23:55:30 329
125 345 UNIX Timestamp 14-06-2022 23:58:50 329
125 345 UNIX Timestamp 14-06-2022 23:59:45 329
125 345 UNIX Timestamp 15-06-2022 00:00:32 329
125 345 UNIX Timestamp 15-06-2022 00:00:59 329

This is what my current code looks like. The timestamp has successfully converted, but I am facing an issue with the session duration column.

library(tidyverse)
library(lubridate)
df <- df %>%
  mutate(timestamp = as_datetime(ts_unix/1000)) %>%
  group_by (session, user, timestamp) %>%
  mutate(session_duration = difftime (max(timestamp), min(timestamp), units = "secs"))

Can someone please help me figure out the session_duration column? Thank you.


Solution

  • library(tidyverse)
    library(lubridate)
    
    df %>% 
      group_by(Session, User) %>% 
      mutate(session_duration = max(TimeStamp) - min(TimeStamp))
    
    # A tibble: 11 × 5
    # Groups:   Session, User [3]
       Session  User ts_UNIX        TimeStamp           session_duration
         <dbl> <dbl> <chr>          <dttm>              <drtn>          
     1     123   345 UNIX Timestamp 2022-06-14 17:44:32    5 secs       
     2     123   345 UNIX Timestamp 2022-06-14 17:44:33    5 secs       
     3     123   345 UNIX Timestamp 2022-06-14 17:44:37    5 secs       
     4     124   346 UNIX Timestamp 2022-06-14 15:50:10 2390 secs       
     5     124   346 UNIX Timestamp 2022-06-14 15:51:01 2390 secs       
     6     124   346 UNIX Timestamp 2022-06-14 16:30:00 2390 secs       
     7     125   345 UNIX Timestamp 2022-06-14 23:55:30  329 secs       
     8     125   345 UNIX Timestamp 2022-06-14 23:58:50  329 secs       
     9     125   345 UNIX Timestamp 2022-06-14 23:59:45  329 secs       
    10     125   345 UNIX Timestamp 2022-06-15 00:00:32  329 secs       
    11     125   345 UNIX Timestamp 2022-06-15 00:00:59  329 secs