rlagdplyrclickstream

Using lag in mutate() for rolling values forward for the created column


I am trying to specify sessions in a click stream data. I group rows based on months and userId and try to create another variable session, that looks at diff_days column, and increase by on if thats > 0.00209 and stays as the previous value otherwise. So basically I am trying to create session variable and use the lag version on it at the same time. The fist row in a group is always session = 1.

So take for example this data is one of the groups from group_by:

ID Month diff_days
2    0     NA
2    0     0.0002
2    0     0.001
2    0     0.01
2    0     0.00034
2    0     0.1
2    0     0.3
2    0     0.00005

and I want to create session variable within each group like this:

ID Month diff_days session
2   0    NA        1
2   0    0.0002    1
2   0    0.001     1    
2   0    0.01      2
2   0    0.00034   2
2   0    0.1       3
2   0    0.3       4
2   0    0.00005   4

The code that I am using and not giving the right answer:

data <- data %>% group_by(ID, Month)
%>% mutate(session =  ifelse(row_number() == 1, 1 ,
ifelse(diff_days < 0.0209, lag(session) , lag(session) + 1))) %>% ungroup()

I have been struggling with this for quite some time so any help would be greatly appreciated.

Thanks!


Solution

  • We can use cumsum on the logical vector after grouping by 'ID', 'Month'. Create a logical vector diff_days[-1] >= 0.00209 (removed the first observation which is NA and appended TRUE as the first one. Then, get the cumulative sum, so that for every TRUE value, it gets added 1.

    data %>% 
       group_by(ID, Month) %>%
       mutate(session = cumsum(c(TRUE, diff_days[-1] >= 0.00209)))
    # A tibble: 8 x 4
    # Groups:   ID, Month [1]
    #     ID Month diff_days session
    #  <int> <int>     <dbl>   <int>
    #1     2     0  NA             1
    #2     2     0   0.0002        1
    #3     2     0   0.001         1
    #4     2     0   0.01          2
    #5     2     0   0.00034       2
    #6     2     0   0.1           3
    #7     2     0   0.3           4
    #8     2     0   0.00005       4