r clickstream journey

Populating one column based on two columns in R


I have the below dataset, and I am trying to create a more meaningful path.

Row# Session Click Page
1 123 Enter Pg1
2 123 phpbutton Pg1
3 123 Enter Pg2
4 123 Enter Pg3
5 123 Form1 Pg3
6 123 Form2 Pg3
7 123 Form1 Pg3
8 123 Form1 Pg3
9 123 abcbutton Pg3
10 123 Enter Pg1
11 123 xyzselect Pg1
12 123 Enter Pg4
13 123 Enter Pg3
14 123 Back Pg3
15 123 Enter Pg1

I would like the outcome to look like this:

Session Activity
123 Pg1
123 phpbutton
123 Pg2
123 Pg3
123 Form1
123 Form2
123 Form1
123 abcbutton
123 Pg1
123 xyzselect
123 Pg4
123 Pg3
123 Back
123 Pg1

If the Click column is Enter, the Activity column should show the Page. If instead the Page is the same as on the previous row, the Activity column should show the value from the Click column. For instance, rows 1 and 2 have the same Page, so I would like the Activity column to show Pg1, then phpbutton. And if the Click column repeats the same value on two or more consecutive rows, as in rows 7 and 8, I would like the Activity column to show just one Form1 entry.

Thanks a lot.


Solution

  • Try this

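    library(dplyr)
    df |> group_by(Session) |>
      mutate(Activity = case_when(Click == "Enter" ~ Page,
                                  lag(Page) == Page ~ Click)) |>
      filter(is.na(lag(Activity)) | Activity != lag(Activity)) |>  # drop consecutive repeats (rows 7-8)
      select(Session, Activity)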
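
To check the approach against the sample data, here is a minimal self-contained sketch. It assumes `Click` and `Page` are character columns (not factors) and that the rows are already in click order within each session. The `filter()` step is one possible way to collapse consecutive repeats such as the two `Form1` rows, and the name `result` is only illustrative.

```r
library(dplyr)

# Sample data transcribed from the question (character columns assumed)
df <- data.frame(
  Session = 123,
  Click = c("Enter", "phpbutton", "Enter", "Enter", "Form1", "Form2",
            "Form1", "Form1", "abcbutton", "Enter", "xyzselect",
            "Enter", "Enter", "Back", "Enter"),
  Page = c("Pg1", "Pg1", "Pg2", "Pg3", "Pg3", "Pg3", "Pg3", "Pg3",
           "Pg3", "Pg1", "Pg1", "Pg4", "Pg3", "Pg3", "Pg1")
)

result <- df |>
  group_by(Session) |>
  # "Enter" rows show the page; other rows show the click when the page is unchanged
  mutate(Activity = case_when(Click == "Enter" ~ Page,
                              lag(Page) == Page ~ Click)) |>
  # keep a row only if its activity differs from the previous one in the session
  filter(is.na(lag(Activity)) | Activity != lag(Activity)) |>
  ungroup() |>
  select(Session, Activity)

print(result, n = Inf)
```

Note that `case_when()` returns `NA` for any row matching neither condition (a non-Enter click on a page different from the previous row). The sample data has no such row, so how to handle that case is left open here.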