I have the dataset below, and I am trying to build a more meaningful click path from it.
Row# | Session | Click | Page |
---|---|---|---|
1 | 123 | Enter | Pg1 |
2 | 123 | phpbutton | Pg1 |
3 | 123 | Enter | Pg2 |
4 | 123 | Enter | Pg3 |
5 | 123 | Form1 | Pg3 |
6 | 123 | Form2 | Pg3 |
7 | 123 | Form1 | Pg3 |
8 | 123 | Form1 | Pg3 |
9 | 123 | abcbutton | Pg3 |
10 | 123 | Enter | Pg1 |
11 | 123 | xyzselect | Pg1 |
12 | 123 | Enter | Pg4 |
13 | 123 | Enter | Pg3 |
14 | 123 | Back | Pg3 |
15 | 123 | Enter | Pg1 |
I would like the outcome to look like this:
Session | Activity |
---|---|
123 | Pg1 |
123 | phpbutton |
123 | Pg2 |
123 | Pg3 |
123 | Form1 |
123 | Form2 |
123 | Form1 |
123 | abcbutton |
123 | Pg1 |
123 | xyzselect |
123 | Pg4 |
123 | Pg3 |
123 | Back |
123 | Pg1 |
If the Click column has Enter, then the Activity column should show the Page. But if a row has the same Page as the previous row, then the Activity column should show the value from the Click column. For instance, rows 1 and 2 have the same Page, so I would like the Activity column to show Pg1, then phpbutton. And if the Click column has two or more consecutive identical values, as in rows 7 and 8, I would like the Activity column to show just one entry of Form1.
Thanks a lot.
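Try this. The `case_when()` alone still returns Form1 twice for rows 7 and 8, so a `filter()` step is added to drop consecutive repeats:
library(dplyr)
df |> group_by(Session) |>
  mutate(Activity = case_when(Click == "Enter" ~ Page,
                              lag(Page) == Page ~ Click)) |>
  filter(is.na(lag(Activity)) | Activity != lag(Activity)) |>  # keep first of each consecutive run
  select(Session, Activity)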
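To check this against the sample data, here is a minimal self-contained sketch. The way `df` is built (a plain `data.frame` with character columns) is my assumption about how your data is stored, and the `filter()` step reflects my reading of the "just one entry of Form1" rule as "collapse consecutive identical Activity values within a session". It needs R 4.1 or later for the native pipe `|>`.

```r
library(dplyr)

# Sample data from the question; column types are assumed
df <- data.frame(
  Session = 123,
  Click = c("Enter", "phpbutton", "Enter", "Enter", "Form1", "Form2",
            "Form1", "Form1", "abcbutton", "Enter", "xyzselect",
            "Enter", "Enter", "Back", "Enter"),
  Page = c("Pg1", "Pg1", "Pg2", "Pg3", "Pg3", "Pg3", "Pg3", "Pg3",
           "Pg3", "Pg1", "Pg1", "Pg4", "Pg3", "Pg3", "Pg1")
)

result <- df |>
  group_by(Session) |>
  # "Enter" rows show the page; other rows on the same page show the click
  mutate(Activity = case_when(
    Click == "Enter"  ~ Page,
    lag(Page) == Page ~ Click
  )) |>
  # keep a row only if its Activity differs from the previous row's
  # (the first row of each session has no previous row and is kept)
  filter(is.na(lag(Activity)) | Activity != lag(Activity)) |>
  ungroup() |>
  select(Session, Activity)

print(result, n = Inf)
```

On the sample this gives the 14 rows from the desired table. One caveat: a non-Enter click whose Page differs from the previous row matches neither `case_when()` branch, so its Activity is `NA` and the `filter()` drops it. That case does not occur in the sample, so adjust if your real data has it.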