I have a database that looks like this:
userId SessionId Screen Platform Version
01 1 first IOS 1.0.1
01 1 main IOS 1.0.1
01 2 first IOS 1.0.1
01 3 first IOS 1.0.1
01 3 main IOS 1.0.1
01 3 detail IOS 1.0.1
02 1 first Android 1.0.2
Basically what I intend to do is to determine if a "path" (different screens) leads to a better retention or not. I would like to re-organize each sessionId in one column. Ideal database would look like this:
userId SessionId Path Retention
01 1 first;main 3
01 2 first 3
01 3 first;main;detail 3
02 1 first 1
Here is the variable Retention
would be equal to the maximum SessionId
.
A possible solution in base R:
d2 <- aggregate(Screen ~ userId + SessionId, d, toString)
transform(d2, retention = ave(Screen, userId, FUN = length))
which gives:
> d2
userId SessionId Screen retention
1 01 1 first, main 3
2 02 1 first 1
3 01 2 first 3
4 01 3 first, main, detail 3
An alternative using dplyr
:
library(dplyr)
d %>%
group_by(userId, SessionId) %>%
summarise(Screen = toString(Screen)) %>%
group_by(userId) %>%
mutate(retention = n())
which gives:
userId SessionId Screen retention
<chr> <int> <chr> <int>
1 01 1 first, main 3
2 01 2 first 3
3 01 3 first, main, detail 3
4 02 1 first 1