I am currently working on an application where I have a dataframe that looks like this:
Database
UserId  Hour  Date
01      18    01.01.2016
01      18    01.01.2016
01      14    02.01.2016
01      14    02.01.2016
02      21    02.01.2016
02      08    05.01.2016
02      08    05.01.2016
03      23    05.01.2016
Each line represents a session.
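For anyone who wants to reproduce this, the sample above could be built roughly as follows. This is only a sketch: the name `database` is taken from the summaryBy call in the question, and the column types (integer ids and hours, character dates) are an assumption, since the original types are not shown.

```r
# Sample data from the question; column types are assumed, not confirmed
database <- data.frame(
  UserId = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  Hour   = c(18L, 18L, 14L, 14L, 21L, 8L, 8L, 23L),
  Date   = c("01.01.2016", "01.01.2016", "02.01.2016", "02.01.2016",
             "02.01.2016", "05.01.2016", "05.01.2016", "05.01.2016"),
  stringsAsFactors = FALSE
)

# One row per session
str(database)
```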
I need to determine whether the time of the first session of a user has an impact on the number of sessions this user is going to have.
I have tried the summaryBy command:
library(doBy)
first_hour <- summaryBy(UserId + Hour + Date ~ UserId,
FUN=c(head, length, unique), database)
But it doesn't give me the correct result.
My goal is to determine, for each user, the Hour of the first session, the total number of sessions, and the number of different session dates.
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1), where df1 is the data frame from the question), order the rows by 'UserId' and 'Date', and then, grouped by 'UserId', get the first 'Hour', the total number of sessions (.N), and the number of unique Date elements (uniqueN(Date)).
library(data.table)
# Order by user and parsed date, then summarise per user ("%m.%d.%Y" reads the dates month-first)
setDT(df1)[order(UserId, as.Date(Date, "%m.%d.%Y")),
           .(Hour = Hour[1L], Sessions = .N, DifferSessionDate = uniqueN(Date)),
           by = UserId]
#    UserId Hour Sessions DifferSessionDate
#1:       1   18        4                 2
#2:       2   21        3                 2
#3:       3   23        1                 1
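If data.table is not available, roughly the same summary can be sketched in base R with split() and lapply(). This is a swapped-in alternative to the data.table call above, not part of it. The data frame is rebuilt here so the snippet runs on its own; the column types are assumed, and the date format string is the same month-first one used above.

```r
# Sample data from the question (numeric ids and hours are an assumption)
df1 <- data.frame(
  UserId = c(1, 1, 1, 1, 2, 2, 2, 3),
  Hour   = c(18, 18, 14, 14, 21, 8, 8, 23),
  Date   = c("01.01.2016", "01.01.2016", "02.01.2016", "02.01.2016",
             "02.01.2016", "05.01.2016", "05.01.2016", "05.01.2016"),
  stringsAsFactors = FALSE
)

# Sort by user, then by parsed date; ties keep their original order
df1 <- df1[order(df1$UserId, as.Date(df1$Date, "%m.%d.%Y")), ]

# Build one summary row per user and stack the rows
res <- do.call(rbind, lapply(split(df1, df1$UserId), function(d) {
  data.frame(
    UserId            = d$UserId[1],
    Hour              = d$Hour[1],              # hour of the earliest session
    Sessions          = nrow(d),                # total number of sessions
    DifferSessionDate = length(unique(d$Date))  # number of distinct dates
  )
}))
rownames(res) <- NULL
res
```

On the sample data this gives the same three rows as the data.table result. If the dates are actually day-first, the format string would need to be "%d.%m.%Y" in both versions.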